Transliterating with vidyut.lipi
vidyut-lipi is a transliteration library for Sanskrit and Pali that also supports many of the scripts used within the Indosphere. Our main goals with this module are correctness, completeness, and consistency.
vidyut-lipi is generally comparable to other Python-based transliterators like Aksharamukha and indic_transliteration but has specific advantages for certain use cases. For details on how vidyut.lipi differs from these transliterators, see the Features section below.
Quickstart
Transliteration is simple and straightforward:
from vidyut.lipi import transliterate, Scheme
text = "tapaHsvAdhyAyanirataM tapasvI vAgvidAM varam"
output = transliterate(text, Scheme.HarvardKyoto, Scheme.Devanagari)
print(output)
Output:
तपःस्वाध्यायनिरतं तपस्वी वाग्विदां वरम्
If the input scheme is not known, you can use detect() to guess it:
from vidyut.lipi import detect, transliterate, Scheme
text = "tapaHsvAdhyAyanirataM tapasvI vAgvidAM varam"
# NOTE: `detect` returns `None` if no scheme could be found.
source = detect(text)
assert source == Scheme.HarvardKyoto
output = transliterate(text, source, Scheme.Devanagari)
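Because detect() can return None, code that handles arbitrary input may prefer an explicit fallback over an assert. The following is a minimal sketch under stated assumptions: resolve_scheme is a hypothetical helper (not part of vidyut.lipi), and a stand-in detector replaces detect() so that the block runs on its own.

```python
from typing import Callable, Optional, TypeVar

S = TypeVar("S")


def resolve_scheme(text: str, detector: Callable[[str], Optional[S]], default: S) -> S:
    """Return the detected scheme, or `default` if detection fails.

    `resolve_scheme` is a hypothetical helper, not part of vidyut.lipi.
    """
    found = detector(text)
    return default if found is None else found


# Stand-in detector for illustration only: it pretends that detection
# fails on empty input. With vidyut installed, pass `detect` instead.
def fake_detect(text: str) -> Optional[str]:
    return "HarvardKyoto" if text.strip() else None


print(resolve_scheme("tapasvI", fake_detect, "Iast"))  # HarvardKyoto
print(resolve_scheme("", fake_detect, "Iast"))         # Iast
```

With the real library, the call would presumably look like resolve_scheme(text, detect, Scheme.HarvardKyoto), with the result passed to transliterate() as the source scheme.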
vidyut.lipi supports a wide variety of transliteration schemes. For example:
text = "tapaHsvAdhyAyanirataM tapasvI vAgvidAM varam"
for dest in Scheme.choices():
    output = transliterate(text, Scheme.HarvardKyoto, dest)
    print(f"{dest}: {output}")
Output:
Assamese: তপঃস্ৱাধ্যায়নিৰতং তপস্ৱী ৱাগ্ৱিদাং ৱৰম্
Balinese: ᬢᬧᬄᬲ᭄ᬯᬵᬥ᭄ᬬᬵᬬᬦᬶᬭᬢᬂ ᬢᬧᬲ᭄ᬯᬷ ᬯᬵᬕ᭄ᬯᬶᬤᬵᬂ ᬯᬭᬫ᭄
Bengali: তপঃস্বাধ্যায়নিরতং তপস্বী বাগ্বিদাং বরম্
Bhaiksuki: 𑰝𑰢𑰾𑰭𑰿𑰪𑰯𑰠𑰿𑰧𑰯𑰧𑰡𑰰𑰨𑰝𑰽𑱃𑰝𑰢𑰭𑰿𑰪𑰱𑱃𑰪𑰯𑰐𑰿𑰪𑰰𑰟𑰯𑰽𑱃𑰪𑰨𑰦𑰿
Brahmi: 𑀢𑀧𑀂𑀲𑁆𑀯𑀸𑀥𑁆𑀬𑀸𑀬𑀦𑀺𑀭𑀢𑀁 𑀢𑀧𑀲𑁆𑀯𑀻 𑀯𑀸𑀕𑁆𑀯𑀺𑀤𑀸𑀁 𑀯𑀭𑀫𑁆
Burmese: တပးသွာဓျာယနိရတံ တပသွီ ဝာဂွိဒာံ ဝရမ်
Cham: ꨓꨚꩍꨧꨶꨩꨖꨳꨩꨢꨘꨪꨣꨓꩌ ꨓꨚꨧꨶꨫ ꨥꨩꨈꨶꨪꨕꨩꩌ ꨥꨣꩌ
Devanagari: तपःस्वाध्यायनिरतं तपस्वी वाग्विदां वरम्
Dogra: 𑠙𑠞𑠸𑠩𑠹𑠦𑠬𑠜𑠹𑠣𑠬𑠣𑠝𑠭𑠤𑠙𑠷 𑠙𑠞𑠩𑠹𑠦𑠮 𑠦𑠬𑠌𑠹𑠦𑠭𑠛𑠬𑠷 𑠦𑠤𑠢𑠹
Grantha: 𑌤𑌪𑌃𑌸𑍍𑌵𑌾𑌧𑍍𑌯𑌾𑌯𑌨𑌿𑌰𑌤𑌂 𑌤𑌪𑌸𑍍𑌵𑍀 𑌵𑌾𑌗𑍍𑌵𑌿𑌦𑌾𑌂 𑌵𑌰𑌮𑍍
Gujarati: તપઃસ્વાધ્યાયનિરતં તપસ્વી વાગ્વિદાં વરમ્
GunjalaGondi: 𑵳𑶅𑶖𑶉𑶗𑵭𑶊𑵹𑶗𑵬𑶊𑵬𑵺𑶋𑶈𑵳𑶕 𑵳𑶅𑶉𑶗𑵭𑶌 𑵭𑶊𑵶𑶗𑵭𑶋𑵸𑶊𑶕 𑵭𑶈𑵰
Gurmukhi: ਤਪਃਸ੍ਵਾਧ੍ਯਾਯਨਿਰਤਂ ਤਪਸ੍ਵੀ ਵਾਗ੍ਵਿਦਾਂ ਵਰਮ੍
Javanese: ꦠꦥꦃꦱ꧀ꦮꦴꦣꦾꦴꦪꦤꦶꦫꦠꦁ ꦠꦥꦱ꧀ꦮꦷ ꦮꦴꦒ꧀ꦮꦶꦢꦴꦁ ꦮꦫꦩ꧀
Kaithi: 𑂞𑂣𑂂𑂮𑂹𑂫𑂰𑂡𑂹𑂨𑂰𑂨𑂢𑂱𑂩𑂞𑂁 𑂞𑂣𑂮𑂹𑂫𑂲 𑂫𑂰𑂏𑂹𑂫𑂱𑂠𑂰𑂁 𑂫𑂩𑂧𑂹
Kannada: ತಪಃಸ್ವಾಧ್ಯಾಯನಿರತಂ ತಪಸ್ವೀ ವಾಗ್ವಿದಾಂ ವರಮ್
Kharoshthi: 𐨟𐨤𐨏𐨯𐨿𐨬𐨌𐨢𐨿𐨩𐨌𐨩𐨣𐨁𐨪𐨟𐨎 𐨟𐨤𐨯𐨿𐨬𐨁𐨌 𐨬𐨌𐨒𐨿𐨬𐨁𐨡𐨌𐨎 𐨬𐨪𐨨𐨿
Khmer: តបះស្វាធ្យាយនិរតំ តបស្វី វាគ្វិទាំ វរម៑
Khudawadi: 𑋍𑋒𑋞𑋪𑋝𑋪𑋛𑋠𑋐𑋪𑋘𑋠𑋘𑋑𑋡𑋙𑋍𑋟 𑋍𑋒𑋝𑋪𑋛𑋢 𑋛𑋠𑊼𑋪𑋛𑋡𑋏𑋠𑋟 𑋛𑋙𑋗𑋪
Limbu: ᤋᤐᤜ᤻ᤛᤫᤠᤎᤩᤠᤕᤏᤡᤖᤋᤱ ᤋᤐᤛᤫᤡ᤺ ᤘᤠᤃᤫᤡᤍᤠᤱ ᤘᤖᤔ᤻
Malayalam: തപഃസ്വാധ്യായനിരതം തപസ്വീ വാഗ്വിദാം വരമ്
MeeteiMayek: ꯇꯄꯍ꯭ꯁ꯭ꯋꯥꯙ꯭ꯌꯥꯌꯅꯤꯔꯇꯪ ꯇꯄꯁ꯭ꯋꯤ ꯋꯥꯒ꯭ꯋꯤꯗꯥꯪ ꯋꯔꯝ
MasaramGondi: 𑴛𑴠𑵁𑴫𑵅𑴨𑴱𑴞𑵅𑴥𑴱𑴥𑴟𑴲𑴦𑴛𑵀 𑴛𑴠𑴫𑵅𑴨𑴳 𑴨𑴱𑴎𑵅𑴨𑴲𑴝𑴱𑵀 𑴨𑴦𑴤𑵅
Modi: 𑘝𑘢𑘾𑘭𑘿𑘪𑘰𑘠𑘿𑘧𑘰𑘧𑘡𑘱𑘨𑘝𑘽 𑘝𑘢𑘭𑘿𑘪𑘲 𑘪𑘰𑘐𑘿𑘪𑘱𑘟𑘰𑘽 𑘪𑘨𑘦𑘿
Mon: တပးသွာဓျာယနိရတံ တပသွဳ ဝာဂွိဒာံ ဝရမ်
Nandinagari: 𑦽𑧂𑧟𑧍𑧠𑧊𑧑𑧀𑧠𑧇𑧑𑧇𑧁𑧒𑧈𑦽𑧞 𑦽𑧂𑧍𑧠𑧊𑧓 𑧊𑧑𑦰𑧠𑧊𑧒𑦿𑧑𑧞 𑧊𑧈𑧆𑧠
Newa: 𑐟𑐥𑑅𑐳𑑂𑐰𑐵𑐢𑑂𑐫𑐵𑐫𑐣𑐶𑐬𑐟𑑄 𑐟𑐥𑐳𑑂𑐰𑐷 𑐰𑐵𑐐𑑂𑐰𑐶𑐡𑐵𑑄 𑐰𑐬𑐩𑑂
Odia: ତପଃସ୍ଵାଧ୍ଯାଯନିରତଂ ତପସ୍ଵୀ ଵାଗ୍ଵିଦାଂ ଵରମ୍
OlChiki: ᱛᱚᱯᱚᱷᱥᱣᱟᱫᱷᱭᱟᱭᱚᱱᱤᱨᱚᱛᱚᱝ ᱛᱚᱯᱚᱥᱣᱤᱻ ᱣᱟᱜᱣᱤᱫᱟᱝ ᱣᱚᱨᱚᱢ
Saurashtra: ꢡꢦꢁꢱ꣄ꢮꢵꢤ꣄ꢫꢵꢫꢥꢶꢬꢡꢀ ꢡꢦꢱ꣄ꢮꢷ ꢮꢵꢔ꣄ꢮꢶꢣꢵꢀ ꢮꢬꢪ꣄
Sharada: 𑆠𑆥𑆂𑆱𑇀𑆮𑆳𑆣𑇀𑆪𑆳𑆪𑆤𑆴𑆫𑆠𑆁 𑆠𑆥𑆱𑇀𑆮𑆵 𑆮𑆳𑆓𑇀𑆮𑆴𑆢𑆳𑆁 𑆮𑆫𑆩𑇀
Siddham: 𑖝𑖢𑖾𑖭𑖿𑖪𑖯𑖠𑖿𑖧𑖯𑖧𑖡𑖰𑖨𑖝𑖽 𑖝𑖢𑖭𑖿𑖪𑖱 𑖪𑖯𑖐𑖿𑖪𑖰𑖟𑖯𑖽 𑖪𑖨𑖦𑖿
Sinhala: තපඃස්වාධ්යායනිරතං තපස්වී වාග්විදාං වරම්
Soyombo: 𑩫𑩰𑪗𑪁𑪘𑩾𑩛𑩮𑪘𑩻𑩛𑩻𑩯𑩑𑩼𑩫𑪖 𑩫𑩰𑪁𑪘𑩾𑩑𑩛 𑩾𑩛𑩞𑪘𑩾𑩑𑩭𑩛𑪖 𑩾𑩼𑩴𑪘
TaiTham: ᨲᨸᩡᩈ᩠ᩅᩣᨵ᩠ᨿᩣᨿᨶᩥᩁᨲᩴ ᨲᨸᩈ᩠ᩅᩦ ᩅᩣᨣ᩠ᩅᩥᨴᩣᩴ ᩅᩁᨾ᩺
Takri: 𑚙𑚞𑚬𑚨𑚶𑚦𑚭𑚜𑚶𑚣𑚭𑚣𑚝𑚮𑚤𑚙𑚫 𑚙𑚞𑚨𑚶𑚦𑚯 𑚦𑚭𑚌𑚶𑚦𑚮𑚛𑚭𑚫 𑚦𑚤𑚢𑚶
Tamil: தப꞉ஸ்வாத்⁴யாயநிரதம்ʼ தபஸ்வீ வாக்³விதா³ம்ʼ வரம்
Telugu: తపఃస్వాధ్యాయనిరతం తపస్వీ వాగ్విదాం వరమ్
Thai: ตปห์สฺวาธฺยายนิรตํ ตปสฺวี วาคฺวิทาํ วรมฺ
Tibetan: ཏཔཿསྭཱདྷྱཱཡནིརཏཾ་ཏཔསྭཱི་བཱགྭིདཱཾ་བརམ
Tirhuta: 𑒞𑒣𑓁𑒮𑓂𑒫𑒰𑒡𑓂𑒨𑒰𑒨𑒢𑒱𑒩𑒞𑓀 𑒞𑒣𑒮𑓂𑒫𑒲 𑒫𑒰𑒑𑓂𑒫𑒱𑒠𑒰𑓀 𑒫𑒩𑒧𑓂
ZanabazarSquare: 𑨙𑨞𑨹𑨰𑩇𑨭𑨊𑨜𑩇𑨪𑨊𑨪𑨝𑨁𑨫𑨙𑨸 𑨙𑨞𑨰𑩇𑨭𑨁𑨊 𑨭𑨊𑨍𑩇𑨭𑨁𑨛𑨊𑨸 𑨭𑨫𑨢𑨴
BarahaSouth: tapaHsvAdhyAyanirataM tapasvI vAgvidAM varam
HarvardKyoto: tapaHsvAdhyAyanirataM tapasvI vAgvidAM varam
Iast: tapaḥsvādhyāyanirataṃ tapasvī vāgvidāṃ varam
Iso15919: tapaḥsvādhyāyanirataṁ tapasvī vāgvidāṁ varam
Itrans: tapaHsvAdhyAyanirataM tapasvI vAgvidAM varam
Slp1: tapaHsvADyAyanirataM tapasvI vAgvidAM varam
Velthuis: tapa.hsvaadhyaayanirata.m tapasvii vaagvidaa.m varam
Wx: wapaHsvAXyAyanirawaM wapasvI vAgvixAM varam
Features
Transliterators are essentially identical for common use cases, but they tend to differ at the edges. Here are some features that distinguish vidyut.lipi for uncommon use cases.
vidyut.lipi has strong support for Grantha, especially Grantha numerals:
text = "1 12 123 1234 12345"
output = transliterate(text, Scheme.HarvardKyoto, Scheme.Grantha)
assert output == "௧ ௰௨ ௱௨௰௩ ௲௨௱௩௰௪ ௰௨௲௩௱௪௰௫"
vidyut.lipi has strong support for ISO 15919, including the ":" disambiguating separator:
text = "नरइति"
output = transliterate(text, Scheme.Devanagari, Scheme.Iso15919)
assert output == "nara:iti"
vidyut.lipi is aware of Unicode NFC and NFD forms and normalizes all output to NFC.
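The difference between the two forms can be seen with Python's standard unicodedata module. This sketch does not call vidyut.lipi. It only illustrates what normalizing to NFC means for a romanized string with a diacritic:

```python
import unicodedata

# IAST "vāk" with "ā" as a single precomposed code point (NFC form).
composed = "v\u0101k"
# The same text with "a" followed by COMBINING MACRON (NFD form).
decomposed = "va\u0304k"

# The two strings render identically but compare unequal.
print(composed == decomposed)  # False

# Normalizing to NFC makes them comparable again.
normalized = unicodedata.normalize("NFC", decomposed)
print(normalized == composed)  # True
print(len(decomposed), len(normalized))  # 4 3
```

Output that is consistently in NFC can be compared, searched, and deduplicated without a separate normalization pass.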
vidyut.lipi has strong support for anudātta and svarita accents across a variety of schemes:
text = "a\\ a^"
output = transliterate(text, Scheme.Slp1, Scheme.Devanagari)
assert output == "अ॒ अ॑"
detect() is especially robust and has high coverage.
vidyut-lipi is around 6 times faster than indic_transliteration and around 8 times faster than Aksharamukha. This speed-up does not matter for everyday usage, but it can matter for heavy workloads, such as transliterating an entire corpus.
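To check whether speed matters for a particular workload, one option is to time a transliteration function over a sample of the corpus. The sketch below is only an illustration and is not the benchmark behind the figures above: time_transliterator is a hypothetical helper, and str.upper stands in for a real transliteration call so that the block runs without any library installed.

```python
import time
from typing import Callable, Iterable


def time_transliterator(fn: Callable[[str], str], lines: Iterable[str], repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds over `repeats` passes.

    `time_transliterator` is a hypothetical helper for illustration.
    """
    lines = list(lines)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for line in lines:
            fn(line)
        best = min(best, time.perf_counter() - start)
    return best


# Stand-in workload: the same line repeated, processed by str.upper.
corpus = ["tapaHsvAdhyAyanirataM tapasvI vAgvidAM varam"] * 1000
elapsed = time_transliterator(str.upper, corpus)
print(f"{elapsed:.6f} s for {len(corpus)} lines")
```

With vidyut installed, the callable could be something like lambda s: transliterate(s, Scheme.HarvardKyoto, Scheme.Devanagari), and the same harness could be pointed at another library's function for comparison.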