Storing words with vidyut.kosha
vidyut.kosha defines a key-value store that can compactly map hundreds of millions of Sanskrit words to their inflectional data. Depending on the application, storage costs can be as low as 1 byte per word. This storage efficiency comes at the cost of increased lookup time, but in practice, we have found that this increase is negligible and well worth the efficiency gains elsewhere.
vidyut.kosha is tightly integrated with vidyut.prakriya, which makes it easy to look up a word and then derive it using the Ashtadhyayi.
Note
All inputs to vidyut.kosha should use the SLP1 encoding scheme, and output is likewise encoded in SLP1. You can convert to and from SLP1 by using vidyut.lipi or your favorite transliterator.
Note
If you are using our official data release, note that all word-final visargas
are stored as s and r as appropriate. If you wish to look up रामः, search
for "rAmas" instead.
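As a rough illustration of the note above, the sketch below turns an SLP1 word ending in a visarga (written H in SLP1) into the keys you might try instead. The helper name visarga_candidates is hypothetical and is not part of vidyut.kosha. Which of the two forms is actually stored depends on the word, so the helper simply returns both.

```python
def visarga_candidates(word: str) -> list[str]:
    """Return possible lookup keys for an SLP1 word.

    Hypothetical helper, not part of vidyut.kosha. A word-final visarga
    ("H" in SLP1) is replaced by "s" and by "r"; any other word is
    returned unchanged.
    """
    if word.endswith("H"):
        stem = word[:-1]
        return [stem + "s", stem + "r"]
    return [word]


candidates = visarga_candidates("rAmaH")
```

With a loaded Kosha, you could then keep only the candidates for which `candidate in kosha` holds.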
Quickstart
The main class here is Kosha, which defines an interface to the underlying
dictionary data. The main return type is PadaEntry, which
defines rich morphological data about the given word.
Example usage:
from vidyut.kosha import Kosha
kosha = Kosha("/path/to/vidyut-data/kosha")
for entry in kosha.get("gacCati"):
    print(entry)
# `Kosha` also provides fast existence checks.
assert "gacCati" in kosha
# Simple lookups with `[]` work as well. These will raise `KeyError` if
# the key does not exist.
assert kosha["gacCati"]
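The calls above can be combined into a small batch lookup. The sketch below is one possible arrangement, not an official recipe: lemmas_for is a hypothetical helper that relies only on the `in` check, on `get`, and on the `lemma` attribute that PadaEntry exposes. It accepts any object with that interface, so it can be tried without the data files.

```python
def lemmas_for(kosha, words):
    """Map each word to the distinct lemmas of its entries.

    Hypothetical helper, not part of vidyut.kosha. `kosha` is assumed to
    support `word in kosha` and `kosha.get(word)`, and each entry is
    assumed to have a `lemma` attribute.
    """
    result = {}
    for word in words:
        lemmas = []
        # Check existence first so that unknown words map to an empty list.
        if word in kosha:
            for entry in kosha.get(word):
                # Keep the first occurrence of each lemma, in lookup order.
                if entry.lemma not in lemmas:
                    lemmas.append(entry.lemma)
        result[word] = lemmas
    return result
```

Checking `word in kosha` before calling `get` keeps the helper independent of how a missing key is reported.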
Return types
The main return types are PadaEntry, PratipadikaEntry, and
DhatuEntry. Together, these types provide detailed information about
the items in the Kosha.
Kosha will create all of these types on your behalf. On this page,
we will create these types manually so that you can better understand their
structure and usage.
PadaEntry
The core return type is PadaEntry, which contains morphological data
for a single Sanskrit pada. PadaEntry has two basic varieties. The
first variety is PadaEntry.Subanta, which models a subanta (nominal):
from vidyut.kosha import PratipadikaEntry, PadaEntry
from vidyut.prakriya import Pratipadika, Linga, Vibhakti, Vacana
rama = Pratipadika.basic("rAma")
rama_entry = PratipadikaEntry.Basic(pratipadika=rama, lingas=[Linga.Pum])
ramah = PadaEntry.Subanta(
    pratipadika_entry=rama_entry,
    linga=Linga.Pum,
    vibhakti=Vibhakti.Prathama,
    vacana=Vacana.Eka)
assert ramah.lemma == "rAma"
PadaEntry.Subanta also models an avyaya (indeclinable):
from vidyut.kosha import PratipadikaEntry, PadaEntry
from vidyut.prakriya import Pratipadika
ca = Pratipadika.basic("ca", is_avyaya=True)
ca_entry = PratipadikaEntry.Basic(pratipadika=ca, lingas=[])
pada = PadaEntry.Subanta(pratipadika_entry=ca_entry)
assert pada.is_avyaya
The second variety is PadaEntry.Tinanta, which models a tinanta (verb):
from vidyut.kosha import DhatuEntry, PadaEntry
from vidyut.prakriya import Dhatu, Gana, Prayoga, Lakara, Purusha, Vacana
gam = Dhatu.mula("ga\\mx~", Gana.Bhvadi)
gam_entry = DhatuEntry(dhatu=gam, clean_text="gam")
gacchati = PadaEntry.Tinanta(
    dhatu_entry=gam_entry,
    prayoga=Prayoga.Kartari,
    lakara=Lakara.Lat,
    purusha=Purusha.Prathama,
    vacana=Vacana.Eka)
assert gacchati.lemma == "gam"
You can separate these two cases by using a match statement:
from vidyut.kosha import PadaEntry
def check_type(entry: PadaEntry):
    # `match` is supported as of Python 3.10.
    match entry:
        case PadaEntry.Subanta():
            return "subanta"
        case PadaEntry.Tinanta():
            return "tinanta"
assert check_type(ramah) == "subanta"
assert check_type(gacchati) == "tinanta"
PratipadikaEntry
PratipadikaEntry is a helper class within PadaEntry. It models
a prātipadika (nominal stem) along with helper information.
PratipadikaEntry has two varieties. The first variety is
PratipadikaEntry.Basic, which models a basic prātipadika (nominal stem):
from vidyut.kosha import PratipadikaEntry
from vidyut.prakriya import Linga, Pratipadika
rama = PratipadikaEntry.Basic(pratipadika=Pratipadika.basic("rAma"), lingas=[Linga.Pum])
assert rama.lemma == "rAma"
assert rama.lingas == [Linga.Pum]
The second variety is PratipadikaEntry.Krdanta, which models a kṛdanta (verbal derivative):
from vidyut.kosha import DhatuEntry, PratipadikaEntry
from vidyut.prakriya import Dhatu, Gana, Krt
gam = Dhatu.mula("ga\\mx~", Gana.Bhvadi)
gam_entry = DhatuEntry(dhatu=gam, clean_text="gam")
gata = PratipadikaEntry.Krdanta(dhatu_entry=gam_entry, krt=Krt.kta)
assert gata.lemma == "gam"
assert gata.dhatu_entry == gam_entry
assert gata.krt == Krt.kta
assert gata.prayoga is None
assert gata.lakara is None
PratipadikaEntry.Krdanta may also set the prayoga and lakāra, which is
useful for some kṛdanta derivations:
from vidyut.prakriya import Lakara, Prayoga
# Reuses `gam_entry` and `Krt` from the previous example.
gacchat = PratipadikaEntry.Krdanta(
    dhatu_entry=gam_entry,
    krt=Krt.Satf,
    lakara=Lakara.Lat,
    prayoga=Prayoga.Kartari)
assert gacchat.lakara == Lakara.Lat
assert gacchat.prayoga == Prayoga.Kartari
gamisyat = PratipadikaEntry.Krdanta(
    dhatu_entry=gam_entry,
    krt=Krt.Satf,
    lakara=Lakara.Lrt,
    prayoga=Prayoga.Kartari)
assert gamisyat.lakara == Lakara.Lrt
assert gamisyat.prayoga == Prayoga.Kartari
gamyamana = PratipadikaEntry.Krdanta(
    dhatu_entry=gam_entry,
    krt=Krt.SAnac,
    lakara=Lakara.Lat,
    prayoga=Prayoga.Karmani)
assert gamyamana.lakara == Lakara.Lat
assert gamyamana.prayoga == Prayoga.Karmani
DhatuEntry
DhatuEntry is a helper class within PadaEntry. It models a
Sanskrit dhātu (verb root) along with useful metadata.
from vidyut.kosha import DhatuEntry
from vidyut.prakriya import Dhatu, Gana
gam = Dhatu.mula("ga\\mx~", Gana.Bhvadi)
gam_entry = DhatuEntry(dhatu=gam, clean_text="gam")
assert gam_entry.dhatu.aupadeshika == "ga\\mx~"
assert gam_entry.dhatu.gana == Gana.Bhvadi
assert gam_entry.clean_text == "gam"
Creating prakriyas
PadaEntry, PratipadikaEntry, and DhatuEntry can all be
passed to vidyut.prakriya.Vyakarana.derive():
from vidyut.kosha import DhatuEntry, PratipadikaEntry, PadaEntry
from vidyut.prakriya import (
    Dhatu, Gana, Krt, Lakara, Linga, Prayoga, Sanadi, Vacana, Vibhakti, Vyakarana)
dhatu_entry = DhatuEntry(
    dhatu=Dhatu.mula("ga\\mx~", Gana.Bhvadi, prefixes=["anu"], sanadi=[Sanadi.Ric]),
    clean_text="gam")
pratipadika_entry = PratipadikaEntry.Krdanta(
    dhatu_entry=dhatu_entry,
    krt=Krt.Satf,
    lakara=Lakara.Lat,
    prayoga=Prayoga.Kartari)
pada_entry = PadaEntry.Subanta(
    pratipadika_entry=pratipadika_entry,
    linga=Linga.Pum,
    vibhakti=Vibhakti.Dvitiya,
    vacana=Vacana.Eka)
v = Vyakarana()
# assert [p.text for p in v.derive(dhatu_entry)] == ["anugami"]
assert [p.text for p in v.derive(pratipadika_entry)] == ["anugamayat"]
assert [p.text for p in v.derive(pada_entry)] == ["anugamayantam"]
# `gamisyat` and `gamyamana` are the kṛdanta entries defined in the PratipadikaEntry section.
assert [p.text for p in v.derive(gamisyat)] == ["gamizyat"]
assert [p.text for p in v.derive(gamyamana)] == ["gamyamAna"]
Note
What is the difference between Pada and PadaEntry?
Why do we have both types?
Think of the vidyut.prakriya types as input types and the vidyut.kosha types as
output types. Where Pada tells us how to create a pada,
PadaEntry shows us the results of creating a pada. This is why the
vidyut.kosha types contain useful metadata:
DhatuEntry contains clean_text, which is the dictionary version of the dhatu with sandhi applied and accent marks removed. It also contains meanings in Sanskrit (artha_sa), English (artha_en), and Hindi (artha_hi) as well as some other metadata.
PratipadikaEntry contains lingas, which lists the lingas typically used with this pratipadika.
We will add more metadata like this in future releases.