Storing words with vidyut.kosha

vidyut.kosha defines a key-value store that can compactly map hundreds of millions of Sanskrit words to their inflectional data. Depending on the application, storage costs can be as low as 1 byte per word. This storage efficiency comes at the cost of increased lookup time, but in practice, we have found that this increase is negligible and well worth the efficiency gains elsewhere.

vidyut.kosha is tightly integrated with vidyut.prakriya, which makes it easy to look up a word and then derive it using the Ashtadhyayi.

Note

All inputs to vidyut.kosha should use the SLP1 encoding scheme, and output is likewise encoded in SLP1. You can convert to and from SLP1 by using vidyut.lipi or your favorite transliterator.

Note

If you are using our official data release, note that all word-final visargas are stored as s or r, as appropriate. For example, to look up रामः, search for "rAmas".
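
A caller holding a word that ends in a visarga has to decide which stored spelling to query. The sketch below shows one way to generate both candidates. The helper name visarga_lookup_keys is hypothetical and is not part of vidyut.kosha, and the sketch assumes the input is already in SLP1, where the visarga is written H.

```python
def visarga_lookup_keys(word: str) -> list[str]:
    """Return candidate lookup keys for an SLP1 word.

    Hypothetical helper: if the word ends in a visarga ("H" in SLP1),
    return both the "s" and the "r" spelling; otherwise return the word as-is.
    """
    if word.endswith("H"):
        stem = word[:-1]
        return [stem + "s", stem + "r"]
    return [word]


assert visarga_lookup_keys("rAmaH") == ["rAmas", "rAmar"]
assert visarga_lookup_keys("gacCati") == ["gacCati"]
```

Each candidate can then be tried in turn with an existence check such as `key in kosha`.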

Quickstart

The main class here is Kosha, which defines an interface to the underlying dictionary data. The main return type is PadaEntry, which defines rich morphological data about the given word.

Example usage:

from vidyut.kosha import Kosha

kosha = Kosha("/path/to/vidyut-data/kosha")

for entry in kosha.get("gacCati"):
    print(entry)

# `Kosha` also provides fast existence checks.
assert "gacCati" in kosha

# Simple lookups with `[]` work as well. These will raise `KeyError` if
# the key does not exist.
assert kosha["gacCati"]
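
Because `[]` raises `KeyError` for a missing key, code that expects misses may want a small wrapper. The following is a minimal sketch: lookup_or_default is a hypothetical helper, not part of vidyut.kosha, and a plain dict stands in for a Kosha so that the snippet runs without the data release.

```python
def lookup_or_default(kosha, key, default=None):
    """Return kosha[key], or `default` if the lookup raises KeyError."""
    try:
        return kosha[key]
    except KeyError:
        return default


# A plain dict stands in for a real `Kosha` here (assumption for illustration).
fake_kosha = {"gacCati": ["<entry>"]}

assert lookup_or_default(fake_kosha, "gacCati") == ["<entry>"]
assert lookup_or_default(fake_kosha, "notAWord", default=[]) == []
```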

Return types

The main return types are PadaEntry, PratipadikaEntry, and DhatuEntry. Together, these types provide detailed information about the items in the Kosha.

Kosha will create all of these types on your behalf. On this page, we will create these types manually so that you can better understand their structure and usage.

PadaEntry

The core return type is PadaEntry, which contains morphological data for a single Sanskrit pada. PadaEntry has two basic varieties. The first variety is PadaEntry.Subanta, which models a subanta (nominal):

from vidyut.kosha import PratipadikaEntry, PadaEntry
from vidyut.prakriya import Pratipadika, Linga, Vibhakti, Vacana

rama = Pratipadika.basic("rAma")
rama_entry = PratipadikaEntry.Basic(pratipadika=rama, lingas=[Linga.Pum])
ramah = PadaEntry.Subanta(
    pratipadika_entry=rama_entry,
    linga=Linga.Pum,
    vibhakti=Vibhakti.Prathama,
    vacana=Vacana.Eka)
assert ramah.lemma == "rAma"

PadaEntry.Subanta also models an avyaya (indeclinable):

from vidyut.kosha import PratipadikaEntry, PadaEntry
from vidyut.prakriya import Pratipadika

ca = Pratipadika.basic("ca", is_avyaya=True)
ca_entry = PratipadikaEntry.Basic(pratipadika=ca, lingas=[])
pada = PadaEntry.Subanta(pratipadika_entry=ca_entry)
assert pada.is_avyaya

The second variety is PadaEntry.Tinanta, which models a tinanta (verb):

from vidyut.kosha import DhatuEntry, PadaEntry
from vidyut.prakriya import Dhatu, Gana, Prayoga, Lakara, Purusha, Vacana

gam = Dhatu.mula("ga\\mx~", Gana.Bhvadi)
gam_entry = DhatuEntry(dhatu=gam, clean_text="gam")
gacchati = PadaEntry.Tinanta(
    dhatu_entry=gam_entry,
    prayoga=Prayoga.Kartari,
    lakara=Lakara.Lat,
    purusha=Purusha.Prathama,
    vacana=Vacana.Eka)
assert gacchati.lemma == "gam"

You can separate these two cases by using a match statement:

from vidyut.kosha import PadaEntry

def check_type(entry: PadaEntry):
    # `match` is supported as of Python 3.10.
    match entry:
        case PadaEntry.Subanta():
            return "subanta"
        case PadaEntry.Tinanta():
            return "tinanta"

assert check_type(ramah) == "subanta"
assert check_type(gacchati) == "tinanta"
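
On Python versions older than 3.10, the same dispatch can be written with isinstance, since a class pattern such as `case PadaEntry.Subanta():` is itself defined as an isinstance check. The sketch below uses a stand-in PadaEntry class so that it runs on its own; in real code, import PadaEntry from vidyut.kosha instead. The name check_type_compat is hypothetical.

```python
class PadaEntry:
    """Stand-in for vidyut.kosha.PadaEntry (assumption, for illustration only)."""

    class Subanta:
        pass

    class Tinanta:
        pass


def check_type_compat(entry) -> str:
    # Same logic as the `match` version, expressed with isinstance.
    if isinstance(entry, PadaEntry.Subanta):
        return "subanta"
    if isinstance(entry, PadaEntry.Tinanta):
        return "tinanta"
    raise TypeError(f"unexpected entry type: {type(entry).__name__}")


assert check_type_compat(PadaEntry.Subanta()) == "subanta"
assert check_type_compat(PadaEntry.Tinanta()) == "tinanta"
```

Unlike the `match` version, which falls through and returns None for any other value, this sketch raises TypeError so that unexpected inputs are noticed early.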

PratipadikaEntry

PratipadikaEntry is a helper class within PadaEntry. It models a prātipadika (nominal stem) along with helper information.

PratipadikaEntry has two varieties. The first variety is PratipadikaEntry.Basic, which models a basic prātipadika (nominal stem):

from vidyut.kosha import PratipadikaEntry
from vidyut.prakriya import Linga, Pratipadika

rama = PratipadikaEntry.Basic(pratipadika=Pratipadika.basic("rAma"), lingas=[Linga.Pum])

assert rama.lemma == "rAma"
assert rama.lingas == [Linga.Pum]

The second variety is PratipadikaEntry.Krdanta, which models a kṛdanta (verbal derivative):

from vidyut.kosha import DhatuEntry, PratipadikaEntry
from vidyut.prakriya import Dhatu, Gana, Krt

gam = Dhatu.mula("ga\\mx~", Gana.Bhvadi)
gam_entry = DhatuEntry(dhatu=gam, clean_text="gam")
gata = PratipadikaEntry.Krdanta(dhatu_entry=gam_entry, krt=Krt.kta)

assert gata.lemma == "gam"

assert gata.dhatu_entry == gam_entry
assert gata.krt == Krt.kta
assert gata.prayoga is None
assert gata.lakara is None

PratipadikaEntry.Krdanta may also set the prayoga and lakāra. These distinguish kṛdantas that share a kṛt suffix, such as the present participle gacchat and the future participle gamiṣyat:

from vidyut.prakriya import Lakara, Prayoga

gacchat = PratipadikaEntry.Krdanta(
    dhatu_entry=gam_entry,
    krt=Krt.Satf,
    lakara=Lakara.Lat,
    prayoga=Prayoga.Kartari)

assert gacchat.lakara == Lakara.Lat
assert gacchat.prayoga == Prayoga.Kartari

gamisyat = PratipadikaEntry.Krdanta(
    dhatu_entry=gam_entry,
    krt=Krt.Satf,
    lakara=Lakara.Lrt,
    prayoga=Prayoga.Kartari)

assert gamisyat.lakara == Lakara.Lrt
assert gamisyat.prayoga == Prayoga.Kartari

gamyamana = PratipadikaEntry.Krdanta(
    dhatu_entry=gam_entry,
    krt=Krt.SAnac,
    lakara=Lakara.Lat,
    prayoga=Prayoga.Karmani)

assert gamyamana.lakara == Lakara.Lat
assert gamyamana.prayoga == Prayoga.Karmani

DhatuEntry

DhatuEntry is a helper class within PadaEntry. It models a Sanskrit dhātu (verb root) along with useful metadata.

from vidyut.kosha import DhatuEntry
from vidyut.prakriya import Dhatu, Gana

gam = Dhatu.mula("ga\\mx~", Gana.Bhvadi)
gam_entry = DhatuEntry(dhatu=gam, clean_text="gam")

assert gam_entry.dhatu.aupadeshika == "ga\\mx~"
assert gam_entry.dhatu.gana == Gana.Bhvadi
assert gam_entry.clean_text == "gam"

Creating prakriyas

PadaEntry, PratipadikaEntry, and DhatuEntry can all be passed to vidyut.prakriya.Vyakarana.derive():

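from vidyut.kosha import DhatuEntry, PadaEntry, PratipadikaEntry
from vidyut.prakriya import (
    Dhatu, Gana, Krt, Lakara, Linga, Prayoga, Sanadi, Vacana, Vibhakti, Vyakarana)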

dhatu_entry = DhatuEntry(
    dhatu=Dhatu.mula("ga\\mx~", Gana.Bhvadi, prefixes=["anu"], sanadi=[Sanadi.Ric]),
    clean_text="gam")

pratipadika_entry = PratipadikaEntry.Krdanta(
    dhatu_entry=dhatu_entry,
    krt=Krt.Satf,
    lakara=Lakara.Lat,
    prayoga=Prayoga.Kartari)

pada_entry = PadaEntry.Subanta(
    pratipadika_entry=pratipadika_entry,
    linga=Linga.Pum,
    vibhakti=Vibhakti.Dvitiya,
    vacana=Vacana.Eka)

v = Vyakarana()
# assert [p.text for p in v.derive(dhatu_entry)] == ["anugami"]
assert [p.text for p in v.derive(pratipadika_entry)] == ["anugamayat"]
assert [p.text for p in v.derive(pada_entry)] == ["anugamayantam"]
# `gamisyat` and `gamyamana` are the entries created in the PratipadikaEntry section.
assert [p.text for p in v.derive(gamisyat)] == ["gamizyat"]
assert [p.text for p in v.derive(gamyamana)] == ["gamyamAna"]

Note

What is the difference between Pada and PadaEntry? Why do we have both types?

Think of the vidyut.prakriya types as input types and the vidyut.kosha types as output types. Where Pada tells us how to create a pada, PadaEntry shows us the results of creating a pada. This is why the vidyut.kosha types contain useful metadata:

  • DhatuEntry contains clean_text, which is the dictionary version of the dhatu with sandhi applied and accent marks removed. It also contains meanings in Sanskrit (artha_sa), English (artha_en), and Hindi (artha_hi) as well as some other metadata.

  • PratipadikaEntry contains lingas, which lists the lingas typically used with this pratipadika.

We will add more metadata like this in future releases.