Segmenting and tagging with vidyut.cheda
Warning
This module is incomplete and may be deleted in a future release. We recommend using the Dharmamitra analyzer instead if possible.
vidyut.cheda segments Sanskrit expressions into words, then annotates those words with their morphological data. Our segmenter is optimized for real-time and interactive use: it is fast, uses little memory, and capably handles pathological input.
The main class here is Chedaka, which defines a segmenter. The main return type is Token, which contains the segmented text with its associated Pada data.
Example usage:
from vidyut.cheda import Chedaka
chedaka = Chedaka("/path/to/vidyut-data")
for token in chedaka.run('gacCati'):
    print(token.text, token.data)
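If you want the results as a list rather than printing them, one option is a small wrapper like the sketch below. The helper name segment_to_pairs is hypothetical and not part of vidyut; it relies only on the run method and the text and data attributes shown in the example above.

```python
from typing import Any, List, Tuple


def segment_to_pairs(chedaka: Any, text: str) -> List[Tuple[str, Any]]:
    """Collect (text, data) pairs from a segmenter's output.

    `chedaka` is assumed to expose `run(text)`, yielding tokens that
    have `text` and `data` attributes, as in the example above.
    """
    return [(token.text, token.data) for token in chedaka.run(text)]


# Hypothetical usage, assuming vidyut is installed and its data is
# available at the given path:
#
#   from vidyut.cheda import Chedaka
#   chedaka = Chedaka("/path/to/vidyut-data")
#   for word, data in segment_to_pairs(chedaka, "gacCati"):
#       print(word, data)
```

Taking the segmenter as an argument, rather than constructing it inside the helper, lets you create one Chedaka and reuse it across calls.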