.. py:currentmodule:: vidyut.cheda

Segmenting and tagging with `vidyut.cheda`
==========================================

.. warning::
   This module is incomplete and may be deleted in a future release. We recommend
   using the `Dharmamitra`_ analyzer instead if possible.

.. _Dharmamitra: https://github.com/sebastian-nehrdich/byt5-sanskrit-analyzers

`vidyut.cheda` segments Sanskrit expressions into words then annotates those
words with their morphological data. Our segmenter is optimized for real-time
and interactive usage: it is fast, low-memory, and capably handles pathological
input.

The main class here is :class:`~vidyut.cheda.Chedaka`, which defines a
segmenter. The main return type is :class:`~vidyut.cheda.Token`, which contains
the segmented text with its associated :class:`~vidyut.kosha.Pada` data.

Example usage::

    from vidyut.cheda import Chedaka

    chedaka = Chedaka("/path/to/vidyut-data")

    for token in chedaka.run('gacCati'):
        print(token.text, token.data)