Computational Kṛdanta Generation: A Novel Approach

[vc_row][vc_column][rev_slider_vc alias=”about-3″][vc_column_text el_class=”articles”]

Sarada Susarla

Karnataka Sanskrit University

Bengaluru

sarada.susarla@gmail.com

Abstract

It is widely believed that Maharshi Pāṇini’s Aṣṭādhyāyī is the most accurate grammar and word-generation scheme ever devised for a natural language. The ability to interpret the Aṣṭādhyāyī sūtra corpus mechanically has numerous benefits for pedagogy and for Saṃskṛt word synthesis and analysis. We believe that the Aṣṭādhyāyī itself is a complete and unified solution for word generation in Saṃskṛt.

 

In this paper, we present an automated method to decode Aṣṭādhyāyī sūtras and generate kṛdanta words automatically. We start with a curated representation of the Aṣṭādhyāyī in CSV format and convert it into a machine-interpretable JSON format. We have developed a novel vibhakti-based mechanism to transform sūtras into machine-interpretable expressions using the Aṣṭādhyāyī’s own embedded conventions. We use this to extract all term definitions (saṃjñās) and pratyayas automatically. We then interpret the seven ‘it’ saṃjñā rules of the Aṣṭādhyāyī to decode the remaining sūtras that employ ‘it’-encoded words. Next, we go through a subset of kṛdanta-related sūtras, automatically select those whose conditions are met, and apply them to see how the form of the word changes. This process is repeated until no sūtra is applicable any more for the conditions between padas. This gives the final form of the word, i.e., the kṛdanta prātipadikam. We have validated the output for a few dhātu and kṛt pratyaya combinations against the gold-standard word forms from Saṃskṛt texts. We demonstrate this process via a Python library we have developed for this purpose.

 

  1. Introduction

We claim that an algorithmic interpretation of the Aṣṭādhyāyī’s sūtras, specifically for kṛdanta word generation, is both feasible and useful. This paper focuses on the ability to algorithmically interpret a given sūtra text and dhātupāṭha text in order to transform a given sequence of prakṛti and pratyaya morphemes into kṛdanta word forms. We consider saṃjñā and vidhi sūtras for such an interpretation, as they comprise the bulk of the Aṣṭādhyāyī’s sūtras and hence are tedious to interpret manually. The novelty of our approach lies in defining a dozen operators to algorithmically compile thousands of Aṣṭādhyāyī rules into Condition-Action predicates and interpret them. Some special cases require manual tagging to resolve ambiguities in word interpretation. We validated this approach for a handful of valid dhātu and pratyaya combinations using the applicable sūtras from the Aṣṭādhyāyī. The same approach can be extended to all prakṛti-pratyaya combinations with all the sūtras of the Aṣṭādhyāyī.

 

To demonstrate the feasibility of Aṣṭādhyāyī interpretation, we have implemented a Python-based just-in-time compiler and interpreter library for the Aṣṭādhyāyī. It takes the following as input: (i) an annotated Aṣṭādhyāyī sūtra corpus with morphologically tagged sūtra words, (ii) an input sequence of Saṃskṛt lexemes to be conjugated, and (iii) a series of IDs of sūtras, automatically selected based on their conditions, to be applied to the sequence. It produces the conjugated form(s). We demonstrate the entire flow for kṛdanta form generation step by step.

 

  2. Related Work

The Aṣṭādhyāyī and its interpretation for Saṃskṛt grammatical analysis and word synthesis have been studied extensively [Goyal et al. (2008), Scharf and Hyman (2009), Goyal et al. (2012), Satuluri and Kulkarni (2013), Patel and Katuri (2016), Krishna and Goyal (2016), Subbanna and Varakhedi (2010)]. Here, we assume the reader is familiar with Pāṇini’s Aṣṭādhyāyī and its various concepts relevant to computational modeling. For a good overview of those concepts, the reader is referred to earlier publications [Goyal et al. (2008), Petersen and Hellwig (2016)]. Several earlier efforts attempted to highlight and emulate various techniques used in the Aṣṭādhyāyī for specific grammatical purposes. They typically select a particular subset of the Aṣṭādhyāyī’s engine and code its semantics manually to reproduce a specific prakriyā. Mishra (2008) manually enumerated the terms and their definitions, including pratyāhāras.

The Aṣṭādhyāyī 2.0 project by Petersen and Hellwig (2016) has developed a richly annotated electronic representation of the Aṣṭādhyāyī that makes it amenable to research and machine processing. We have achieved a similar objective via manual splitting of sandhis and word separation within compounds (samāsas), and by developing a custom vibhakti analyzer for detecting word recurrence across vibhakti and vacana variations. Petersen and Soubusta (2013) have developed a digital edition of the Aṣṭādhyāyī, creating a relational database schema and web interface to support custom views and sophisticated queries. We opted for a hierarchical key-value structure (JSON) to represent the Aṣṭādhyāyī, as it offers a more convenient way to navigate the text than a relational model.
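
As an illustration of the hierarchical key-value idea, the following minimal sketch nests a flat spreadsheet of sūtras by adhyāya, pāda and sūtra number. The column names (`id`, `text`), the sample rows in SLP1 and the nesting scheme are assumptions made for this example; they are not the schema of our actual JSON database.

```python
import csv
import io
import json

# Hypothetical two-column CSV in SLP1; the real spreadsheet has more fields.
SAMPLE_CSV = """id,text
1.1.1,vfdDirAdEc
1.1.2,adeNguRaH
1.3.2,upadeSe'janunAsika it
1.3.3,halantyam
"""


def csv_to_tree(csv_text):
    """Nest sūtra rows as tree[adhyaya][pada][number] = record."""
    tree = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        adhyaya, pada, number = row["id"].split(".")
        tree.setdefault(adhyaya, {}).setdefault(pada, {})[number] = {
            "text": row["text"],
        }
    return tree


tree = csv_to_tree(SAMPLE_CSV)
# All sūtras of adhyāya 1, pāda 3 can now be reached with two key lookups.
print(json.dumps(tree["1"]["3"], indent=2))
```

With such a layout, walking a pāda in order or fetching a single sūtra needs no joins, which is the navigational convenience referred to above.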

  3. Methodology

3.1 Aṣṭādhyāyī Preparation for Processing

We started with a well-annotated and curated online resource for the Aṣṭādhyāyī [Susarla and Susarla (2012)]. It is available as a spreadsheet, which makes it amenable to augmentation and scripted manipulation.

At a high level, our approach to Aṣṭādhyāyī interpretation involves the following manual steps:

  • Splitting of sandhis and samāsas in the sūtra text to facilitate detection of word recurrences.
  • Enumerating the anuvṛtta padas of each sūtra (from earlier sūtras).
  • Preparation of a vibhakti suffix table that covers the subantas of the Aṣṭādhyāyī, for use in morphological analysis of sūtra words.
  • Coding of custom functions to interpret the meaning of some technical words used in the Aṣṭādhyāyī but not defined therein (e.g., adarśanam, ādiḥ, antyam, etc.).
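
The last manual step can be pictured with a small sketch. The function names below are hypothetical and the bodies are simplified readings of ādiḥ (the first element), antyam (the last element) and adarśanam (non-appearance, i.e. removal) over a list of varṇas; they are not the functions of our library.

```python
def adi(varnas):
    """ādiḥ: position of the first varṇa, or None for an empty list."""
    return 0 if varnas else None


def antyam(varnas):
    """antyam: position of the last varṇa, or None for an empty list."""
    return len(varnas) - 1 if varnas else None


def adarshanam(varnas, positions):
    """adarśanam: return the varṇas with the given positions removed."""
    drop = set(positions)
    return [v for i, v in enumerate(varnas) if i not in drop]


# The āgama 'tu~k' in SLP1, split into varṇas.
tuk = ["t", "u~", "k"]
print(adarshanam(tuk, [1, antyam(tuk)]))
```

Keeping these as plain functions over positions lets a compiled rule say what to locate and what to do with it separately.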

3.2 Aṣṭādhyāyī sūtra Interpretation

To prepare the Aṣṭādhyāyī engine for rule interpretation, we automatically preprocess the Aṣṭādhyāyī database as follows.

  1. We first perform morphological analysis of each word of every sūtra to extract its prātipadikam. This is required to identify recurrence of a word in the Aṣṭādhyāyī regardless of vibhakti and vacana.
  2. For each sūtra, we generate a canonical sūtra text that we refer to as its ‘mahāvākya’, as follows. We expand the sūtra’s text to include all anuvṛtta padas inherited from earlier sūtras. We represent a mahāvākya as a list of pada descriptions, each with its morphological analysis output.
  3. We auto-extract the definitions of all terms (saṃjñās) used in the Aṣṭādhyāyī. These come in different forms and need to be handled differently.
  4. We compile saṃjñā and vidhi sūtras into rules to be interpreted at prakriyā time.
  5. We determine the vidhi sūtras to which each of the paribhāṣā sūtras applies by checking their preconditions, and then modify those vidhi sūtras accordingly.
  6. Finally, we create an optimized condition hierarchy for rule-checking by factoring the preconditions for all the Aṣṭādhyāyī sūtras into a decision tree. This is explained in detail in the following section.
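
Step 2 above can be sketched as follows. The record layout (`padas`, `anuvrtti`), the toy morphology table and the choice to place inherited padas before the sūtra’s own padas are assumptions for illustration only; the sketch also ignores any change of vibhakti that an inherited pada may undergo.

```python
# Hypothetical sūtra records in SLP1. 'anuvrtti' lists (source sūtra ID, pada)
# pairs of the kind enumerated manually during preparation.
SUTRAS = {
    13002: {"padas": ["upadeSe", "ac", "anunAsikaH", "it"], "anuvrtti": []},
    13009: {"padas": ["tasya", "lopaH"], "anuvrtti": [(13002, "it")]},
}

# Toy morphological table: pada -> (prātipadika, vibhakti, vacana).
MORPH = {
    "tasya": ("tad", 6, 1),
    "lopaH": ("lopa", 1, 1),
    "it": ("it", 1, 1),
}


def mahavakya(sutra_id):
    """Return the expanded sūtra as a list of pada descriptions."""
    sutra = SUTRAS[sutra_id]
    inherited = [pada for _, pada in sutra["anuvrtti"]]
    return [
        {
            "pada": pada,
            "analysis": MORPH.get(pada),  # None when the toy table has no entry
            "inherited": index < len(inherited),
        }
        for index, pada in enumerate(inherited + sutra["padas"])
    ]


for description in mahavakya(13009):
    print(description)
```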

3.2.1 Interpreting Vidhi sūtras

A vidhi sūtra typically describes a transformation in a context given by the following descriptors:

  • Terms in ṣaṣṭhī vibhakti denoting the pada or varṇa position (called sthāna) at which the transformation takes place.
  • Terms in saptamī vibhakti denoting the pada or varṇa (called the para) before which the transformation takes place.
  • Terms in pañcamī vibhakti denoting the pada or varṇa (called the pūrva) after which the transformation takes place.
  • Terms in tṛtīyā vibhakti denoting the pada in whose immediate neighborhood (before or after) the transformation takes place.
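
The vibhakti-to-role convention above can be sketched as a small compilation step. The role names, the treatment of a prathamā term as the replacement (ādeśa), and the tagging of 7.3.77 with an inherited saptamī term ‘Siti’ are assumptions for this example, not the operators of our library.

```python
# Assumed mapping from vibhakti number to the role a term plays in a rule.
# Roles 3, 5, 6 and 7 follow the descriptors listed above; role 1 (the
# replacement) is an extra assumption made for this sketch.
ROLE_BY_VIBHAKTI = {
    1: "adesha",
    3: "neighbour",
    5: "purva",
    6: "sthana",
    7: "para",
}


def compile_rule(tagged_padas):
    """Group (pada, vibhakti) pairs of a mahāvākya by their role."""
    rule = {}
    for pada, vibhakti in tagged_padas:
        role = ROLE_BY_VIBHAKTI.get(vibhakti)
        if role is not None:
            rule.setdefault(role, []).append(pada)
    return rule


# 7.3.77 in SLP1, with 'Siti' taken as inherited from an earlier sūtra.
rule = compile_rule([("izugamiyamAm", 6), ("CaH", 1), ("Siti", 7)])
print(rule)
```

The resulting dictionary reads as a condition (sthāna and para) plus an action (the replacement), which is the Condition-Action shape described in the Introduction.
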

  4. Kṛdanta Generation and Evaluation

4.1 Sample Data for Generation

First, we picked a subset of kṛt pratyayas in combination with a variety of dhātus and generated the kṛdanta words manually based on Aṣṭādhyāyī rules. These results serve as the gold standard against which we compare the output of our interpreter. As an example, we walk through one such derivation in the following table:

| धातुः | कृत्प्रत्ययः | रूपसिद्धिः (modified form) | सूत्रम् (Sutra) | Sutra ID |
| गम्लृँ | शतृ | गम्लृँ | | |
| | | गम् | उपदेशेऽजनुनासिक इत्। | 1.3.2 |
| | | गम् + शतृ | लटः शतृशानचावप्रथमासमानाधिकरणे। | 3.2.124 |
| | | गम् + शत् | उपदेशेऽजनुनासिक इत्। | 1.3.2 |
| | | गम् + अत् | लशक्वतद्धिते। | 1.3.8 |
| | | गम् + शप् + अत् | कर्तरि शप्। | 3.1.68 |
| | | गम् + अप् + अत् | लशक्वतद्धिते। | 1.3.8 |
| | | गम् + अ + अत् | हलन्त्यम्। | 1.3.3 |
| | | गछ् + अ + अत् | इषुगमियमां छः। | 7.3.77 |
| | | ग + तुक् + छ् + अ + अत् | छे च। | 6.1.73 |
| | | ग + तु + छ् + अ + अत् | हलन्त्यम्। | 1.3.3 |
| | | ग + त् + छ् + अ + अत् | उपदेशेऽजनुनासिक इत्। | 1.3.2 |
| | | ग + त् + छ् + अत् | अतो गुणे। | 6.1.97 |
| | | ग + च् + छ् + अत् | स्तोः श्चुना श्चुः। | 8.4.40 |
| | | गच्छत् | | |
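
Several rows of the table remove ‘it’ markers. The following minimal sketch shows how three of the ‘it’ rules (1.3.2, 1.3.3 and 1.3.8) could be checked on SLP1 input. The function names are hypothetical, and the sketch ignores the exceptions and the remaining ‘it’ rules of the Aṣṭādhyāyī, so it should not be read as our actual implementation.

```python
VOWELS = set("aAiIuUfFxXeEoO")   # SLP1 vowels
MODIFIERS = set("~\\^")          # nasalization and accent marks follow a varṇa
KU_L_SH = set("lSkKgGN")         # l, ś and the ka-varga in SLP1


def split_varnas(upadesha):
    """Split an SLP1 string into varṇas, attaching marks to the preceding one."""
    varnas = []
    for ch in upadesha:
        if ch in MODIFIERS and varnas:
            varnas[-1] += ch
        else:
            varnas.append(ch)
    return varnas


def it_positions(varnas, is_pratyaya=False):
    """Positions marked 'it' by simplified readings of 1.3.2, 1.3.3, 1.3.8."""
    positions = set()
    for i, v in enumerate(varnas):
        if v[0] in VOWELS and "~" in v:            # 1.3.2: nasalized vowel
            positions.add(i)
    if varnas and varnas[-1][0] not in VOWELS:     # 1.3.3: final consonant
        positions.add(len(varnas) - 1)
    if is_pratyaya and varnas and varnas[0][0] in KU_L_SH:
        positions.add(0)                           # 1.3.8: initial l, ś, ku
    return sorted(positions)


def drop_its(upadesha, is_pratyaya=False):
    """Apply lopa (1.3.9) to every varṇa marked 'it'."""
    varnas = split_varnas(upadesha)
    its = set(it_positions(varnas, is_pratyaya))
    return [v for i, v in enumerate(varnas) if i not in its]


print(drop_its("Satf~", is_pratyaya=True))
print(drop_its("ga\\mx~"))
```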

 

 

4.2 Evaluation of the Generated Sample Data

To demonstrate how our Aṣṭādhyāyī interpreter handles the saṃjñā and vidhi sūtras listed in its input to generate kṛdanta forms, we show the tool’s input and output for the generation of the prātipadikam gacchat from the dhātu gamlṛ and the pratyaya śatṛ.

The list of vidhi sūtras applied to generate the kṛdanta form gacchat from its root gamlṛ (dhātu) and śatṛ (pratyaya), as given to our interpreter, is shown below.

'padas': 'ga\\mx~ Satf~',
'sutras': [
    13009, # tasya (itaH) lopaH
    34067, # kartari krt
    31068, # kartari shap
    73077, # ishugamiyamaaM chhaH
    61073, # chhe cha
    61097, # ato guNe
    84040, # stoH schunA schuH
]
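
The numeric IDs in this listing appear to pack adhyāya, pāda and sūtra number into a single integer (for instance, 73077 here corresponds to 7.3.77 in the table above). A minimal sketch of that decoding follows; the function name is hypothetical and the encoding is inferred from the listing rather than taken from a specification.

```python
def decode_sutra_id(sutra_id):
    """Turn a packed ID such as 73077 into the dotted form '7.3.77'."""
    adhyaya, rest = divmod(sutra_id, 10000)
    pada, number = divmod(rest, 1000)
    return f"{adhyaya}.{pada}.{number}"


for sutra_id in (13009, 34067, 31068, 73077, 61073, 61097, 84040):
    print(sutra_id, decode_sutra_id(sutra_id))
```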

 

Below is the output of an example run of the Aṣṭādhyāyī interpreter, which takes gamlṛ (dhātu) and śatṛ (pratyaya) and generates the prātipadikam gacchat. The output shows the numerous ‘it’ identifications and removals, as well as the āgama and ādeśa modifications to both pratyayas and individual varṇas.

0: Analyzing ['ga\\mx~', 'Satf~'] ...
0: Applying sutra 13009 ..
0: Pada Sequence: ['ga\\mx~', 'Satf~']: Applying Vidhi Sutra 13009 tasyalopaH |
0: found it: varnas=['x~', 'S', 'f~']
0: Apply 13009 returned Pseq: ['ga\\mx~', 'Satf~'] – ['g a\\ m', 'a t']
0: Pada Sequence: ['ga\\mx~', 'Satf~']: Applying Vidhi Sutra 34067 kartarikft |
0: label_samjna(kft, Pseq: ['ga\\mx~', 'Satf~'] – ['g a\\ m', 'a t']): Checking against sutra 31093 kfdatiN | ..
0: found kft: padas=['Satf~']
0: Applying sutra 31068 ..
0: Pada Sequence: ['ga\\mx~', 'Satf~']: Applying Vidhi Sutra 31068 kartari Sap |
0: label_samjna(sArvaDAtuka, Pseq: ['ga\\mx~', 'Satf~'] – ['g a\\ m', 'a t']): Checking against sutra 34113 tiNSitsArvaDAtukam | ..
0: found it: varnas=['S', 'p']
0: _transform_one(lopa): {'part': 'varnas', 'replace': 1, 'purva': {'fnames': ['it'], 'pos': {'part': 'varnas', 'inds': [0, 2]}, 'result': {'varnas': [0, 2]}}, 'inds': [0, 2]}
0: _lopa: lopa_positions = [0, 2]
0: Apply 13009 returned Pseq: ['Sap'] – ['a']
0: Apply 31068 returned Pseq: ['ga\\mx~', 'Sap', 'Satf~'] – ['g a\\ m', 'a', 'a t']
0: Pada Sequence: ['ga\\mx~', 'Sap', 'Satf~']: Applying Vidhi Sutra 73077 izugamiyamAMCaH |
0: Apply 73077 returned Pseq: ['ga\\mx~', 'Sap', 'Satf~'] – ['g a\\ C', 'a', 'a t']
0: Applying sutra 61073 ..
0: Pada Sequence: ['ga\\mx~', 'Sap', 'Satf~']: Applying Vidhi Sutra 61073 Ce ca |
0: found it: varnas=['u~', 'k']
0: Pada Sequence: ['tu~k']: Applying Vidhi Sutra 13009 tasyalopaH |
0: found it: varnas=['u~', 'k']
0: _transform_one(lopa): {'part': 'varnas', 'replace': 1, 'purva': {'fnames': ['it'], 'pos': {'part': 'varnas', 'inds': [1, 2]}, 'result': {'varnas': [1, 2]}}, 'inds': [1, 2]}
0: _lopa: lopa_positions = [1, 2]
0: Apply 13009 returned Pseq: ['tu~k'] – ['t']
0: Apply 61073 returned Pseq: ['ga\\mx~', 'Sap', 'Satf~'] – ['g a\\ t C', 'a', 'a t']
0: Applying sutra 61097 ..
0: Pada Sequence: ['ga\\mx~', 'Sap', 'Satf~']: Applying Vidhi Sutra 61097 atoguRe |
0: label_samjna(guRa, Pseq: ['ga\\mx~', 'Sap', 'Satf~'] – ['g a\\ t C', 'a', 'a t']): Checking against sutra 11002 adeNguRaH | ..
0: found at: varnas=['a\\', 'a', 'a']
0: Apply 61097 returned Pseq: ['ga\\mx~', 'Sap', 'Satf~'] – ['g a\\ t C', '', 'a t']
0: Applying sutra 84040 ..
0: Pada Sequence: ['ga\\mx~', 'Sap', 'Satf~']: Applying Vidhi Sutra 84040 stoHScunAScuH |
0: label_samjna: looking for Pseq: ['ga\\mx~', 'Sap', 'Satf~'] – ['g a\\ t C', '', 'a t']
0: found tu: varnas=['t', 't']
0: label_samjna: looking for Pseq: ['ga\\mx~', 'Sap', 'Satf~'] – ['g a\\ t C', '', 'a t']
0: found cu: varnas=['C']
0: Apply 84040 returned Pseq: ['ga\\mx~', 'Sap', 'Satf~'] – ['g a\\ c C', '', 'a t']
0: Finally, ['ga\\mx~', 'Satf~'] = ga\cCat

The final output matches the form in the table above, which we generated manually and had reviewed by an expert in the field.
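
One way to make this comparison mechanical is to transliterate the interpreter’s SLP1 output and compare it with the expected form. The sketch below uses the standard SLP1-to-IAST correspondences and simply drops accent and nasalization marks; the function name and the decision to discard those marks are assumptions for illustration, not part of our tool.

```python
# Standard SLP1 -> IAST correspondences.
SLP1_TO_IAST = {
    "a": "a", "A": "ā", "i": "i", "I": "ī", "u": "u", "U": "ū",
    "f": "ṛ", "F": "ṝ", "x": "ḷ", "X": "ḹ",
    "e": "e", "E": "ai", "o": "o", "O": "au", "M": "ṃ", "H": "ḥ",
    "k": "k", "K": "kh", "g": "g", "G": "gh", "N": "ṅ",
    "c": "c", "C": "ch", "j": "j", "J": "jh", "Y": "ñ",
    "w": "ṭ", "W": "ṭh", "q": "ḍ", "Q": "ḍh", "R": "ṇ",
    "t": "t", "T": "th", "d": "d", "D": "dh", "n": "n",
    "p": "p", "P": "ph", "b": "b", "B": "bh", "m": "m",
    "y": "y", "r": "r", "l": "l", "v": "v",
    "S": "ś", "z": "ṣ", "s": "s", "h": "h",
}
MARKS = set("\\^~/")  # accent and nasalization marks, dropped for comparison


def slp1_to_iast(text):
    """Transliterate SLP1 to IAST, ignoring accent and nasalization marks."""
    return "".join(SLP1_TO_IAST.get(ch, ch) for ch in text if ch not in MARKS)


generated = "ga\\cCat"   # final line of the interpreter output
expected = "gacchat"     # form from the manual derivation
print(slp1_to_iast(generated) == expected)
```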

  5. Conclusion

We have developed a programmatic interface to the Aṣṭādhyāyī that directly interprets its sūtras for Saṃskṛt word generation and transformation in all its variations. We demonstrated the workflow for generating kṛdanta rūpas and verified it for a few combinations of kṛt pratyayas and dhātus to validate the hypothesis.

References

Pawan Goyal, Amba Kulkarni, and Laxmidhar Behera. 2008. Computer simulation of Ashtadhyayi: Some insights. In 2nd International Symposium on Sanskrit Computational Linguistics.

Pawan Goyal, Gérard Huet, Amba Kulkarni, Peter Scharf, and Ralph Bunker. 2012. A distributed platform for Sanskrit processing. In 24th International Conference on Computational Linguistics (COLING), Mumbai.

Amrit Krishna and Pawan Goyal. 2016. Towards automating the generation of derivative nouns in Sanskrit by simulating Panini. In Sanskrit and Computational Linguistics – 16th World Sanskrit Conference, Bangkok, Thailand, 2015.

Anand Mishra. 2008. Simulating the Paninian system of Sanskrit grammar. In 1st and 2nd International Symposium on Sanskrit Computational Linguistics.

Dhaval Patel and Shivakumari Katuri. 2016. Prakriyāpradarśinī – an open source subanta generator. In Sanskrit and Computational Linguistics – 16th World Sanskrit Conference, Bangkok, Thailand, 2015.

Wiebke Petersen and Oliver Hellwig. 2016. Annotating and analyzing the Ashtadhyayi. In Input a Word, Analyse the World: Selected Approaches to Corpus Linguistics, Newcastle upon Tyne: Cambridge Scholars Publishing.

Wiebke Petersen and Simone Soubusta. 2013. Structure and implementation of a digital edition of the Ashtadhyayi. In Recent Researches in Sanskrit Computational Linguistics – Fifth International Symposium IIT Mumbai, India, January 2013 Proceedings.

Pavankumar Satuluri and Amba Kulkarni. 2013. Generation of Sanskrit compounds. In International Conference on Natural Language Processing, 19th-20th Dec.

Peter Scharf and Malcolm Hyman. 2009. Linguistic Issues in Encoding Sanskrit. Motilal Banarsidass, Delhi.

Sridhar Subbanna and Srinivasa Varakhedi. 2010. Asiddhatva principle in computational model of Ashtadhyayi. In 4th International Sanskrit and Computational Linguistics Symposium.

Sarada Susarla and Sai Susarla. 2012. Panini Ashtadhyayi sutras with commentaries: Sortable index. https://sanskritdocuments.org/learning_tools/ashtadhyayi/.

 [/vc_column_text][/vc_column][/vc_row]