Development of the "Linguist's Calculator" program

Authors

Andrei V. Borovsky, Baikal State University (Irkutsk, Russia)
Fedot E. Mosorkin, Baikal State University (Irkutsk, Russia)

Abstract

This article describes the development of software for research in historical and mathematical linguistics that combines a multimetric approach with the analytic hierarchy process. The program is implemented as a desktop application in Python, with a graphical interface built on the PyQt5 library. Mathematical methods relevant to historical and mathematical linguistics are considered and implemented, including the transformation of words into A.D. Dolgopolsky's consonant classes and several word similarity metrics: the Ratcliff-Obershelp metric (RO), which takes into account the number of identical letters in two words; the longest common substring metric (LCS), based on the number of letters in the longest common substring; and the Levenshtein distance (L), the number of elementary operations needed to transform one word into the other. The novelty of this work lies in applying a multimetric approach to the analysis of a list of correspondences and in constructing rankings based on the analytic hierarchy process. The "Linguist's Calculator" makes it possible to identify hidden lexical relationships between toponyms and lists of corresponding words and to study the origins of toponyms. The program has been tested on toponyms of the Irkutsk region whose meanings have been lost; it identifies the most likely matches among candidate words from various languages, including Evenki, Buryat, and Old Russian. The program supports input of a toponym and candidate words, selection of a word transformation model, and output of the results, sorted in descending order of metric sum, with export to an Excel file. The analytic hierarchy process as applied to word multimetrics was verified on word sets deliberately modified to test the method's robustness to word distortions. The study showed that the algorithm is robust to distortions: at 50% noise, matching quality declines gradually. This robustness makes the algorithm suitable for working with real (including distorted) toponyms. In the future, we plan to add functionality for the quantitative assessment of borrowings in languages, extending the program's use in historical and mathematical linguistics to the analysis of linguistic interactions and the reconstruction of the etymology of toponyms in the Irkutsk region.
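
As an illustration of the three pairwise metrics named in the abstract, the following is a minimal Python sketch and not the program's actual code. It makes several assumptions: the function names are hypothetical; `difflib.SequenceMatcher` from the standard library stands in for the Ratcliff-Obershelp metric; the LCS and Levenshtein values are normalized to the range 0..1 by the length of the longer word so that they can be summed; and the sample strings are arbitrary. Consonant-class transformation and analytic hierarchy process weighting are not shown.

```python
from difflib import SequenceMatcher


def ro_similarity(a: str, b: str) -> float:
    """Ratcliff-Obershelp-style similarity as computed by difflib (0..1)."""
    return SequenceMatcher(None, a, b).ratio()


def lcs_similarity(a: str, b: str) -> float:
    """Length of the longest common substring over the longer word's length.

    The normalization is an assumption made for this sketch.
    """
    if not a or not b:
        return 0.0
    match = SequenceMatcher(None, a, b).find_longest_match(0, len(a), 0, len(b))
    return match.size / max(len(a), len(b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or keep
        prev = cur
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Levenshtein distance turned into a 0..1 similarity (assumed normalization)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def rank_candidates(toponym: str, candidates: list[str]) -> list[tuple[str, float]]:
    """Sort candidate words in descending order of the sum of the three metrics."""
    scored = [
        (word,
         ro_similarity(toponym, word)
         + lcs_similarity(toponym, word)
         + levenshtein_similarity(toponym, word))
        for word in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Arbitrary sample strings, not data from the study.
    for word, total in rank_candidates("angara", ["xyz", "ankara", "angara"]):
        print(f"{word}\t{total:.3f}")
```

An unweighted sum is used here only because the abstract mentions sorting by metric sum; how the program weights the metrics through the analytic hierarchy process is not specified in the abstract.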

Keywords

Historical and mathematical linguistics, software development, consonant classes, Ratcliff-Obershelp pairwise metrics, LCS, Levenshtein distance

Published

2026-03-05

Files

199-206.pdf (707.3 KB)

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
