<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "JATS-journalpublishing1-3.dtd">
<article article-type="research-article" dtd-version="1.3" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:lang="ru"><front><journal-meta><journal-id journal-id-type="publisher-id">steps</journal-id><journal-title-group><journal-title xml:lang="ru">Шаги/Steps</journal-title><trans-title-group xml:lang="en"><trans-title>Shagi / Steps</trans-title></trans-title-group></journal-title-group><issn pub-type="ppub">2412-9410</issn><issn pub-type="epub">2782-1765</issn><publisher><publisher-name>The Russian Presidential Academy of National Economy and Public Administration</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.22394/2412-9410-2021-7-1-183-198</article-id><article-id custom-type="elpub" pub-id-type="custom">steps-571</article-id><article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="ru"><subject>Статьи</subject></subj-group></article-categories><title-group><article-title>Identifying Latin authors through maximum-likelihood Dirichlet inference: A contribution to model-based stylometry</article-title><trans-title-group xml:lang="en"><trans-title>Identifying Latin authors through maximum-likelihood Dirichlet inference: A contribution to model-based stylometry</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Николаев</surname><given-names>Д. 
С.</given-names></name><name name-style="western" xml:lang="en"><surname>Nikolaev</surname><given-names>Dmitry S.</given-names></name></name-alternatives><email xlink:type="simple">dnikolaev@fastmail.com</email><xref ref-type="aff" rid="aff-1"/></contrib><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Шумилин</surname><given-names>М. В.</given-names></name><name name-style="western" xml:lang="en"><surname>Shumilin</surname><given-names>Mikhail V.</given-names></name></name-alternatives><email xlink:type="simple">mvlshumilin@gmail.com</email><xref ref-type="aff" rid="aff-2"/></contrib></contrib-group><aff-alternatives id="aff-1"><aff xml:lang="ru">Стокгольмский университет</aff><aff xml:lang="en">Stockholm University</aff></aff-alternatives><aff-alternatives id="aff-2"><aff xml:lang="ru">Институт мировой литературы им. А. М. Горького РАН</aff><aff xml:lang="en">A. M. Gorky Institute of World Literature of the Russian Academy of Sciences</aff></aff-alternatives><volume>7</volume><issue>1</issue><fpage>183</fpage><lpage>198</lpage><permissions><copyright-statement>Copyright &#x00A9; Николаев Д.С., Шумилин М.В., 2021</copyright-statement><copyright-year>2021</copyright-year><copyright-holder xml:lang="ru">Николаев Д.С., Шумилин М.В.</copyright-holder><copyright-holder xml:lang="en">Nikolaev D., Shumilin M.</copyright-holder><license license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>This work is licensed under a Creative Commons Attribution 4.0 License.</license-p></license></permissions><self-uri xlink:href="https://steps.ranepa.ru/jour/article/view/571">https://steps.ranepa.ru/jour/article/view/571</self-uri><abstract><p>В статье предлагается новый алгоритм для определения авторов латинских прозаических текстов, основанный на Дельте Берроуза и распределении Дирихле. 
Для демонстрации эффективности алгоритма проводится анализ фрагментов текстов 36 авторов классического и средневекового периода. Наш алгоритм показывает результаты, сопоставимые с результатами, полученными за счет применения Random Forest, одного из самых мощных универсальных классификационных алгоритмов. Преимущество нашего алгоритма заключается в том, что он требует очень мало времени и вычислительных ресурсов для обучения, его легко имплементировать на любом языке программирования общего назначения и его тривиально параллелизовать. Кроме того, поскольку алгоритм основан на эксплицитной модели порождения текста, параметры натренированной модели поддаются интерпретации: точность распределения (сумма его параметров) прямо соответствует стилистической гомогенности текстов соответствующего автора.</p></abstract><trans-abstract xml:lang="en"><p>The last two decades have seen a dramatic increase in the number of papers published on stylometry, which is often narrowly understood as the task of identifying the author of a particular text fragment from its stylistic properties. We present a new lightweight algorithm for stylometric identification of authors of Latin prose texts based on Burrows’s Delta, computed over relative frequencies of 244 manually selected genre- and topic-neutral words, and the Dirichlet distribution, whose parameters we estimate using an iterative maximum-likelihood algorithm. To demonstrate the effectiveness of the method, we present a case study of 3000-word fragments of texts by 36 classical and medieval authors and show that our method performs on par with Random Forest, a powerful general-purpose classification algorithm. We provide summary statistics of our algorithm’s performance together with confusion matrices demonstrating pairwise discriminability of texts by different authors. 
The advantages of our method are that it is simple to implement, quick to train and to run inference with, and readily interpretable because it is model-based: the precision of each fitted Dirichlet distribution (the sum of its parameters) directly corresponds to the stylistic homogeneity of the texts by the respective author. This makes it possible to use the algorithm as a general research tool in Latin stylistics.</p></trans-abstract><kwd-group xml:lang="ru"><kwd>стилометрия</kwd><kwd>латинская литература</kwd><kwd>распределение Дирихле</kwd><kwd>Дельта Берроуза</kwd><kwd>random forest</kwd><kwd>атрибуция текстов</kwd><kwd>стилистический анализ</kwd><kwd>машинное обучение</kwd></kwd-group><kwd-group xml:lang="en"><kwd>stylometry</kwd><kwd>Latin literature</kwd><kwd>Dirichlet distribution</kwd><kwd>Burrows's Delta</kwd><kwd>text attribution</kwd><kwd>stylistic analysis</kwd><kwd>machine learning</kwd></kwd-group></article-meta></front><back><ref-list><title>References</title></ref-list><fn-group><fn fn-type="conflict"><p>The authors declare no conflicts of interest.</p></fn></fn-group></back></article>
