Reference:
Debenova Z.A., TSipilova S.S., Tsyrenova N.D.
Monuments in Mongolian Writing: An Experience of Creating a Parallel Corpus
// Historical informatics.
2025. № 2.
P. 1-10.
DOI: 10.7256/2585-7797.2025.2.73930 EDN: MMDRBC URL: https://en.nbpublish.com/library_read_article.php?id=73930
Abstract:
This article presents the results of work on creating a parallel corpus of Buryat sources in Mongolian script. The project is supported by the Russian Science Foundation and draws on archival materials from the Center of Oriental Manuscripts and Xylographs of the IMBT SB RAS. The subject of the research is the process of creating a database for the corpus and the specifics of compiling it, particularly the selection of materials. The corpus under development currently includes the following documents from the Center's archival funds: two historical texts, "A Brief Outline of the History of Khori-Mongolian Buryats" and "On the History of the Zugalai Region"; an official document, "Protocol of the All-Buryat Assembly in Chita in 1917"; an ethnographic composition, "Narrative of Samdan Noyon"; a medical work, "Notes of Tibetan Doctor Donduba Munkuyev"; and a work of Buddhist didactic literature, "Subhashita", translated by Galsan-Jimba Tuguldur. General scientific and source-study methods were applied to the analysis of handwritten, printed, and xylographic texts in Mongolian script. The article examines the selection of materials, their transliteration and translation, and both substantive aspects (thematic, lexical) and technical ones (typos, pagination, numerals). The parallel Russian-language version is being created by the research group. The authors emphasize the significance of the parallel corpus as a resource for further research in Buryat linguistics, translation studies, and cultural studies, as well as its role in promoting Old Mongolian script among the general public and preserving the intangible heritage of the Baikal region. The corpus is a unique database for further research across a range of disciplines.
The texts considered will serve as a basis for the development of machine translation algorithms, and the work being conducted at this stage will help future developers create more effective algorithms. The creation of a specialized database that is open not only to researchers but also to representatives of the educational sector, professional translators, and anyone showing a scientific or cultural interest in written heritage appears promising.
Keywords:
machine translation, intangible heritage, Baikal region, Center of Oriental Manuscripts and Xylographs, Buryatia, written sources, parallel corpus, Mongolian script, digitization, text corpus
Reference:
Latonov V.V., Latonova A.V.
Determining the authorship of the "Notes of the Decembrist I.I. Gorbachevsky" by machine learning methods
// Historical informatics.
2025. № 1.
P. 122-133.
DOI: 10.7256/2585-7797.2025.1.72805 EDN: QALGAU URL: https://en.nbpublish.com/library_read_article.php?id=72805
Abstract:
The object of this research is the "Notes of the Decembrist I.I. Gorbachevsky", one of the most valuable sources on the history of the Decembrist movement written by its own participants. The "Notes" describe the formation and development of the Society of United Slavs, a Decembrist organization that later joined the Southern Society. Written in Siberian exile, they offer not only factual material but also an original conception of the secret society's development and a retrospective "inside look" at the conspirators' mistakes. The "Notes" are remarkable in another respect as well. Despite the title established in the literature, it cannot be asserted unequivocally that the Decembrist I.I. Gorbachevsky was their author: the first publication, in the journal "Russian Archive" in 1882, appeared under the heading "Notes of an Unknown Person from the Society of the United Slavs." The subject of the research is therefore the authorship of the "Notes", a question on which historians have not reached a clear answer. The paper proposes to resolve it with machine learning methods, considering I.I. Gorbachevsky and the Decembrist P.I. Borisov as the possible authors. The novelty of the research lies in applying machine learning to the attribution of the "Notes". The authors trained four types of models to predict the authorship of each sentence in the "Notes". As a result, most sentences of the "Notes" were attributed to Gorbachevsky; the largest share, 69.2%, was attributed to him by the Count Vectorizer + SVC model.
The accuracy of all models exceeded 80% on average, while the models based on BERT encodings averaged close to 90%. The main conclusion is therefore that the "Notes" were more likely written by I.I. Gorbachevsky than by P.I. Borisov, and the methods used in the study provide another argument in favor of this version. The code and dataset are available at https://github.com/WLatonov/Gorbachevskiy_notes .
Keywords:
Gorbachevskiy's notes, The Decembrists, BERT, Binary classification, Neural networks, Machine learning, Stylometry, Attribution, authorship definition, Gorbachevskiy's letters
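The sentence-level procedure described in the abstract above (classify each sentence of the disputed text, then report the share attributed to each candidate) can be sketched as follows. This is only an illustration under stated assumptions: it does not reproduce the authors' models. To stay dependency-free it swaps in a small multinomial naive Bayes classifier over word counts in place of the Count Vectorizer + SVC and BERT-based models named in the abstract, and the training sentences, the labels `author_a` and `author_b`, and all function names are invented for the example.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"\w+", sentence.lower())


class NaiveBayesAttributor:
    """Tiny multinomial naive Bayes over word counts (a stand-in, not the paper's SVC)."""

    def fit(self, sentences, authors):
        self.word_counts = {}                    # author -> Counter of word frequencies
        self.sentence_counts = Counter(authors)  # author -> number of training sentences
        for sentence, author in zip(sentences, authors):
            self.word_counts.setdefault(author, Counter()).update(tokenize(sentence))
        self.vocabulary = set()
        for counts in self.word_counts.values():
            self.vocabulary.update(counts)
        return self

    def predict_one(self, sentence):
        total_sentences = sum(self.sentence_counts.values())
        best_author, best_score = None, -math.inf
        for author, counts in self.word_counts.items():
            # Log prior plus log likelihood of each token, with add-one smoothing.
            score = math.log(self.sentence_counts[author] / total_sentences)
            denominator = sum(counts.values()) + len(self.vocabulary)
            for token in tokenize(sentence):
                score += math.log((counts[token] + 1) / denominator)
            if score > best_score:
                best_author, best_score = author, score
        return best_author


def attribution_shares(model, disputed_sentences):
    """Return the fraction of disputed sentences assigned to each candidate author."""
    votes = Counter(model.predict_one(s) for s in disputed_sentences)
    return {author: votes[author] / len(disputed_sentences) for author in model.word_counts}


# Invented toy data: sentences of known authorship for two hypothetical candidates.
train_sentences = [
    "the society resolved to act at once",
    "the society met in secret that winter",
    "my dear friend the garden is lovely",
    "my dear brother the river froze early",
]
train_authors = ["author_a", "author_a", "author_b", "author_b"]

# Invented sentences standing in for the text of disputed authorship.
disputed = [
    "the society acted in secret",
    "the society resolved nothing",
    "my dear friend wrote early",
]

model = NaiveBayesAttributor().fit(train_sentences, train_authors)
shares = attribution_shares(model, disputed)
for author, share in shares.items():
    print(f"{author}: {share:.1%} of disputed sentences")
```

The per-author share plays the same role as the 69.2% figure reported in the abstract: it summarizes many sentence-level decisions, so it should be read together with the classifier's accuracy on texts of known authorship.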
Reference:
Borodkin L.
Historian in the world of neural networks: the second wave of artificial intelligence technology application.
// Historical informatics.
2025. № 1.
P. 83-94.
DOI: 10.7256/2585-7797.2025.1.74100 EDN: QXYMHF URL: https://en.nbpublish.com/library_read_article.php?id=74100
Abstract:
Over the last decade, artificial intelligence (AI) technologies have become one of the most sought-after areas of scientific and technological development. This process has also impacted historical science, where the first research in this area began in the 1980s (the so-called first wave) – both in our country and abroad. Then came the "AI winter," and at the beginning of the 2010s, the "second wave" of AI emerged. The subject of this article is the new opportunities for applying AI in history and the new problems arising in this process today, when the main focus of AI has shifted to artificial neural networks, machine learning (including deep learning), generative neural networks, large language models, etc. Based on the experience of historians applying AI, the article proposes the following seven directions for such research: recognition of handwritten and old printed texts, their transcription; attribution and dating of texts using AI; typological classification and clustering of data from statistical sources (particularly using fuzzy logic); source criticism tasks, data completion and enrichment, and reconstruction using AI; intelligent search for relevant information, utilizing generative neural networks for this purpose; using generative networks for text processing and analysis; and the use of AI in archives, museums, and other institutions that store cultural heritage. An analysis of the discussion of similar issues organized by the leading American historical journal AHR has been conducted. These are conceptual questions regarding the interaction between humans and machines ("historian in the world of artificial neural networks"), the possibilities for historians to use machine learning technologies (particularly deep learning), various AI tools in historical research, as well as the evolution of AI in the 21st century. Practical aspects were also touched upon, such as the experience of recognizing newspaper texts from past centuries using AI. 
In conclusion, the article addresses the problems related to the use of generative neural networks by historians.
Keywords:
algorithms, text attribution, image recognition, generative neural networks, deep learning, machine learning, artificial neural networks, Artificial Intelligence, data, historical source
Reference:
Mashchenko N.E., Gaidar E.V.
Artificial intelligence technologies in the formation of the archival environment: problems and prospects
// Historical informatics.
2025. № 1.
P. 162-173.
DOI: 10.7256/2585-7797.2025.1.73393 EDN: QEIGBR URL: https://en.nbpublish.com/library_read_article.php?id=73393
Abstract:
The authors examine the prospects of using artificial intelligence (AI) technologies to create and develop a digital archival environment, as well as the impact of these technologies on optimizing and automating the management of archived data. The main purpose of the work is to analyze modern digital solutions aimed at improving the storage, search, and processing of archival documents (including handwritten, damaged, and multilingual ones). The paper explores key technologies used in digital archives, including intelligent scanning, natural language processing (NLP), computer vision, machine learning, and intelligent search methods. Special attention is paid to the loss of archival materials, the need to restore them, and the need to ensure data security and accessibility, which is especially important amid an unstable political situation and limited resources in new territories. The research is based on a systematic analysis of modern information technologies and their application in archival practice. The work uses methods of comparative analysis, classification, and forecasting to identify key areas of AI implementation in the archival field. The novelty of the work lies in an integrated approach to analyzing the use of AI in the archival field, identifying problematic aspects of archive digitalization, and proposing ways to automate the storage, processing, and search of archival data. The authors conclude that artificial intelligence technologies can significantly improve the efficiency of archives by providing accelerated document processing, intelligent classification, data protection, and convenient access to information. They also emphasize the need to develop new machine learning algorithms that will improve the recognition of handwritten texts and the processing of damaged documents and multilingual archival materials.
The introduction of such technologies is becoming an important part of the digital transformation strategy for archives and plays a key role in preserving historical heritage.
Keywords:
machine learning, computer vision, natural language processing, data security, intelligent scanning, predictive intelligence, digital transformation, artificial intelligence, digital archival environment, archives
Reference:
Voronkova D.S.
Computerized content analysis of articles from the journal "Bulletin of Finance, Industry, and Trade" for the year 1917: testing the capabilities of the artificial intelligence module in the MAXQDA program.
// Historical informatics.
2025. № 1.
P. 134-161.
DOI: 10.7256/2585-7797.2025.1.73332 EDN: QEHIBU URL: https://en.nbpublish.com/library_read_article.php?id=73332
Abstract:
The subject of the research is the articles published in 1917 in the "Bulletin of Finance, Industry and Trade", the official printed organ of the Russian Ministry of Finance. That year was undoubtedly a turning point in Russian history. It is therefore important to apply new approaches to uncover the informational potential of this largely unique source, which contains valuable information about the country's economy: not only the areas named in the journal's title but also, for example, tax and customs policy and the preparation of a number of reforms, including agrarian ones. It should also be borne in mind that during this period the journal was published against the backdrop of the ongoing First World War, and related issues were reflected in its pages. Methodologically, the article is based on computerized content analysis, with the main focus on the artificial intelligence tools of the specialized software MAXQDA. The novelty of the research lies in testing, for the first time, the capabilities of the AI Assist module and its latest component, MAXQDA Tailwind, which was in beta at the time of the article's publication. The author received early access to all product features by invitation from the developers and provided feedback based on the results of the work. The functionality of MAXQDA Tailwind will be presented at the international virtual conference of MAXQDA users (MAXDAYS 2025) on March 18-19, 2025, so readers can familiarize themselves with it before its official release. The article demonstrates that artificial intelligence in no way replaces the historian but can help them deepen the analysis of historical sources and make it more comprehensive.
Keywords:
February Revolution, First World War, official press organ, AI Assist, MAXQDA Tailwind, artificial intelligence, MAXQDA, content analysis, Media, Bulletin of Finance
Reference:
Mekhovskii V.A., Kizhner I.A.
The world through the eyes of an educated person in Minusinsk of the late XIX - early XX centuries: distribution of the frequency of geographical names in the books of the Minusinsk Public Library
// Historical informatics.
2025. № 1.
P. 174-189.
DOI: 10.7256/2585-7797.2025.1.72586 EDN: QCQWHG URL: https://en.nbpublish.com/library_read_article.php?id=72586
Abstract:
The subject of the study is a corpus of children's literature from the collection of the Minusinsk Public Library of the late XIX and early XX centuries, consisting of 121 works written between 1719 and 1905. These texts are a significant source for studying how fiction shaped the geographical perception of residents of a provincial Siberian city. Special attention is paid to the geographical names (toponyms) found in the texts, with the aim of identifying their frequency and geographical distribution. This makes it possible to reconstruct the picture of the world presented in the books of that time and to understand how it was perceived by young readers, shaping their ideas of countries, cities, and cultural centers. The research examines the role of children's literature as a cultural tool that both reflects and shapes geographical ideas, and identifies the methodological challenges and limitations of working with historical corpora. The method consists of converting texts in pre-reform orthography into machine-readable form with digitization tools and applying geoparsing to identify geographical entities automatically. The spaCy library was used for the analysis, followed by manual verification and correction of the data. The results include the identification of 668 cities and 97 countries mentioned in the texts and a cartographic visualization of the frequency distribution of mentions. The analysis revealed an uneven distribution of geographical names across the texts: Russia, Poland, and England prevail among countries, and Kiev, Moscow, and St. Petersburg among cities. The results are applicable to research in digital humanities, library science, and historical and cultural studies.
The novelty of the work lies in the use of modern geoparsing methods for processing Russian-language texts of pre-reform spelling and in the analysis of the previously unexplored literature corpus of the Minusinsk Library. The conclusions emphasize the importance of text mapping for understanding the formation of geographical perception and the need for further development of NER tools for complex corpora. Despite the limitations, the research contributes to the development of NLP methods for historical texts.
Keywords:
Pre-reform orthography, Minusinsk Public Library, Children's literature, World map, Minusinsk, Siberia, Mapping, Named-entity recognition, Historical Computer Science, Geoparsing
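The pipeline outlined in the Mekhovskii and Kizhner abstract above (normalize pre-reform spelling, identify toponyms, count their mentions) can be illustrated with a small sketch. It is not the authors' implementation: they used spaCy followed by manual verification, whereas this dependency-free example substitutes a hand-made gazetteer of word stems. The stems, the sample sentence, and the function names are all assumptions made for the illustration.

```python
import re
from collections import Counter

# Pre-reform letters mapped to their modern equivalents.
PRE_REFORM_MAP = str.maketrans({
    "ѣ": "е", "Ѣ": "Е",
    "і": "и", "І": "И",
    "ѳ": "ф", "Ѳ": "Ф",
    "ѵ": "и", "Ѵ": "И",
})

# Hypothetical mini-gazetteer: lowercase stem -> canonical English name.
# Stems are used because Russian toponyms are inflected (Киев, Киева, Киеве).
GAZETTEER = {
    "киев": "Kiev",
    "москв": "Moscow",
    "петербург": "St. Petersburg",
    "росси": "Russia",
    "польш": "Poland",
}


def modernize(text):
    """Map pre-reform letters to modern ones and drop word-final hard signs."""
    text = text.translate(PRE_REFORM_MAP)
    return re.sub(r"[ъЪ]\b", "", text)


def count_toponyms(text):
    """Count gazetteer matches per canonical place name in a pre-reform text."""
    counts = Counter()
    for token in re.findall(r"\w+", modernize(text).lower()):
        for stem, name in GAZETTEER.items():
            if token.startswith(stem):
                counts[name] += 1
                break  # at most one place name per token
    return counts


# Invented sentence in pre-reform spelling, used only to exercise the functions.
SAMPLE = "Изъ Москвы въ Кіевъ ѣхали долго. Въ Кіевѣ было тепло, а въ Россіи зима."

for place, mentions in count_toponyms(SAMPLE).most_common():
    print(place, mentions)
```

Stem matching of this kind also catches derived words (for example, an adjective formed from a place name), which is one reason automatic output of any geoparser benefits from the manual verification step the abstract mentions.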
Reference:
Yumasheva J.Y.
The possibility of using artificial intelligence in historical research
// Historical informatics.
2025. № 1.
P. 95-121.
DOI: 10.7256/2585-7797.2025.1.73578 EDN: PQTZJT URL: https://en.nbpublish.com/library_read_article.php?id=73578
Abstract:
The article addresses the contested question of using artificial intelligence in historical research. The introduction briefly reviews the emergence of "artificial intelligence" (AI) as a field of computer science, the evolution of the term, and views on the application of AI, and considers the place of AI methods at different stages of concrete historical research. In the main part, drawing on historiographical sources and on the author's own experience of participating in foreign projects, the article analyzes the practice of handwritten text recognition projects that use various information technologies and AI methods. In particular, it describes and justifies the requirements for creating electronic copies of the sources to be recognized; the need to take into account the texture of the information carriers, the writing materials, and the techniques and technologies used to create the text; the varieties of paleographic, codicological, and diplomatic datasets and of historical and lexicological dictionaries, and the methods of creating them; and the possibility of using large language models. The methodological basis comprises a systematic approach, historical-comparative, historical-chronological, and descriptive methods, and the analysis of historiographical sources. The article concludes that artificial intelligence technologies are promising not only as an auxiliary tool but also as research methods that help establish the authorship of historical sources, clarify their dating, and detect forgeries, among other tasks, as well as in creating new types of scientific reference and search systems for archives and libraries. At the same time, these technologies are expensive and capital-intensive, which is a serious obstacle to their widespread introduction into the practice of historical research.
Keywords:
large language models, datasets, historical lexicology, diplomatics, codicology, paleography, automated text recognition, historical sources, artificial intelligence, information technologies
Reference:
Orekhov B.V.
Text and knowledge in the aspect of large language models
// Historical informatics.
2023. № 4.
P. 104-113.
DOI: 10.7256/2585-7797.2023.4.44180 EDN: BJQBQB URL: https://en.nbpublish.com/library_read_article.php?id=44180
Abstract:
This text focuses on the influence of large language models on the self-determination of the humanities. Large language models are able to generate plausible texts, and so they seem to join the other tools that, throughout the development of technology, have freed people from routine. For the humanities, however, texts are highly individualized, and knowledge itself is closely tied to its textual embodiment. If we agree that knowledge is a text, and that knowledge embodied in another text is different knowledge, then the humanities will have to answer the question of how a text produced by a person differs in value from the same text generated by a machine. The work raises methodological and epistemological problems of how texts of natural and artificial origin relate to each other when they are written in the genre of a scientific work. The difference between such artifacts is clearly visible only in some scientific disciplines and remains an open question in the rest. These issues call for deep reflection, which was not so urgently needed in earlier centuries of the humanities but is now required of the humanities scholar. Humanities scholars will have to position themselves explicitly against large language models and demonstrate the importance of their work compared with what a neural network can generate.
Keywords:
text, science, knowledge, text generators, methodology of science, scientific publications, ChatGPT, large language models, formal languages, Humanities