NLP
In order to contribute to the formation of Digital Assyriology as an emerging research field in Belgium and internationally, CUNE-IIIF-ORM will experiment with the semi-automatic analysis of Old Babylonian texts using Machine Learning, in particular Natural Language Processing (NLP). This work will be based on the KMKG/MRAH Old Babylonian Text Corpus, which will be transcribed and enriched with further Old Babylonian texts from collections worldwide. The work will focus on two case studies: a) Old Babylonian administrative documents (e.g. sales contracts, loans, etc.) and b) Old Babylonian letters.
For the first case study, a linguistic preprocessing pipeline will be developed to automatically annotate Akkadian text with part-of-speech (POS) and morphological information. Named Entity Recognition (NER) techniques will then be used to automatically identify persons and locations in the Old Babylonian text corpus, extract them, and classify them into a predefined set of categories.
For the second case study, the 13 letters in the KMKG/MRAH corpus will be transcribed, and the corpus will be extended to 3,000 letters by drawing on the well-documented corpus of Old Babylonian letters published in the 14-volume series Altbabylonische Briefe in Umschrift und Übersetzung (Veenhof (ed.) 1964-2005). The UGent LT3 team will then apply (1) automatic term extraction to identify the key terms in the corpus and (2) distributional semantic analysis to cluster semantically related terms, thereby revealing the main topics discussed in the letters. This sub-corpus will be further enriched and contextualised with cuneiform tablets from other Old Babylonian collections worldwide (e.g. Cuneiform Digital Library Initiative (CDLI), Ashmolean Museum, British Museum, Louvre, Penn Museum Babylonian Section, Pergamonmuseum, Yale University Babylonian Collection, etc.) on the basis of prosopographical research and a literature search.
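To illustrate what automatic term extraction on the letters could involve, the sketch below scores candidate terms by how much more frequent they are in the letters than in a general reference corpus. It uses a smoothed log ratio of relative frequencies, in the spirit of the "weirdness" measure from the term-extraction literature. This is not presented as the LT3 team's method: the function name termhood_scores, the min_freq cut-off, the assumption that texts arrive already tokenised and lemmatised (e.g. by the preprocessing pipeline), and the toy Akkadian lemma lists are all hypothetical.

```python
from collections import Counter
from math import log


def termhood_scores(domain_docs, reference_docs, min_freq=2):
    """Rank candidate terms by a smoothed log ratio of relative frequencies.

    A term scores high when it is much more frequent in the domain corpus
    (here: letters) than in a general reference corpus. Add-one smoothing
    over the combined vocabulary avoids division by zero for unseen terms.
    """
    dom = Counter(tok for doc in domain_docs for tok in doc)
    ref = Counter(tok for doc in reference_docs for tok in doc)
    vocab_size = len(set(dom) | set(ref))
    n_dom, n_ref = sum(dom.values()), sum(ref.values())
    scores = {}
    for term, freq in dom.items():
        if freq < min_freq:
            continue  # skip rare candidates
        p_dom = (freq + 1) / (n_dom + vocab_size)
        p_ref = (ref[term] + 1) / (n_ref + vocab_size)  # Counter returns 0 if unseen
        scores[term] = log(p_dom / p_ref)
    # Highest-scoring (most domain-specific) terms first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy input (not drawn from the corpus): lemmatised letters
# versus a small reference corpus.
letters = [
    ["awīlum", "ṭuppum", "šapārum", "ṭuppum", "eqlum"],
    ["ṭuppum", "kaspum", "šapārum", "eqlum"],
    ["kaspum", "ṭuppum", "awīlum"],
]
reference = [
    ["awīlum", "bītum", "alākum", "awīlum"],
    ["awīlum", "kaspum", "bītum", "qabûm"],
]

for term, score in termhood_scores(letters, reference):
    print(f"{term}\t{score:.3f}")
```

On this toy input, a word that is frequent everywhere (awīlum) ends up at the bottom of the ranking, while words concentrated in the letters rise to the top; a real study would also need to handle multi-word terms and spelling variation.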
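Distributional semantic analysis rests on the idea that terms used in similar contexts are semantically related. A minimal count-based sketch, assuming tokenised and lemmatised letters, represents each term by the words occurring within a small window around it, compares terms by cosine similarity, and groups terms whose similarity reaches a threshold (single-link clustering implemented with union-find). The function names, the window size, the 0.3 threshold and the toy lemmas (eqlum "field", ikkarum "farmer", erēšum "to cultivate", kaspum "silver", šīmum "price", šaqālum "to pay") are hypothetical; a study of the full letters corpus might instead use trained word embeddings and a more robust clustering algorithm.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def cooccurrence_vectors(docs, terms, window=2):
    """For each target term, count the tokens within `window` positions of it."""
    targets = set(terms)
    vectors = {t: defaultdict(int) for t in terms}
    for doc in docs:
        for i, tok in enumerate(doc):
            if tok not in targets:
                continue
            lo, hi = max(0, i - window), min(len(doc), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[tok][doc[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors stored as dicts."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster_terms(vectors, threshold=0.3):
    """Single-link clustering: link every pair with cosine >= threshold."""
    parent = {t: t for t in vectors}

    def find(t):
        # Follow parent pointers to the cluster root (with path halving).
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in combinations(vectors, 2):
        if cosine(vectors[a], vectors[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for t in vectors:
        groups[find(t)].append(t)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical toy input (not drawn from the corpus).
docs = [
    ["eqlum", "ikkarum", "erēšum"],
    ["ikkarum", "eqlum", "erēšum"],
    ["kaspum", "šīmum", "šaqālum"],
    ["šīmum", "kaspum", "šaqālum"],
]
terms = ["eqlum", "ikkarum", "kaspum", "šīmum"]
print(cluster_terms(cooccurrence_vectors(docs, terms)))
```

On this toy input the field/farmer terms and the silver/price terms fall into two separate groups, the kind of grouping that can then be read as topics of the letters.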