Our publicly available open-source models
Models | Type | Team |
---|---|---|
OCR model for Armenian: designed for damaged, historical, and noisy printed documents | Text recognition (OCR) | Calfa |
Lemmatization, POS-tagging, morphological analysis and named entity recognition in Classical Armenian | Text analysis | Calfa, Dalih |
Lemmatization, POS-tagging, morphological analysis and named entity recognition in Western Armenian | Text analysis | Calfa, Dalih |
Lemmatization, POS-tagging, morphological analysis and named entity recognition in Eastern Armenian | Text analysis | Calfa, Dalih |
Free tools for researchers developed by Calfa
Online, collaborative and free. Create your project, invite collaborators, annotate your documents and export the data.
Online, free, easy-to-use detector for illuminations, initials, and seals in manuscripts. Simply paste the IIIF manifest URL of a manuscript or upload your own images, and our models will do the rest.
Open access datasets published by Calfa and partners
Dataset | Lang. | Team |
---|---|---|
HTR ground-truth for Armenian cursive archives (Dulaurier collection, BnF) | Armenian HTR | Calfa, BnF Datalab, GREgORI |
OCR ground-truth for noisy and dense printed Greek historical documents | Greek OCR | Calfa, GREgORI |
HTR ground-truth for Chinese xylographic imperial editions | Chinese HTR | Calfa, Collex-Persée |
Recognition and Analysis of Scripts in Arabic Maghrebi | Arabic HTR | Calfa, DISTAM |
Ground-truth of the TariMa project (HTR/OCR of Maghrebi Arabic documents) | Arabic HTR | Calfa, BULAC, Collex-Persée |
Ground-truth produced during the Alexander Hackathon for the automatic transcription of manuscripts of the Alexander Romance in Middle Arabic | Arabic HTR | Calfa, DISTAM, LiPoL |
Middle Arabic, modern scripts | Arabic HTR | LiPoL, Ifpo, Calfa |
Vidal-Gorène, Chahan and Decours-Perez, Aliénor and Kasparian, Anahide and Tanelian, Ani and Ohanian, Agnès, Armenian HTR: State of the art, transcription guidelines and good practices. 2025.
Vidal-Gorène, Chahan and Decours-Perez, Aliénor, Detecting and Deciphering Damaged Medieval Armenian Inscriptions Using YOLO and Vision Transformers. In Document Analysis and Recognition – ICDAR 2024 Workshops, pp. 22-36, Cham, 2024. Springer Nature Switzerland.
Bizais-Lillig, Marie and Vidal-Gorène, Chahan and Dupin, Boris, Optimizing HTR and Reading Order Strategies for Chinese Imperial Editions with Few-Shot Learning. In Document Analysis and Recognition – ICDAR 2024 Workshops, pp. 37-56, Cham, 2024. Springer Nature Switzerland.
Vidal-Gorène, Chahan and Dupin, Boris and Decours-Perez, Aliénor and Riccioli, Thomas, A modular and automated annotation platform for handwritings: evaluation on under-resourced languages. In International Conference on Document Analysis and Recognition, pp. 507-522, 2021.
Vidal-Gorène, Chahan and Lucas, Noëmie and Salah, Clément and Decours-Perez, Aliénor and Dupin, Boris, RASAM – A Dataset for the Recognition and Analysis of Scripts in Arabic Maghrebi. In International Conference on Document Analysis and Recognition, pp. 265-281, 2021.
Vidal-Gorène, Chahan and Salah, Clément and Lucas, Noëmie and Decours-Perez, Aliénor and Perrier, Antoine, Enhancing Arabic Maghribi Handwritten Text Recognition with RASAM 2: A Comprehensive Dataset and Benchmarking. In Computational Humanities Research (CHR), Aarhus, Denmark, 2024.
Vidal-Gorène, Chahan and Lucas, Noëmie and Salah, Clément and Decours-Perez, Aliénor and Dupin, Boris, RASAM – A Dataset for the Recognition and Analysis of Scripts in Arabic Maghrebi. GitHub repository, 2021-2024.
Kindt, Bastien and Vidal-Gorène, Chahan, From Manuscript to Tagged Corpora. An Automated Process for Ancient Armenian or Other Under-Resourced Languages of the Christian East. In Armeniaca, pp. 73-96, 2022. Edizioni Ca' Foscari - Digital Publishing, Fondazione Università Ca' Foscari.
Vidal-Gorène, Chahan, La reconnaissance automatique d'écriture à l'épreuve des langues peu dotées [Automatic text recognition put to the test by under-resourced languages]. In The Programming Historian en français, 2023. ProgHist Ltd.
Lucas, Noëmie and Salah, Clément and Vidal-Gorène, Chahan, New Results for the Text Recognition of Arabic Maghribi Manuscripts – Managing an Under-resourced Script. arXiv preprint arXiv:2211.16147, 2022.
Vidal-Gorène, Chahan, OCR/HTR technologies and Armenian Heritage Preservation. In Banber Hayastani gradaranneri. Gitamet'odakan handes, pp. 61-65, 2023. National Library of Armenia.
Vidal-Gorène, Chahan and Tomeh, Nadi and Khurshudyan, Victoria, Cross-Dialectal Transfer and Zero-Shot Learning for Armenian Varieties: A Comparative Analysis of RNNs, Transformers and LLMs. In Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities, pp. 438-449, Miami, USA, 2024. Association for Computational Linguistics.
Vidal-Gorène, Chahan and Khurshudyan, Victoria and Donabédian-Demopoulos, Anaïd, Recycling and Comparing Morphological Annotation Models for Armenian Diachronic-Variational Corpus Processing. In Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects, pp. 90-101, Barcelona, Spain (Online), 2020. International Committee on Computational Linguistics (ICCL).
Vidal-Gorène, Chahan and Kindt, Bastien, Lemmatization and POS-tagging process by using joint learning approach. Experimental results on Classical Armenian, Old Georgian, and Syriac. In Proceedings of LT4HALA 2020 - 1st Workshop on Language Technologies for Historical and Ancient Languages, pp. 22-27, Marseille, France, 2020. European Language Resources Association (ELRA).
Kindt, Bastien and Vidal-Gorène, Chahan and Delle Donne, Saulo, Analyse automatique du grec ancien par réseau de neurones. Évaluation sur le corpus De Thessalonica Capta [Automatic analysis of Ancient Greek with a neural network: evaluation on the De Thessalonica Capta corpus]. In Bulletin de l'Académie Belge pour l'Etude des Langues Anciennes et Orientales, pp. 537-562, 2022.
Vidal-Gorène, Chahan and Decours-Perez, Aliénor and Queuche, Baptiste and Ouzounian, Agnès and Riccioli, Thomas, Digitalization and Enrichment of the Nor Bargirk‘ Haykazean Lezui: Work in Progress for Armenian Lexicography. In Journal of the Society of Armenian Studies, pp. 224-244, 2020. Brepols.
Vidal-Gorène, Chahan and Decours-Perez, Aliénor, Languages Resources for Poorly Endowed Languages: The Case Study of Classical Armenian. In Proceedings of The 12th Language Resources and Evaluation Conference, pp. 3145-3152, Marseille, France, 2020. European Language Resources Association.