Antoine d’Abbadie was a 19th-century French explorer, geographer and linguist, and a member of the French Academy of Sciences. Between 1837 and 1848 he traveled in the Horn of Africa, a region then largely unknown to Western science, and filled his field notebooks with a wealth of information and observations on the languages and dialects, the geography and the peoples he encountered. These handwritten notebooks are preserved at the Bibliothèque Nationale de France. They have since been digitized and are available on Gallica, the BNF’s digital library.
To enable the study and scholarly edition of these notebooks, Anaïs Wion, a CNRS researcher and member of IMAF (Institut des Mondes Africains), called upon Calfa to perform text recognition on the 3,000 handwritten pages they contain.
The highly complex layout of these pages, with their rich tangle of scripts (the Ethiopic alphasyllabary; Latin script with numerous diacritics; Arabic, Hebrew and Greek), made recognizing the handwritten text a real challenge. It therefore required the development of a specialized OCR/HTR engine.
For this project, Calfa developed three recognition models: one for Latin script, one for Ethiopic, and one for the composite script specific to these notebooks. The appropriate model is applied depending on the page.
After months of development and document processing, the rich handwritten content of the notebooks has been successfully extracted. The text is now awaiting proofreading on the Transcrire platform, paving the way for the editorial work to come.