HTRogène

Project type: Exploratory project
Scientific coordinator: Thibault Clérice
Selected in: 2023

HTRogène takes on the challenge of producing generic Handwritten Text Recognition (HTR) models that automatically transcribe documents from the Middle Ages to the Early Modern period. It focuses on producing transcription data for literary manuscripts and for public or private archives in Romance languages from the 11th to the 16th century. The project's main objective is to produce training data and transcription models that are robust to variation in scripta and language. HTRogène is designed as an infrastructural building block for Biblissima+ and for the medieval philology of Romance languages. Rather than focusing on one or a few specific texts, it aims to produce representative samples of transcription data for each domain it covers. This sampling follows specific criteria such as language, scripta, genre, and period.

Illustration of a manuscript with the HTRogène logo