Reverse Engineering Kennicott (REK) aims to combine Handwritten Text Recognition (HTR), Natural Language Processing (NLP), and data mining of image and catalogue data to interlink one of the most important works in the study of the text of the Hebrew Bible, Kennicott’s Vetus Testamentum hebraicum cum variis lectionibus (1776–1780), with KTIV, the union catalogue and image database of the world’s manuscripts in Hebrew script at the National Library of Israel.
Kennicott recorded the complete variae lectiones (v.l.) of 250 manuscripts and over 50 printed editions, and partial v.l. for an additional 350 manuscripts. We will apply an existing pipeline that recreates complete manuscript transcriptions by reverse engineering Kennicott’s critical apparatus and aligning it with HTR results obtained with eScriptorium for some of these manuscripts.
Applied to a selection of manuscripts, this approach will allow us to create large amounts of highly accurate (>99%) new training data, and new HTR models, at very low human cost. It will also make it possible to visually check the vocalization, which is absent from Kennicott’s edition.
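
The reverse-engineering idea described above can be illustrated with a minimal sketch. This is not the project's pipeline: the data layout (a base text as a list of words, an apparatus as a mapping from word position to per-siglum readings), the siglum "K1", the transliterated toy words, and the function names are all hypothetical, and Python's standard `difflib` merely stands in for whatever alignment method the actual pipeline uses.

```python
from difflib import SequenceMatcher

def reconstruct(base_words, apparatus, siglum):
    """Rebuild one witness's text by applying its recorded variants to the base text.

    apparatus maps a word index to {siglum: reading}; an empty reading
    marks an omission (hypothetical encoding, for illustration only).
    """
    words = []
    for index, base_word in enumerate(base_words):
        reading = apparatus.get(index, {}).get(siglum, base_word)
        if reading:  # skip words the witness omits
            words.append(reading)
    return words

def align(reconstructed, htr_words):
    """Align the reconstructed text with HTR output, word by word."""
    matcher = SequenceMatcher(a=reconstructed, b=htr_words, autojunk=False)
    return [
        (tag, reconstructed[i1:i2], htr_words[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    ]

# Toy data in ad hoc transliteration; the variants are invented.
base = ["brashyt", "bra", "alhym", "at", "hshmym", "wat", "harts"]
apparatus = {
    4: {"K1": "hshmim"},  # invented spelling variant
    5: {"K1": ""},        # invented omission
}
htr_output = ["brashyt", "bra", "alhyn", "at", "hshmim", "harts"]

witness = reconstruct(base, apparatus, "K1")
alignment = align(witness, htr_output)
disagreements = [op for op in alignment if op[0] != "equal"]
```

In this toy case the only disagreement is the third word, where the simulated HTR output differs from the reconstructed reading. Positions like this are the ones a human would need to inspect, while the agreeing stretches could serve as candidate training data.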