Analysis of early printed editions

This project is run in conjunction with Dr. Clive Sneddon of the St Andrews
Institute of Mediaeval Studies. The task is to develop data formats and
algorithms for analysing digitisations of early printed French bibles,
obtained via OCR.

Early French bibles pose a number of challenges for computer processing.
Within a single edition, the problem is that many characters do not occur in
modern character sets; these may be specific letter forms, punctuation marks,
or abbreviations. This calls for non-standard text representations that
faithfully capture salient aspects of the original printed editions while
still offering a sufficiently abstract view of the text to support, for
example, text search. Across different editions, the challenge is to compare
them semi-automatically using, for example, alignment algorithms, and to
visualise the differences.
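
The two requirements above can be illustrated with a minimal Python sketch. It
is not part of the project specification: the mapping table SEARCH_MAP, the
function names search_form and align_words, and the sample phrases are all
hypothetical, and the expansion of abbreviations is deliberately naive (a real
expansion would depend on context). The idea is to store the faithful
transcription, derive a normalised form for searching, and align two editions
word by word on that normalised form using the standard difflib module.

```python
import difflib
import unicodedata

# Hypothetical, illustrative mapping from early-print glyphs to a modern
# search form. A real table would be much larger and context-sensitive.
SEARCH_MAP = {
    "\u017f": "s",   # long s
    "\u0113": "en",  # e with macron, assumed here to abbreviate "en"
    "&": "et",       # ampersand standing for French "et"
}


def search_form(diplomatic: str) -> str:
    """Derive a normalised, searchable form from a faithful transcription."""
    # Compose first so that "e" + combining macron matches the table key.
    text = unicodedata.normalize("NFC", diplomatic)
    expanded = "".join(SEARCH_MAP.get(ch, ch) for ch in text)
    return expanded.casefold()


def align_words(edition_a: str, edition_b: str):
    """Return the word-level differences between two transcriptions.

    Words are compared on their search form, so purely typographic
    variation (long s, abbreviation marks) is not reported as a difference.
    """
    words_a, words_b = edition_a.split(), edition_b.split()
    matcher = difflib.SequenceMatcher(
        a=[search_form(w) for w in words_a],
        b=[search_form(w) for w in words_b],
        autojunk=False,
    )
    return [
        (tag, words_a[i1:i2], words_b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    # Invented sample phrases, not quotations from any actual edition.
    a = "Au commencem\u0113t Dieu crea le ciel"
    b = "Au commencement Dieu cr\u00e9a le ciel et la terre"
    for tag, left, right in align_words(a, b):
        print(tag, left, right)
```

Keeping the faithful transcription as the stored text and computing the search
form on demand means no information about the original is lost; the list of
differences returned by align_words could then feed a visualisation.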

A successful realisation of this project will involve such topics as Unicode,
font technology, XML (TEI/EpiDoc), automatic alignment, and information
retrieval. Ideally, the project will also touch upon OCR, to investigate how
suitable digitisations can be obtained from scans of the original editions.

Supervisors

Artefact(s)

Software, XML schemas.