Reconstruction of 16th-century printing technology

In collaboration with the School of History, we have applied OCR to printed French bibles from the 16th century. We have observed that spaces between words come in several sizes. Closely linked words can have smaller spaces between them than other words. Modern OCR technology treats all spaces as equal. We aim to automatically reconstruct […]

Annotation of 3D models

An existing tool annotates 3D models with polygons and labels. It makes use of the LibGDX Java library, which was designed for computer games. However, the 3D models that we are using, of Ancient Egyptian coffins, are becoming too big to be handled by this library. The task would be to redesign the tool, but […]

An aid to learning to read foreign languages

Machine translation is getting good enough that we can read an online newspaper in a foreign language, e.g. using Google Translate. But if we read texts online for the purpose of practicing our foreign-language reading skills, then we may wish to see more than just the translations. It would be nice if we could see […]

OCR using Transkribus

A joint project with the St Andrews Institute of Mediaeval Studies attempts to digitise mediaeval documents using OCR (Optical Character Recognition). Previous attempts with Ocropus gave good results on printed texts, but mixed results on manuscripts. In this project, Transkribus will be used. Supervisors: Mark-Jan Nederhof. Artefact(s): Various scripts will be written to make the scans amenable […]

Removing noise from hand-written transcriptions

An OCR tool developed in the school can recognise hand-written transcriptions of Ancient Egyptian. One of the remaining obstacles is ‘hatching’, that is, diagonal lines drawn to indicate that the text on the original artefact was damaged. This interferes with the segmentation into individual signs. The purpose of the project is to experiment with image […]

Interlinear text on web pages

Interlinear text consists of several levels of annotation of a text, such as translations or transliterations. A package developed in the school includes functionality to format interlinear text in a Java applet, in such a way that the layout adapts dynamically to the width of the window. Regrettably, Java applets are used less and less […]

Input methods for complex scripts

A new kind of encoding of Ancient Egyptian hieroglyphic text in Unicode requires several control characters to form groups of signs. Typing such control characters together with the visible signs is confusing for users, as the end result cannot be displayed until all the characters have been entered, and the control characters themselves are normally […]

Analysis of early printed editions

This project is run in conjunction with Dr. Clive Sneddon from the St Andrews Institute of Mediaeval Studies. The task is to develop data formats and algorithms to analyse digitisations of early printed French bibles, obtained via OCR. Early French bibles pose a number of challenges for computer processing. For particular editions, there is the problem […]

Ancient Egyptian hieroglyphs in OpenType

A Unicode proposal involving Ancient Egyptian hieroglyphs has explored the principles of implementation in OpenType. The task of the project would be to implement a full prototype font for Ancient Egyptian hieroglyphic text. The project would involve Python and font tools such as FontForge and AFDKO. Programming experience is required, although the ‘programming’ that is […]

A graphical editor for analyses of textual artefacts

A tool developed in the school allows web applications to be built that show analyses of ancient textual objects. Currently, the input to the tool consists of a number of translations and meta information, and an XML file. This XML file has to be hand-written. The task of the project is to streamline the process of […]
