Reconstruction of 16th century printing technology

In collaboration with the School of History, we have applied OCR to printed French bibles from the 16th century. We have observed that spaces between words come in several sizes. Closely linked words can have smaller spaces between them than other words. Modern OCR technology treats all spaces as equal. We aim to automatically reconstruct […]

Lifelong learning in human activity recognition with evolvable NLP techniques

This project aims to recast sensor-based human activity recognition as an NLP (natural language processing) problem, in which we learn an activity pattern as a sentence constructed from a sequence of sensor events. With this, we can predict the next sensor event based on observed events and, more importantly, we can perform lifelong […]
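As a rough illustration of treating sensor events as words in a sentence, the sketch below trains a simple bigram model and predicts the next event from the last observed one. It is not the project's method: the bigram model is a stand-in for whatever NLP technique is actually used, and the event names and the functions `train_bigram` and `predict_next` are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(sequences):
    """Count how often each sensor event is directly followed by another."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, observed):
    """Return the most frequent successor of the last observed event, or None."""
    if not observed:
        return None
    followers = counts.get(observed[-1])
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical sensor-event "sentences", for illustration only.
sentences = [
    ["kitchen_door", "fridge", "kettle", "cupboard"],
    ["kitchen_door", "fridge", "kettle", "tap"],
    ["kitchen_door", "fridge", "hob"],
    ["bedroom_door", "wardrobe", "bed"],
]
model = train_bigram(sentences)
print(predict_next(model, ["kitchen_door", "fridge"]))
```

A model like this can be updated incrementally by adding counts for newly observed sequences, which hints at why a language-style formulation is attractive for learning over time.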

An aid to learning to read foreign languages

Machine translation is now good enough that we can read an online newspaper in a foreign language, e.g. using Google Translate. But if we read texts online to practise our foreign-language reading skills, then we may wish to see more than just the translations. It would be nice if we could see […]

OCR using Transkribus

A joint project with the St Andrews Institute of Mediaeval Studies attempts to digitise mediaeval documents using OCR (Optical Character Recognition). Previous attempts with Ocropus gave good results on printed texts, but mixed results on manuscripts. In this project, Transkribus will be used. Supervisors: Mark-Jan Nederhof. Artefact(s): Various scripts will be written to make the scans amenable […]

Interlinear text on web pages

Interlinear text consists of several levels of annotation of a text, such as translations or transliterations. A package developed in the school includes functionality to format interlinear text in a Java applet, in such a way that the layout adapts dynamically to the width of the window. Regrettably, Java applets are used less and less […]

Input methods for complex scripts

A new kind of encoding of Ancient Egyptian hieroglyphic text in Unicode requires several control characters to form groups of signs. Typing such control characters together with the visible signs is confusing for users, as the end result cannot be displayed until all the characters have been entered, and the control characters themselves are normally […]

Mapping Clinical Texts to Formal Representations

Clinical texts are often written as free narrative, in ungrammatical, concise phrases with limited context, and make heavy use of acronyms and abbreviations. These factors introduce ambiguity and hide information that can be useful or even critical. This project explores the process of automatically mapping clinical texts into formal representations. As a case study, we use examples […]
