Automated Optical Detection and Recognition of Greek Glyphs on Degraded Papyrus
This work proposes a new, fully automated method for transcribing and annotating degraded ancient papyrus documents. Transcription is usually a complex process that requires knowledge of both the language used and the physical characteristics of papyrus. The proposed method works from digital images of the papyrus, which many collections already hold, and automatically identifies and classifies each symbol or glyph on the page: glyphs are first located on the papyrus, then grouped into lines, and finally classified, taking into account the context in which each glyph appears. This method could make these documents available to modern computer-based translation and language-analysis tools, and make them more accessible to everyone.
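The abstract does not say how detected glyphs are grouped into lines. The following is a minimal sketch of one simple way to do it, assuming each detection is an axis-aligned bounding box (for example, the output of a detector such as YOLO) and that the text runs in roughly horizontal lines read left to right. `Box`, `group_into_lines` and the `overlap` threshold are hypothetical names chosen for illustration, not part of the described system.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical glyph detection: axis-aligned box in image coordinates."""
    x: float  # left edge
    y: float  # top edge (y grows downwards)
    w: float  # width
    h: float  # height

    @property
    def cy(self) -> float:
        # Vertical centre of the box.
        return self.y + self.h / 2


def group_into_lines(boxes, overlap=0.5):
    """Greedily group glyph boxes into text lines (illustrative assumption).

    Boxes are visited top to bottom by vertical centre. A box joins the
    current line if its vertical overlap with the line's running extent is
    at least `overlap` times the box height; otherwise it starts a new line.
    Glyphs within each line are then ordered left to right.
    """
    lines = []  # each entry: (top, bottom, list_of_boxes)
    for b in sorted(boxes, key=lambda b: b.cy):
        if lines:
            top, bottom, members = lines[-1]
            inter = min(bottom, b.y + b.h) - max(top, b.y)
            if inter >= overlap * b.h:
                members.append(b)
                # Widen the line's vertical extent to cover the new glyph.
                lines[-1] = (min(top, b.y), max(bottom, b.y + b.h), members)
                continue
        lines.append((b.y, b.y + b.h, [b]))
    return [sorted(members, key=lambda b: b.x) for _, _, members in lines]


if __name__ == "__main__":
    detections = [Box(10, 0, 8, 10), Box(0, 1, 8, 10),
                  Box(0, 20, 8, 10), Box(12, 21, 8, 9)]
    for i, line in enumerate(group_into_lines(detections)):
        print(i, [b.x for b in line])
```

A greedy pass like this is easy to follow but can merge adjacent lines on tightly spaced or skewed papyri; in practice the grouped lines would then be passed, in reading order, to a context-aware classifier.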
Keywords
OCR, RCNN, Recurrent CNN, Machine Learning, YOLO, Language Models, Glyph Bounding, Ancient Greek, Papyrus, LSTM
Staff
[Mark Jan Nederhof]{mn31}