Learning hard chart constraints for efficient context-free parsing by Brian Roark – Oregon Health & Science University

Abstract: In this talk, I’ll present some recent work in learning hard constraints for cells within a context-free parsing chart, to reduce parsing time. Each cell in the chart represents one of the O(n^2) substrings of the input string, and characteristics of each substring can be used to decide how much work to do in the associated chart cell. I’ll discuss finite-state models for tagging chart constraints on words, including methods for bounding the worst-case complexity of the parsing pipeline to quadratic or sub-quadratic in the length of the string. Empirical results will be presented for English and Chinese, achieved by constraining various high accuracy parsers.

Finally, I will present a generalization of these finite-state approaches that performs a quadratic number of classifications (one for each substring) to produce further (finer) constraints on the amount of processing within each cell. This latter approach has the nice property of being trained on maximum likelihood parses, rather than reference parses, making for a straightforward method for tuning parsing efficiency to new tasks and domains.

Bio: Brian Roark is an Associate Professor in the Center for Spoken Language Understanding (CSLU) and Dept. of Biomedical Engineering at Oregon Health & Science University (OHSU). He received his PhD from Brown University in 2001 and spent 3 years in the Speech Algorithms Department at AT&T Labs – Research before joining CSLU. His research interests include natural language processing, language modeling for various applications, assistive technology, and spoken language understanding.