Seminar – Phong Le, Amazon – 3rd March 2021

Can Language Models Be Weak Annotators?

We are happy to have Phong Le, from Amazon, give a talk on Teams on Wednesday 3 March at 12 noon.

Abstract

Deep language models such as BERT and GPT-3 have been the breakthrough in Natural Language Processing over the last three years. Trained on massive amounts of raw text, they capture useful priors for several tasks such as syntactic parsing, information extraction, and question answering. Moreover, they can answer factual and commonsense cloze questions such as “Dante was born in _____”. In this talk, I will first give an overview of what language models “know”. I will then present our work on exploiting their knowledge as weak supervision for a specific task called relation classification.
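To make the idea of a language model acting as an annotator more concrete, here is a minimal sketch of cloze-style weak labelling for relation classification. It is not the method presented in the talk. The relation names, the templates, `weak_label`, the `threshold` parameter, and `toy_score` are all hypothetical. `toy_score` stands in for a real language model that would assign a plausibility score (for example, a sentence probability) to each filled-in template.

```python
from typing import Callable, Dict, Optional

# Hypothetical relation templates (illustrative assumptions only).
TEMPLATES: Dict[str, str] = {
    "born_in": "{subj} was born in {obj}.",
    "works_for": "{subj} works for {obj}.",
    "capital_of": "{obj} is the capital of {subj}.",
}


def weak_label(
    subj: str,
    obj: str,
    score: Callable[[str], float],
    threshold: float = 0.5,
) -> Optional[str]:
    """Fill every template with the entity pair, score each sentence, and
    return the best-scoring relation, or None (abstain) if even the best
    score falls below `threshold`."""
    scored = {rel: score(t.format(subj=subj, obj=obj)) for rel, t in TEMPLATES.items()}
    best_rel = max(scored, key=scored.get)
    return best_rel if scored[best_rel] >= threshold else None


def toy_score(sentence: str) -> float:
    # Toy stand-in for a language model: it "knows" exactly one fact.
    known = {"Dante was born in Florence."}
    return 0.9 if sentence in known else 0.1


print(weak_label("Dante", "Florence", toy_score))  # born_in
print(weak_label("Dante", "Paris", toy_score))     # None
```

Labels produced this way are cheap but noisy, which is the motivation for the next part of the talk.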

Relation classification, the identification of a particular relation type between two entities in text, requires annotated data. Annotation is either done manually, for supervised learning, or automatically, using knowledge bases, for distant supervision. However, both approaches are costly and time-consuming, since they depend on intensive human labour either for annotation or for knowledge base creation. Using language models as annotators, by contrast, is very cheap, but the annotation quality is low. We therefore propose NoelA, an auto-encoder using a noisy channel, which improves accuracy by learning from the low-quality annotated data. NoelA outperforms BERT and a bootstrapping baseline on the TACRED and reWIKI datasets.
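The abstract does not describe NoelA's internals, so the following is only a generic illustration of what a "noisy channel" over labels can mean. A clean classifier predicts p(y | x), and a transition matrix T[i][j] = p(noisy label j | true label i) models how the weak annotator corrupts labels. Training then maximises the probability of the *observed* noisy label. The function names and the example numbers below are hypothetical and are not taken from the talk.

```python
import math
from typing import List


def noisy_label_dist(p_clean: List[float], transition: List[List[float]]) -> List[float]:
    """Push the clean label distribution through the noisy channel:
    p(noisy = j | x) = sum_i p(clean = i | x) * T[i][j]."""
    n_noisy = len(transition[0])
    return [
        sum(p_clean[i] * transition[i][j] for i in range(len(p_clean)))
        for j in range(n_noisy)
    ]


def noisy_nll(p_clean: List[float], transition: List[List[float]], observed: int) -> float:
    """Negative log-likelihood of the weak annotator's label under the channel."""
    return -math.log(noisy_label_dist(p_clean, transition)[observed])


# Two relations; the annotator flips label 1 to label 0 30% of the time.
T = [[0.9, 0.1],
     [0.3, 0.7]]
print(noisy_label_dist([0.8, 0.2], T))  # approximately [0.78, 0.22]
print(noisy_nll([0.8, 0.2], T, observed=1))
```

Because the loss is computed on the noisy labels, the classifier behind `p_clean` is encouraged to predict the underlying clean relation, while T absorbs the annotator's systematic mistakes.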

Bio: I’m an applied scientist at Amazon Alexa. Before that, I was a tenure-track research fellow at the University of Manchester. I did a postdoc with Ivan Titov at the University of Edinburgh, and got a PhD from the University of Amsterdam under the supervision of (Jelle) Willem Zuidema. I’m interested in neural networks and deep learning. My current work employs them to solve natural language processing tasks such as entity linking, coreference resolution, and dependency parsing. I’m also interested in formal semantics, especially learning semantic parsing.

For more details, please visit my homepage https://sites.google.com/site/lephongxyz/

Please note the session will not be recorded, to preserve the like-for-like nature of physical seminars and also avoid any privacy/rights issues.

Event details

When: 3rd March 2021 12:00 - 3rd March 2021 13:00
Format: Seminar

Computer Science Blog