Robotic planning with large language models based on visual cues

Ruth Hoffmann
Thursday 2 March 2023

Today, robotic systems often require precise descriptions of the environment in which they are expected to operate. In other cases, they need time to explore the environment extensively before they are able to operate.

Similarly, tasks must be given to robots in strict formats that correspond to the definitions of each given environment.

This project aims to solve both of these problems. It uses the real-world knowledge possessed by large language models (specifically GPT-3), aided by a visual model, to convert natural language instructions into plans for a robot. The only prior knowledge about the environment given to the system is a list of objects observed in it. Instructions can then be given in any number of ways understandable by a human, from ‘Put the cup on the table’ to ‘Set the table for tea’. This approach could ultimately allow robotic systems to be put to work in new environments much more quickly, and moreover, anyone, not just professionally trained technicians, could give instructions to the robot.
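The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names are invented for this example, and a canned string stands in for a real GPT-3 completion (which would come from an API call in practice). The idea is simply to pair the detected object list with the user's instruction in a prompt, then parse the model's free-text reply into plan steps.

```python
from typing import List


def build_prompt(objects: List[str], instruction: str) -> str:
    """Pair the scene's detected object list with the user's instruction.

    The object list is the system's only prior knowledge of the environment.
    """
    object_line = ", ".join(objects)
    return (
        "You control a robot arm. The scene contains: " + object_line + ".\n"
        "Translate the instruction into numbered pick-and-place steps, "
        "one per line.\n"
        f"Instruction: {instruction}\nPlan:\n"
    )


def parse_plan(completion: str) -> List[str]:
    """Strip numbering and blank lines from a completion to get plan steps."""
    steps = []
    for line in completion.splitlines():
        line = line.strip()
        if line:
            steps.append(line.lstrip("0123456789. "))
    return steps


# A canned completion stands in for the real GPT-3 call.
objects = ["cup", "table", "teapot"]
prompt = build_prompt(objects, "Put the cup on the table")
mock_completion = "1. pick(cup)\n2. place(cup, table)"
plan = parse_plan(mock_completion)
print(plan)  # → ['pick(cup)', 'place(cup, table)']
```

A real system would also ground each step's object arguments back against the detections from the open-vocabulary object detector before execution.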

Keywords

Artificial Intelligence, Machine Learning, Robotics, Planning, Large Language Models, Computer Vision, Open Vocabulary Object Detection

Staff

Juan Ye (jy31)
