Robotic planning with large language models based on visual cues
Today, robotic systems often require precise descriptions of the environment in which they are expected to operate, or else need time to extensively explore that environment before they can act.
Similarly, tasks must be specified to robots in strict formats tied to the definitions of each particular environment.
This project aims to solve both of these problems by using the real-world knowledge possessed by large language models (specifically GPT-3), aided by a visual model, to convert natural language instructions into plans for a robot. The only prior knowledge about the environment given to the system is a list of objects observed in it. Instructions can then be given in any way a human would understand, from ‘Put the cup on the table’ to ‘Set the table for tea’. This approach could ultimately allow robotic systems to be put to work in new environments much more quickly; moreover, anyone, not just professionally trained technicians, could give the robot instructions.
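For illustration, the pipeline could be sketched roughly as below. This is a minimal sketch, not the project's actual implementation: it assumes the legacy OpenAI Completion API (GPT-3 era), and the function names detect_objects and plan, the model name, and the prompt wording are all hypothetical. In particular, detect_objects is a stub standing in for the open-vocabulary object detector.

```python
# Hypothetical sketch: object list + instruction -> GPT-3 -> plan steps.
# Assumes the legacy openai Python library (<1.0), which reads the
# OPENAI_API_KEY environment variable by default.
import openai


def detect_objects(image_path: str) -> list[str]:
    """Stub for the visual model: return objects observed in the scene.

    A real implementation would run an open-vocabulary object detector
    on the camera image; here we return a fixed list for illustration.
    """
    return ["cup", "teapot", "table", "spoon", "sugar bowl"]


def plan(instruction: str, objects: list[str]) -> list[str]:
    """Ask GPT-3 to turn a natural language instruction into plan steps."""
    prompt = (
        "You control a robot arm. Objects visible in the scene: "
        + ", ".join(objects) + ".\n"
        "Write a numbered list of low-level steps (pick, place, move) "
        "that accomplishes this instruction: " + instruction + "\n"
        "Steps:\n1."
    )
    response = openai.Completion.create(
        model="text-davinci-003",  # a GPT-3 family model; an assumption
        prompt=prompt,
        max_tokens=200,
        temperature=0.0,  # deterministic output for repeatable plans
    )
    steps = "1." + response.choices[0].text
    return [line.strip() for line in steps.splitlines() if line.strip()]


if __name__ == "__main__":
    objects = detect_objects("scene.jpg")
    for step in plan("Set the table for tea", objects):
        print(step)
```

In a full system, the returned steps would then be mapped to the robot's motion primitives and executed; the sketch only prints them.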
Keywords
Artificial Intelligence, Machine Learning, Robotics, Planning, Large Language Models, Computer Vision, Open Vocabulary Object Detection
Staff
Juan Ye (jy31)