
The Latest News in AI & Robotics: RT-2, a Transformer-based Model for Robotic Actions

The robotics industry is evolving rapidly, and keeping up with the latest developments can be challenging. Fortunately, several resources make it easier to stay informed about news, trends, and insights in the field. In this post, we explore one of the latest developments in AI & Robotics: Google DeepMind’s new AI model, RT-2, which translates vision and language into robotic actions.

RT-2 is a first-of-its-kind vision-language-action (VLA) model that can directly output robotic actions. The Transformer-based model is trained on text and images from the web, enabling it to transfer knowledge from web data to inform robot behaviour. In other words, RT-2 can “speak robot” — a significant advance that brings us closer to a future of helpful robots.
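To make “speaking robot” concrete: in RT-2, robot actions are represented as strings of discretised tokens, so the same model that emits words can emit motor commands. The sketch below is a simplified illustration of that idea, not DeepMind’s implementation — the four-dimensional action, the value range and the 256-bin resolution are illustrative assumptions.

```python
# Minimal sketch of action-as-text tokenisation, in the spirit of RT-2.
# Assumed (not from the post): a 4-dimensional action (dx, dy, dz, gripper)
# with each value in [-1.0, 1.0], discretised into 256 bins per dimension.

NUM_BINS = 256

def encode_action(action):
    """Map continuous action values to a string of integer tokens."""
    tokens = []
    for value in action:
        clipped = max(-1.0, min(1.0, value))
        # Scale [-1, 1] onto bin indices 0..NUM_BINS-1.
        bin_index = round((clipped + 1.0) / 2.0 * (NUM_BINS - 1))
        tokens.append(str(bin_index))
    return " ".join(tokens)

def decode_action(token_string):
    """Invert the encoding: token string back to approximate values."""
    values = []
    for token in token_string.split():
        bin_index = int(token)
        values.append(bin_index / (NUM_BINS - 1) * 2.0 - 1.0)
    return values
```

Because actions are just another token sequence, web-scale language and vision pre-training and robot-action fine-tuning can share one model and one output head — which is the structural simplification the article describes next.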

The pursuit of helpful robots has always been a herculean effort, because robots need to handle complex, abstract tasks in highly variable environments — especially ones they’ve never seen before. Unlike chatbots, robots need “grounding” in the real world and in their own abilities: they must recognise objects in context, distinguish them from similar objects, and understand how to interact with them.

Recent work has improved robots’ ability to reason, enabling them to use chain-of-thought prompting to dissect multi-step problems. The introduction of vision models, like PaLM-E, has also helped robots make better sense of their surroundings. However, until now, robots have relied on complex stacks of systems, with high-level reasoning and low-level manipulation systems playing an imperfect game of telephone to operate the robot.
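As a rough illustration of the chain-of-thought prompting mentioned above, the sketch below builds a prompt that asks a model to break a task into numbered steps, and parses such a plan back into a step list. The prompt wording and plan format are hypothetical, and no actual model is invoked — the model call is the piece you would supply.

```python
def build_cot_prompt(task):
    """Construct a hypothetical chain-of-thought prompt for a robot task."""
    return (
        f"Task: {task}\n"
        "Think step by step. List each low-level action on its own "
        "numbered line.\n"
        "Plan:\n"
    )

def parse_plan(model_output):
    """Parse a numbered plan (e.g. '1. pick up the can') into steps."""
    steps = []
    for line in model_output.splitlines():
        line = line.strip()
        if line and line[0].isdigit():
            # Drop the leading number and separator, e.g. "1. " or "2) ".
            steps.append(line.lstrip("0123456789.) ").strip())
    return steps
```

In the stacked architectures the article describes, a planner like this hands its parsed steps to a separate low-level controller — the “game of telephone” that RT-2 collapses into a single model.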

RT-2 removes this complexity by enabling a single model both to perform complex reasoning and to output robot actions. The system can transfer concepts embedded in its language and vision training data to direct robot actions, even for abstract tasks like identifying and disposing of rubbish. In testing RT-2 models across more than 6,000 robotic trials, the team found that RT-2 performed as well as DeepMind’s previous model, RT-1, on tasks in its training data (“seen” tasks) — and on novel, unseen scenarios it almost doubled performance, to 62% from RT-1’s 32%.

RT-2’s ability to transfer information to actions shows promise for robots to more rapidly adapt to novel situations and learn more like we do, transferring learned concepts to new situations. While there is still a tremendous amount of work to be done to enable helpful robots in human-centred environments, RT-2 shows us an exciting future for robotics just within grasp.

To stay informed about the latest news, trends, and insights in the robotics industry, we recommend following six of the top robotics blogs: Robohub, Robotics.org, RoboGlobal News, Robotics Business Review, The Robot Report, and Robotiq’s Robotics Industry News, Applications, and Trends. These blogs provide timely information presented thoughtfully by knowledgeable experts, featuring a variety of perspectives on robotics research, start-ups, business, and education.

In conclusion, RT-2 is a significant advancement in robotics that brings us closer to a future of helpful robots. By enabling a single model to perform complex reasoning and output robot actions, it shows promise for robots that adapt more rapidly to novel situations and learn more like we do. And to keep up with developments like this one, the blogs above are a good place to start.
