Jie Tan’s Post

Senior Staff Research Scientist at Google

Our latest research on a vision-language-action model for robot navigation. With its very long context, semantic and visual understanding, and common-sense reasoning, Gemini 1.5 Pro enables the robot to navigate to places that users can specify in intuitive ways (through language, gestures, drawing, etc.).

Google DeepMind

“Hey robot, take me somewhere I can draw?” 🤖 We challenged our helper robots to navigate their way around a busy space - using Gemini 1.5 Pro. With the model’s 1 million token context window, the robots can recall an environment after watching a video tour, and they successfully followed a range of instructions - from finding a specific desk to remembering a favorite drink. Find out more in our latest paper → https://dpmd.ai/4bUobbj
