Jie Tan’s Post

Name: Jie Tan on LinkedIn: Our latest research on vision-language-action model for robot navigation…
Uploaded: 2024-07-11T23:03:55.152Z
Channel: Jie Tan
Description: Our latest research on a vision-language-action model for robot navigation. With its very long context, semantic and visual understanding, and common-sense reasoning, Gemini v1.5 Pro enables the robot to navigate to places that users can specify in intuitive ways (through language, gestures, drawing, etc.).

Jie Tan

Senior Staff Research Scientist at Google

2mo Edited

Our latest research on a vision-language-action model for robot navigation. With its very long context, semantic and visual understanding, and common-sense reasoning, Gemini v1.5 Pro enables the robot to navigate to places that users can specify in intuitive ways (through language, gestures, drawing, etc.).

Google DeepMind

1,062,046 followers

2mo

“Hey robot, take me somewhere I can draw?” 🤖 We challenged our helper robots to navigate their way around a busy space using Gemini 1.5 Pro. With the model’s 1 million token context window, it’s able to recall an environment after watching a video tour, and it successfully followed a range of instructions, from finding a specific desk to remembering a favorite drink. Find out more in our latest paper → https://dpmd.ai/4bUobbj


More Relevant Posts

Isabel Edkins
2mo
There's no doubt AI is going to fuel robotics. I love this example from Google DeepMind, which shows the power of AI. The video shows a robot remembering a detailed tour of a building and then responding to a query by navigating the space to find the area best suited for drawing, taking the user to a whiteboard. I wish I had something like this for when I forget where I left my keys! 😂 But in all seriousness, imagine a mini version of this attached to the clothes of someone who has dementia, helping them navigate through spaces and tasks. It could transform their quality of life! #ai #innovation #robotics #googledeepmind

Muchiu (Henry) Chang, PhD. Cantab (Cambridge, UK)

Consultant in Patent Intelligence and Engineering Management
2mo
Google DeepMind, how long can its battery last? Two universal, basic elements make a machine work: a battery (electricity) and metadata. Without a battery, the machine won't move. The battery also accounts for 40% of the total cost of an electric machine, e.g., an electric car. As for AI applications, the electricity bills of a C-level friend of ours have doubled because of AI use.

Metadata is an enabler. It is digital content, NOT a technology. It's like a treasure map, curated by humans for treasure hunting. Without metadata, NO data can be found or retrieved, even by the most advanced technologies, such as AI or quantum computers. Thus, the machine won't be able to do even a simple task such as moving or sensing its situation. https://lnkd.in/g-aJFnXR

We are using our expertise/IP, copyrighted multilingual metadata, to do what AI such as ChatGPT can't do in data analytics NOW. Do you or any of your contacts need our expertise/IP? Thanks.

Some facts about AI (Chinese version): https://lnkd.in/eeVbdcza
Some fact findings about AI (English version): https://lnkd.in/gwcPNUPP

Zipeng Fu

Stanford AI & Robotics PhD
2mo Edited
Introducing Mobility VLA, Google's foundation model for navigation, which started as my intern project:
- Gemini 1.5 Pro for high-level image & text understanding
- topological graphs for low-level navigation
- supports multimodal instructions

Co-leads: Zhuo Xu, Lewis Chiang, Jie Tan
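
The split described in this post (a high-level model that picks a goal, and a topological graph that handles low-level navigation) can be illustrated with a toy sketch. Everything here is an assumption made for illustration, not the actual Mobility VLA implementation: the frame captions, the graph edges, and the `pick_goal_frame` helper are hypothetical, and a simple keyword lookup stands in for the real Gemini 1.5 Pro call. The path search is a plain breadth-first search, which may differ from what the paper uses.

```python
from collections import deque

# Hypothetical tour: each node is one frame from the video tour, with a made-up caption.
TOUR_FRAMES = {
    0: "entrance lobby",
    1: "hallway",
    2: "desk area",
    3: "whiteboard",
    4: "kitchen with drinks fridge",
}

# Hypothetical topological graph: an edge means the robot can move
# directly between the places where the two frames were captured.
GRAPH = {
    0: [1],
    1: [0, 2, 4],
    2: [1, 3],
    3: [2],
    4: [1],
}


def pick_goal_frame(instruction, frames):
    """Stand-in for the high-level model: a keyword lookup, NOT a real VLM call."""
    hints = {"draw": "whiteboard", "drink": "kitchen"}
    for keyword, target in hints.items():
        if keyword in instruction.lower():
            for frame_id, caption in frames.items():
                if target in caption:
                    return frame_id
    raise ValueError("no tour frame matches the instruction")


def shortest_path(graph, start, goal):
    """Breadth-first search over the graph; returns node ids from start to goal."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    raise ValueError("goal is not reachable from start")


goal = pick_goal_frame("Take me somewhere I can draw", TOUR_FRAMES)
route = shortest_path(GRAPH, start=0, goal=goal)
print(route)  # [0, 1, 2, 3]
```

In this sketch the high-level step only has to name a goal frame; the graph search then turns that goal into a sequence of waypoints, so the two parts can be swapped out independently.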

TexoBot

32 followers
2mo
Discover Advanced Robot Navigation with Gemini 1.5 Pro

The Gemini 1.5 Pro model, with its impressive 1 million token context window, has redefined robot navigation. This advanced system enables robots to recall detailed environments from video tours and follow complex instructions, from finding a specific desk to remembering favorite drinks. Learn more about how Vision-Language Models and topological graphs make this possible in the latest research paper: ["Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs"](https://dpmd.ai/4bUobbj). #AI #Robotics #Innovation #TechResearch #MachineLearning

Fasih ullah

Co-founder at Voliom | Empowering Startups by Scaling Software Development Teams | Custom Software Development AI ML DevOps Cloud Big Data
1mo
Check out Google DeepMind's Gemini 1.5 Pro model. With its 1 million token context window, it can help robots navigate busy environments by recalling details from a video tour and following complex instructions. Whether it's finding a specific desk or remembering a favorite drink, this technology is pushing the boundaries of what's possible. #google #ai #robotic

Alpha Robotics Co.,Ltd

648 followers
8mo
Introducing the Jupiter service robot: it can perform face recognition, interact with people through voice, and guide visitors to designated locations to give explanations at fixed points.
Ardalan Tajbakhsh

Roboticist at the AI Institute / PhD Candidate at Carnegie Mellon University
7mo Edited
The new Atlas video from Boston Dynamics looks very impressive! A couple of things caught my attention:

1. Compared to other humanoid demos (mostly teleoperated), the robot is much faster at walking with the load and balancing its body during manipulation. In my opinion, this is where Boston Dynamics is ahead of the competition. Most humanoid demos thus far have demonstrated dexterous manipulation while standing still or moving very slowly.
2. The hand design looks quite different from the competition's. While others have opted for human-like hands with many degrees of freedom, BD has opted for a more functional and robust design with 3 fingers. This should be sufficient for most tasks in logistics and manufacturing that require handling rigid and heavy objects.

Would love to hear your thoughts on the robot in the comments below.

Boston Dynamics

532,420 followers
7mo

Can't trip Atlas up! Our humanoid robot gets ready for real work combining strength, perception, and mobility.

Atlas Struts

Scott Radford

Engineer | Technical Writer | Robot Reveler
7mo
Seeing the geometries overlaid in real time with the world shows how smart our system is about what we're handling, how it affects our dynamics, and how we calculate the perfect poses for grasping, moving, and placing this rather heavy strut. Absolutely amazing demonstration of what humanoid robots can do for the future of manufacturing. Way to go Atlas! 💪

Saïd C.

Entrepreneur & Investor | Servant Leader | Business Dev. Expert | Talent Magnet | BPO/ITO/Customer Service Expert | Proud Jadara Foundation Volunteer | Music Producer | Culture & Art Activist 🇲🇦🇲🇦🇲🇦
7mo
This morning, I was captivated by this video showcasing a humanoid robot from Boston Dynamics, effortlessly demonstrating strength and precision. It's moments like these that remind me of the incredible pace of innovation we're experiencing in 2024. As I reflect on these advancements, I can't help but acknowledge the mixed emotions they evoke: on one hand, there's the awe-inspiring potential to enhance efficiency and improve lives. Yet there are also concerns about job displacement and #cybersecurity threats. Amidst these challenges, however, there are tremendous opportunities, and I'm optimistic about our ability to #adapt, #learn, and #collaborate. By embracing change and fostering ethical practices, we can navigate the complexities of this era while creating a world that's not only technologically advanced but also compassionate and inclusive. #ai #humanoid #bostondynamics #robots #TechRevolution #Innovation #FutureForward

Boston Dynamics

532,420 followers
7mo

Can't trip Atlas up! Our humanoid robot gets ready for real work combining strength, perception, and mobility.

Atlas Struts
Like Comment
To view or add a comment, sign in
max I.

Computer Vision Engineer | Machine Learning Engineer | Data Scientist, T-shaped
7mo
It looks like Atlas already knows this particular object and hence knows how to grab it and put it into place. Can Atlas handle arbitrary objects? Anyway, the way it takes steps is more impressive to me for some reason.
