Qwen Just Stepped Out of the Chatbox
Alibaba's Qwen team shipped its first suite of embodied AI models on Tuesday, and the message is hard to miss: the agent is leaving the screen and walking into the physical world. Three models in the Qwen-Robot series. Qwen-RobotNav for navigating physical spaces. Qwen-RobotWorld, a video world model that lets a robot predict how a scene will unfold before it acts. And Qwen-RobotManip, which takes messy data from all kinds of different robots and folds it into one canonical space so you can train across different robot bodies at scale.
That last one is the real bet. The hardest problem in robotics isn't any single skill. It's that every robot is different: different arms, different sensors, different everything, so data collected on one machine barely transfers to another. Qwen-RobotManip is trying to be the universal translator that finally makes cross-embodiment training work. Walk, perceive, and think at the same time, as the Chinese coverage put it. The suite was built by Alibaba's Tongyi Lab and is already in pilot with selected Alibaba Cloud enterprise customers, so this isn't a demo reel.
This is the same playbook China's labs have run all month, just aimed somewhere new. Huawei put an agent into the operating system, Kimi and MiMo put agents into the terminal, and now Alibaba is putting one into a body. The frontier stopped being only about bigger models a while ago. It's now about models that touch reality. If RobotWorld's prediction quality holds up, the long-standing gap between vision-language understanding and actual physical control gets a lot narrower. More here: https://www.scmp.com/tech/big-tech/article/3357260/alibaba-eyes-physical-world-its-first-suite-ai-models-robots