Qwen-AgentWorld: Train the Agent in a Dream
Here's a clean idea from Qwen. Training agents in the real world is slow, fragile, and expensive: you need actual environments, actual websites, and actual apps, all of which break. So instead, build a model that simulates the environment, and let the agent practice inside the simulation. A dream world, basically, where you can spin up thousands of fake-but-realistic environments and run reinforcement learning cheaply.
They're calling these language world models, and they shipped two: a 35B-A3B and a big 397B-A17B. Both were trained on more than 10 million interaction trajectories across seven domains, through a three-stage pipeline that first injects general capability, then teaches the model to predict the next state of an environment, then sharpens simulation fidelity with RL. There are two ways to use them: as a standalone simulator that generates cheap training environments, or as a foundation model, where the world-model training acts as a warmup that makes the downstream agent better.
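To make the "standalone simulator" mode concrete, here is a minimal sketch of an agent practicing against a world model that predicts the next state. Everything in it is an assumption for illustration: `ToyWorldModel`, `predict_next_state`, `rollout`, and `scripted_policy` are hypothetical names, not the Qwen-AgentWorld API, and a lookup table stands in for the language model so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical types for illustration; none of these names come from the
# Qwen-AgentWorld repository.
State = str
Action = str


@dataclass
class Step:
    state: State
    action: Action
    next_state: State


class ToyWorldModel:
    """Stand-in for a language world model.

    It maps (state, action) to a predicted next state. A real system would
    query an LLM here; this stub uses a lookup table instead.
    """

    def __init__(self) -> None:
        self.transitions = {
            ("login_page", "submit_credentials"): "dashboard",
            ("dashboard", "open_settings"): "settings_page",
        }

    def predict_next_state(self, state: State, action: Action) -> State:
        # Unknown (state, action) pairs leave the simulated environment unchanged.
        return self.transitions.get((state, action), state)


def rollout(
    world_model: ToyWorldModel,
    policy: Callable[[State], Action],
    initial_state: State,
    max_steps: int,
) -> List[Step]:
    """Run the policy inside the simulated environment and record each step."""
    trajectory: List[Step] = []
    state = initial_state
    for _ in range(max_steps):
        action = policy(state)
        next_state = world_model.predict_next_state(state, action)
        trajectory.append(Step(state, action, next_state))
        state = next_state
    return trajectory


def scripted_policy(state: State) -> Action:
    # A fixed policy standing in for the agent being trained.
    plan = {"login_page": "submit_credentials", "dashboard": "open_settings"}
    return plan.get(state, "noop")


trajectory = rollout(ToyWorldModel(), scripted_policy, "login_page", max_steps=3)
for step in trajectory:
    print(f"{step.state} --{step.action}--> {step.next_state}")
```

In an actual training setup, the trajectories collected this way would feed a reinforcement learning update, and the scripted policy would be replaced by the agent model. The point of the sketch is only that no real website or app is touched at any step.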
The result worth flagging: on their AgentWorldBench, world-model warmup beat training in the real environment alone. Read that twice. Practicing in the dream produced a better agent than practicing in reality, because the dream gives you volume and control that reality can't.
This is the same trick that quietly powers a lot of robotics and game AI, now pointed squarely at general agents, and the code is open at https://github.com/QwenLM/Qwen-AgentWorld . If simulated environments keep closing the gap with real ones, the bottleneck on agent training stops being data collection and starts being how good your dream is.