Google's RubricEM Trains Deep Research Agents Without Verifiable Rewards
Google Cloud AI Research dropped RubricEM on May 11 (arXiv 2605.10899). 57 HuggingFace upvotes, 12 authors including Tomas Pfister. The problem they tackle is one of the harder open ones in agent training: how do you do reinforcement learning on a deep research agent when the outputs are long-form reports and there is no clean answer key to score against?
Their move is to decompose the agent's job into a policy hierarchy, then evolve a meta-policy on top using rubrics instead of binary correct/incorrect rewards. Rubrics give you semantic feedback across long decision sequences (did the agent plan well, did it search the right sources, did it weigh contradictory evidence, did it synthesize coherently) and that signal can actually flow back through gradient updates. Verifiable rewards work great for math and code, where you can grade the final answer; they fall apart the moment the task is a 20-page report.
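To make the idea concrete, here is a minimal sketch of a rubric-based reward, assuming a weighted-criteria design. The criteria names, weights, and aggregation below are illustrative assumptions, not RubricEM's actual rubric (which the paper defines); in practice the per-criterion scores would come from an LLM judge rather than being supplied by hand.

```python
from dataclasses import dataclass

@dataclass
class Criterion:
    name: str      # what aspect of the trajectory is being judged
    weight: float  # relative importance in the final reward

# Hypothetical rubric mirroring the dimensions mentioned above.
RUBRIC = [
    Criterion("planning", 0.25),          # did the agent decompose the task well?
    Criterion("source_quality", 0.25),    # did it search the right sources?
    Criterion("evidence_weighing", 0.25), # did it weigh contradictory evidence?
    Criterion("synthesis", 0.25),         # did it synthesize coherently?
]

def rubric_reward(scores: dict[str, float]) -> float:
    """Collapse per-criterion judge scores (each in [0, 1]) into one scalar
    reward, so the signal can drive a standard policy-gradient update."""
    total_weight = sum(c.weight for c in RUBRIC)
    return sum(c.weight * scores.get(c.name, 0.0) for c in RUBRIC) / total_weight

# Example: a trajectory judged strong on planning, weak on synthesis.
r = rubric_reward({"planning": 0.9, "source_quality": 0.8,
                   "evidence_weighing": 0.7, "synthesis": 0.4})
print(r)  # 0.7
```

The point of the aggregation is that a report with no answer key still yields a dense scalar that differentiates good trajectories from bad ones, which is exactly what binary verifiable rewards cannot provide here.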
RubricEM-8B beats comparable open models across four long-form research benchmarks and gets close to proprietary deep-research systems. 63 pages, 6 figures, CC-BY license. The frame for clauday readers: the Deep Research category exploded over the last year (OpenAI, Google, Perplexity, Manus, and GenSpark all shipped versions), but nobody had publicly explained how to actually train these things. RubricEM is the first serious open paper that takes the training problem head-on. If you're building a research agent, this is the primary reference for the next six months.
Paper: https://arxiv.org/abs/2605.10899