Loop Daily: June 19, 2026
If there's a single story today, it's autoresearch crossing from bits into atoms. NVIDIA's ENPIRE dominated the conversation—fleet agentic autoresearch let loose on physical robots, with the hardest engineering being everything you set up before pressing Enter: two-layer safety harnesses so 8 robots can run unattended overnight, and frozen reward definitions so the fleet can't game itself. Alongside it, an AutoResearch agent autonomously planned GPU experiments and ran real RL on a 285B model with zero human intervention, and someone burned two weeks of 24/7 token spend to push a local inference rig from 14 to over 100 tok/s. The other half of the day is the quiet consensus on what actually makes a loop work: it's not the loop, it's the harness underneath it—the subagents, the deterministic hooks, the state file that lets the next run resume instead of restart. And the loop keeps escaping software entirely, into tax prep, solar-panel optimization, and robotics.
#1
@victor207755822
https://x.com/victor207755822/status/2067259098584985954
The single strongest autoresearch case of the day. They open-sourced the Deli AutoResearch skill and, for the first time, had their AutoResearch Agent autonomously plan GPU experiments and submit real RL (GRPO) runs on the DeepSeek 285B model. The entire RL pipeline—experiment design, code writing, running, debugging, conclusion summarization—was 100% automated with zero human intervention. This is the token-equals-intelligence thesis at the frontier: an agent burning serious compute to run actual research-grade RL on a 285B model and write up what it found, by itself.
#2
@DrJimFan
https://x.com/DrJimFan/status/2067283904986517866
The behind-the-scenes engineering tour of physical autoresearch, and the hard part is everything before you press Enter. Letting 8 robots run unattended overnight means safety can't be a hint in the system prompt, so ENPIRE hardwires it in two layers: a hard kinematic limit that trips immediate task failure and auto-resets when a robot leaves its envelope, plus a torque-limited compliant gripper for bad contacts. The reward and 'done' definitions are frozen: they collect demos, have an agent write CV classifiers, hill-climb them against ground truth, then lock them so the fleet can't game its own reward. This is autoresearch crossing from bits into atoms.
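The first safety layer can be pictured as a check that lives in plain code outside the agent. What follows is a minimal sketch, assuming a box-shaped workspace envelope. The names `Envelope` and `step_with_guard`, the bounds, and the reset callback are all hypothetical illustrations, not ENPIRE's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Envelope:
    """Axis-aligned workspace bounds (hypothetical values, in metres)."""
    low: Tuple[float, float, float]
    high: Tuple[float, float, float]

    def contains(self, pos: Sequence[float]) -> bool:
        # True only if every coordinate lies within its bounds.
        return all(lo <= p <= hi for lo, p, hi in zip(self.low, pos, self.high))


def step_with_guard(pos: Sequence[float], envelope: Envelope,
                    reset: Callable[[], None]) -> str:
    """Return 'ok' inside the envelope; otherwise reset and fail the task.

    The decision is ordinary code, so nothing the agent writes in a
    prompt or a policy can argue its way past the limit.
    """
    if envelope.contains(pos):
        return "ok"
    reset()  # auto-reset the scene before the next attempt
    return "task_failed"


# Toy usage: one pose inside the envelope, one outside it.
resets: List[str] = []
envelope = Envelope(low=(0.0, 0.0, 0.0), high=(0.5, 0.5, 0.4))
inside = step_with_guard((0.1, 0.2, 0.3), envelope, lambda: resets.append("reset"))
outside = step_with_guard((0.1, 0.9, 0.3), envelope, lambda: resets.append("reset"))
```

The point of the design is that leaving the envelope is treated as an ordinary task failure with an automatic reset, so an overnight run can continue without a human stepping in.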
#3
@GuanyaShi
https://x.com/GuanyaShi/status/2067077863061533172
The clearest methodology statement of the day. Strip the buzzwords—recursive self-improvement, autoresearch, agents that get better by iterating—and underneath is one pattern: once a domain has a repeatable feedback loop, agents can propose, test, observe and revise. That's why games, ML experiments, GPU kernels, codebases and proof search are increasingly agent-solvable. Robotics is where the story breaks: in the digital world 'run the experiment' is a command, but a physical rollout means resetting the scene, executing safely, verifying the outcome, and refining. ENPIRE's whole bet is building that physical feedback loop.
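The propose, test, observe, revise pattern is easy to state in code once "run the experiment" is a function call. Here is a minimal sketch using a toy scoring function. `hill_climb`, `toy_score`, and `toy_propose` are hypothetical names for illustration, and the toy objective stands in for whatever real experiment a domain provides.

```python
import random
from typing import Callable, Tuple


def hill_climb(initial: float,
               propose: Callable[[float, random.Random], float],
               evaluate: Callable[[float], float],
               steps: int = 200,
               seed: int = 0) -> Tuple[float, float]:
    """Propose a change, test it, observe the score, keep it only if better."""
    rng = random.Random(seed)
    best, best_score = initial, evaluate(initial)
    for _ in range(steps):
        candidate = propose(best, rng)   # propose
        score = evaluate(candidate)      # test and observe
        if score > best_score:           # revise: keep only improvements
            best, best_score = candidate, score
    return best, best_score


def toy_score(x: float) -> float:
    """Stand-in for a real experiment; the score peaks at x = 3."""
    return -(x - 3.0) ** 2


def toy_propose(x: float, rng: random.Random) -> float:
    """Small random perturbation of the current best candidate."""
    return x + rng.uniform(-0.5, 0.5)


best, best_score = hill_climb(0.0, toy_propose, toy_score)
```

In the digital case `evaluate` is a command. In the physical case it hides the scene reset, the safe execution, and the outcome verification, which is where the engineering effort goes.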
#4
@letian_fu
https://x.com/letian_fu/status/2067132813108007279
The ENPIRE launch itself: fleet agentic autoresearch meeting the physical world. Across precise manipulation tasks, teams of coding agents autonomously hill-climb performance using heuristic learning, behavior cloning and reinforcement learning. But the framing that matters is that the coding agents drive the entire research loop, not just algorithm search—reviewing the literature, proposing algorithms, building the reset and verification mechanisms, designing rewards, improving training infrastructure, and running real-world experiments. AGI building physical AGI, with the loop closed end to end.
#5
@chris_j_paxton
https://x.com/chris_j_paxton/status/2067072289221533828
A one-line distillation that lands: autoresearch via code-as-policies. An LLM agent writes code and tests it directly on real robots—because robots are just software in the real world after all. It captures the whole closed-loop write-test-iterate methodology for physical autoresearch in a single sentence, and it's the conceptual hinge the whole ENPIRE-style approach turns on: if your policy is code, then improving the robot is the same write-test-revise loop that already works for software.
#6
@antiochrobotics
https://x.com/antiochrobotics/status/2067265908012155065
A second team staking the same ground from a different angle: autoresearch as the future of physical autonomy. They're building the simulation layer that lets agents iterate on the full robotic stack in a closed loop. Where ENPIRE runs a physical fleet overnight, this bets on sim as the substrate where the propose-test-verify-refine loop can run cheaply and at scale before anything touches hardware. Two independent groups converging on 'close the loop for robotics' in the same day is the signal worth noting.
#7
@askalphaxiv
https://x.com/askalphaxiv/status/2067271046517154035
A genuinely useful autoresearch tool shipped, not just discussed. alphaXiv now deploys autoresearch agents to ingest popular arXiv repos, resolve their notoriously painful setup and dependency issues, and get the paper's core claim actually running—so you can sort papers by ease of implementation. This is autoresearch pointed at reproducibility, the unglamorous bottleneck that wastes more researcher-hours than almost anything. An agent that turns 'the code is on GitHub' into 'the claim runs' is quietly valuable.
#8
@justALEXWORTEGA
https://x.com/justALEXWORTEGA/status/2067222840701591703
A concrete autoresearch-loop result with receipts. He trained Qwen-35B-A3 with PPO using a verifiable reward (the whole trick), then ran it through Karpathy's autoresearch + parameter-golf loop, where he says it beats GLM-5.2 and Qwen-350B and generates Opus-tier ideas, and tops NEX and GPT-5.5 on a 'bullshit-bench.' Model and GGUF are released with a live ZeroGPU demo. The interesting part isn't the leaderboard bragging—it's a small model put through an iterative self-improvement loop coming out punching above its weight.
#9
@MTSlive
https://x.com/MTSlive/status/2067298871144009801
The non-coding autoresearch case with the cleanest before/after. OpenAI's Arthur Fernandes and John de Wasseige describe a self-improving agent (Codex) taking over tax preparation: returns that used to take preparers about 8 hours now take roughly 30 minutes. The agent extracts and groups complex data from many PDFs, Excel sheets and handwritten notes, runs the calculations, and cross-checks values—freeing reviewers to focus only on the genuinely hard fields. A 16x compression on a high-skill, high-liability professional task is exactly the kind of value the loop is supposed to unlock.
#10
@xyster
https://x.com/xyster/status/2067305659675377800
The purest 100X-tokens data point in the batch. He used GPT-5.5 in a 24/7 auto-research loop to take 4x Intel B70s running Minimax m2.7 from 14 tok/s to over 100 tok/s decode rate—a 7x gain—and he's explicit about the cost: 'It took two weeks of 24/7 auto research. That's a lot of tokens!!' He even benchmarks the loop itself, noting Fable and GPT Pro do it much faster while GLM 5.2 can do it slowly. This is the physical evidence of the thesis: two weeks of nonstop token spend bought a 7x hardware-level speedup.
#11
@dunik_7
https://x.com/dunik_7/status/2067173387667980496
The sharpest 'loop vs harness' argument of the day. Everyone's talking about loops; almost nobody's talking about what the loop runs on. Nine of ten builders run Claude Code on the default harness—no rules, no subagents, no hooks, no memory—then wonder why their loop produces slop, because a loop on a bad harness just makes garbage faster. He breaks the harness into four things and names the concrete pieces that make a loop compound: a reviewer subagent with a fresh context window, deterministic hooks that block dangerous calls, and a state file the agent reads at start and writes at end so the next run resumes instead of restarting.
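Two of those pieces, the deterministic hook and the state file, fit in a few lines. This is a minimal sketch under stated assumptions: the blocked patterns, the state schema, and the function names are hypothetical, and it does not use or imitate Claude Code's actual hook configuration.

```python
import json
import tempfile
from pathlib import Path

# Illustrative deny-list; a real one would be tuned to the project.
BLOCKED_PATTERNS = ("rm -rf", "git push --force", "drop table")


def allow_command(command: str) -> bool:
    """Deterministic hook: plain substring matching, no model in the decision."""
    lowered = command.lower()
    return not any(pattern in lowered for pattern in BLOCKED_PATTERNS)


def load_state(path: Path) -> dict:
    """Read the state file at the start of a run; start fresh if it is absent."""
    if path.exists():
        return json.loads(path.read_text())
    return {"run": 0, "done": []}


def save_state(path: Path, state: dict) -> None:
    """Write the state file at the end of a run so the next one can resume."""
    path.write_text(json.dumps(state, indent=2))


def run_once(path: Path, task: str) -> dict:
    """One loop iteration: resume from disk, record the task, persist."""
    state = load_state(path)
    state["run"] += 1
    state["done"].append(task)
    save_state(path, state)
    return state


# Toy usage: the second run picks up where the first one stopped.
with tempfile.TemporaryDirectory() as tmp:
    state_path = Path(tmp) / "state.json"
    run_once(state_path, "write tests")
    second = run_once(state_path, "fix failing test")
```

Because the hook is ordinary code, it gives the same answer every time for the same command, which is what separates it from a rule written into a prompt.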
#12
@HarryTandy
https://x.com/HarryTandy/status/2067243818189996279
A concrete 8-step production-agent loop recipe, framed by Jensen Huang's line that you program an AI like you program a person—and a person with 40 manuals open gets slower, just as an agent with 40 tool schemas starts making weird calls. The recipe: a job card with a done condition, working-memory files (scratchpad, decisions, open_questions), an input filter and a tool filter that surfaces only 3-5 tools, output receipts turning each result into source/finding/decision/next-action, phase resets across research/plan/build, and a verifier pass. It's a real template for keeping a long-running agent loop coherent past step 15.
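The tool filter and the output receipt are the two steps that map most directly to code. Below is a rough sketch, assuming tools are ranked by simple keyword overlap with the task. The `Receipt` fields follow the source/finding/decision/next-action list above, while `filter_tools`, the tool names, and the scoring rule are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Receipt:
    """One result turned into the four fields named in the recipe."""
    source: str
    finding: str
    decision: str
    next_action: str


def filter_tools(task: str, tools: Dict[str, str], limit: int = 5) -> List[str]:
    """Keep the few tools whose descriptions share words with the task."""
    words = set(task.lower().split())
    scored = [(len(words & set(desc.lower().split())), name)
              for name, desc in tools.items()]
    # Highest overlap first; ties are broken alphabetically by tool name.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for overlap, name in ranked if overlap > 0][:limit]


# Toy usage with a made-up tool catalogue.
catalogue = {
    "grep": "search text in files",
    "pytest": "run python tests",
    "deploy": "push a build to production",
    "browser": "open a web page",
}
surfaced = filter_tools("search files for failing tests", catalogue)
receipt = Receipt(
    source="pytest",
    finding="two tests fail in the parser module",
    decision="fix the parser before touching the CLI",
    next_action="open the parser module",
)
```

A production filter would likely use embeddings or an explicit per-phase allow-list instead of word overlap, but the effect is the same: the agent sees a short menu rather than every schema at once.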
#13
@7h3h4ckv157
https://x.com/7h3h4ckv157/status/2067218182470004891
A clean description of Nous Research's Hermes and its built-in learning loop—what makes a 'self-improving' agent more than a slogan. It creates skills from experience, refines them during use, nudges itself to persist knowledge, searches its own past conversations, and builds a deepening model of who you are across sessions. The deployment story matters too: run it on a $5 VPS, a GPU cluster, or near-free idle serverless, not tied to your laptop—talk to it from Telegram while it works on a cloud VM. The learning loop plus always-on detachment is the whole pitch.
#14
@ShinkaIoT
https://x.com/ShinkaIoT/status/2067074110522536298
A methodology writeup on 'Loop Engineering' built on Hermes plus Claude Code, with a useful distinction. Deterministic loops are for tasks where 'done' is absolute—fixing bugs, compiling, deploying—so you run test scripts until 100% pass, then auto-commit via the GitHub CLI. Non-deterministic loops are for UI and judgment tasks, run as a builder-verifier adversarial setup with an 'AI Slop Detector.' He lays out a concrete five-phase loop architecture and frames the industry as shifting from prompt engineering to loop engineering, with an always-on agent like Hermes automating the whole development cycle.
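The deterministic half of that distinction reduces to a small control loop. Here is a minimal sketch in which the test runner, the fixer, and the commit step are injected as callables. All names are hypothetical, and the commit step is left abstract rather than guessing at the exact CLI commands the author uses.

```python
from typing import Callable, List


def deterministic_loop(run_tests: Callable[[], bool],
                       attempt_fix: Callable[[int], None],
                       commit: Callable[[], None],
                       max_fixes: int = 10) -> bool:
    """Fix and re-test until the suite passes, then commit exactly once.

    Returns False if the suite still fails after max_fixes attempts.
    """
    for attempt in range(max_fixes + 1):
        if run_tests():
            commit()
            return True
        if attempt < max_fixes:
            attempt_fix(attempt)
    return False


# Toy usage: three failing tests, each fix attempt repairs one of them.
failing = [3]
commits: List[str] = []


def fix_one(attempt: int) -> None:
    failing[0] -= 1


succeeded = deterministic_loop(
    run_tests=lambda: failing[0] == 0,
    attempt_fix=fix_one,
    commit=lambda: commits.append("commit"),
)
```

In a real setup `run_tests` would wrap the project's test command and check its exit code, which is what makes 'done' absolute: the loop cannot declare success on its own judgment.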
#15
@RileyRalmuto
https://x.com/RileyRalmuto/status/2067082186096796135
A glimpse of orchestration-as-autoresearch on the desktop. Polyphonic for Mac now lets you say, in plain language, 'fan out 6 specialized research agents to deep-dive recursive self-improving architectures, loops and harnesses, have each write reports on their insights, then generate an HTML page with your synthesis.' You watch it build the agents, deploy them, follow each one's live activity timeline, and get a combined synthesis with an action plan and next steps rendered in-canvas. It's the multi-agent research fan-out pattern packaged as a consumer-grade Mac app.
#16
@luckeyfaraday
https://x.com/luckeyfaraday/status/2067360145592516798
A small but honest A/B on the loop itself. He turned the agent-loop concept into a proper open-source repo and tested it by building an FPS game with MiMo-V2.5 using an orchestrator → worker → reviewer loop. The version running the loop performed significantly better than plain MiMo. It's exactly the kind of controlled comparison the space needs more of—same base model, with and without the loop scaffolding—and the repo is public so others can check the claim.
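The orchestrator → worker → reviewer shape can be sketched with plain functions standing in for model calls. This is an illustrative skeleton only: `orchestrate`, the toy worker, and the toy reviewer are hypothetical and are not taken from the linked repo.

```python
from typing import Callable, List, Optional, Tuple

Worker = Callable[[str, Optional[str]], str]
Reviewer = Callable[[str, str], Tuple[bool, Optional[str]]]


def orchestrate(subtasks: List[str], worker: Worker, reviewer: Reviewer,
                max_rounds: int = 3) -> List[str]:
    """Send each subtask to the worker and loop on reviewer feedback."""
    results = []
    for task in subtasks:
        feedback: Optional[str] = None
        draft = ""
        for _ in range(max_rounds):
            draft = worker(task, feedback)          # worker produces a draft
            approved, feedback = reviewer(task, draft)
            if approved:                            # reviewer signs off
                break
        results.append(draft)                       # last draft is kept either way
    return results


def toy_worker(task: str, feedback: Optional[str]) -> str:
    """Stand-in for a model call: only gets it right after feedback."""
    return task.upper() if feedback else task


def toy_reviewer(task: str, draft: str) -> Tuple[bool, Optional[str]]:
    """Stand-in for a review pass: approve only upper-case drafts."""
    if draft.isupper():
        return True, None
    return False, "use upper case"


outputs = orchestrate(["player movement", "enemy spawner"], toy_worker, toy_reviewer)
```

Running the same worker with `max_rounds=1` gives the no-loop baseline, which is the kind of same-model comparison the post describes.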
#17
@Vemaster
https://x.com/Vemaster/status/2067185170520612907
A concrete, self-built agentic loop for a real production. He built a multi-agent loop—Research, Plan, Act, Verify—inside Cursor to make it an expert in their UE5 project, and is now exploring a TDD-automated harness-router for gamedev on Unreal Engine with deep Editor integration. It's a good example of the loop pattern being adapted to a specific, gnarly domain (game development on a heavy engine) rather than the usual web-app demos, with verification built into the cycle from the start.
#18
@anshulcreates
https://x.com/anshulcreates/status/2067127815200235892
A short but striking non-coding autoresearch use case. He says he was using autoresearch to iteratively make commercial solar panels more effective—and is now energized by the physical-autoresearch direction (ENPIRE), asking how to get involved. It's a data point that the autoresearch loop is already escaping software into hard science and hardware optimization, in the hands of people who say they'd happily do this kind of research for the rest of their lives. The loop as a research instrument for the physical world.
#19
@punchtaylor
https://x.com/punchtaylor/status/2067326371144081423
Hands-on work on a self-improving agent framework, not commentary. He has five PRs open against Nous Research's hermes-agent, including an MQTT platform adapter that's observational by default—events log to file with no agent-loop-per-message, born from a feedback loop that nearly took down his mesh on first flip—and a 'hermes mesh' fleet-provisioner CLI codifying the 9-node pattern he's been running, plus a streaming-content skill that transcribes video into structured JSON. This is what it looks like when someone is actually operating a fleet of self-improving agents and contributing the rough edges back upstream.
#20
@enesakar
https://x.com/enesakar/status/2067344775754260865
A concrete agent-loop build with a clean separation of concerns. They built 'Ask HackerNews,' an agent that answers questions from real HN data, with Vercel Eve handling the agent loop and Upstash Redis Search handling retrieval. It's a small, shippable example of the emerging 'framework owns the loop' pattern—you bring the data and the question, the harness manages the durable agent loop, and a dedicated search layer does the lookups. The kind of build that shows the loop frameworks are getting real enough to ship products on.
#21
@zeewasd
https://x.com/zeewasd/status/2067099434475991494
A new project staking out the operating-layer slot. Rudder is an open-source operating layer for self-improving agent teams—it helps agents learn human taste, plan short- and long-term, get reviewed, and improve across runs. It's early and feedback-seeking, but the framing is notable: the interesting problems are moving up a level from a single agent's loop to coordinating teams of agents that improve over time, with human taste as the thing they're learning to match. One more entry in the rapidly filling 'operating layer for agent fleets' category.
📡 Eco Products Radar
ENPIRE (NVIDIA) — the day's defining project: fleet agentic autoresearch for physical robots, with a two-layer safety harness and frozen-reward design.
Hermes / hermes-agent (Nous Research) — the self-improving agent with a built-in learning loop, deployed everywhere from $5 VPSes to fleets; the most-contributed-to framework in the batch.
Vercel eve — the 'Next.js for agents' harness that owns the agent loop (durable sessions, sandboxes, subagents, evals), already shipping real builds like Ask HackerNews.
Claude Code — the default harness people build their loops and Loop Engineering setups on.
GLM-5.2 / Qwen / MiMo — the open-weight models people point their autoresearch and loop experiments at.
Karpathy autoresearch + parameter-golf — the iterative self-improvement loop people are running small models through.