Loop Daily: July 2, 2026
The story this cycle isn't "agents can loop," it's that the loop got cheap enough to leave running. Sonnet 5's pricing collapse turned overnight trading bots and always-on research agents from a money pit into a no-brainer, and the results people are posting show it: a paper replicated in four prompts, a trading loop that made half a million, a weather forecaster that beats humans by correcting itself every run. Underneath the hype thread farming, there's a real methodology forming: agent loop plus a verifier plus memory plus a schedule. And a healthy skeptic camp reminding everyone that a fast inner loop pointed at a weak verifier is just an expensive way to burn tokens.
#1
@askalphaxiv
https://x.com/askalphaxiv/status/2072097922595029357
Used their AutoResearch tool to actually replicate a paper (DiffusionBlocks) end to end and found it was training-budget sensitive: at 25% of the required epochs it hadn't caught the baseline, but pushing layers from 12 to 24 (which the paper never explored) let it beat the baseline at just 25% of the epochs. The whole set of experiments was driven by four plain-English prompts. This is the clearest example of autoresearch doing genuine science, not a demo.
https://x.com/askalphaxiv/status/2072097922595029357
Used their AutoResearch tool to actually replicate a paper (DiffusionBlocks) end to end and found it was training-budget sensitive: at 25% of the required epochs it hadn't caught the baseline, but pushing layers from 12 to 24 (which the paper never explored) let it beat the baseline at just 25% of the epochs. The whole set of experiments was driven by four plain-English prompts. This is the clearest example of autoresearch doing genuine science, not a demo.
#2
@antpalkin
https://x.com/antpalkin/status/2072083069339943277
A broke 23-year-old built a trading bot as an agentic loop in Lisbon, killed it after a month because the API bills ate everything, then turned it back on when Sonnet 5 dropped the cost from about $4,000/day to under $4. Same bot, a tiny fraction of the cost, and it reportedly made $500,000. His point is sharp: the loop was always viable, the model just made running it affordable, and a loop that runs forever keeps what works and rewrites its own rules every night.
https://x.com/antpalkin/status/2072083069339943277
A broke 23-year-old built a trading bot as an agentic loop in Lisbon, killed it after a month because the API bills ate everything, then turned it back on when Sonnet 5 dropped the cost from about $4,000/day to under $4. Same bot, a tiny fraction of the cost, and it reportedly made $500,000. His point is sharp: the loop was always viable, the model just made running it affordable, and a loop that runs forever keeps what works and rewrites its own rules every night.
#3
@heckmeier
https://x.com/heckmeier/status/2072001751947862074
Wired an agentic loop to the real physical world: live weather plus webcam data plus 11 years of historical measurements, aimed at forecasting one specific local phenomenon (the Maloja wind). The goal is a self-improving forecaster that beats human forecasters and discovers previously undocumented dependencies on its own, with reality correcting it on every run. A beautiful example of closing the loop against the physical world instead of a benchmark.
https://x.com/heckmeier/status/2072001751947862074
Wired an agentic loop to the real physical world: live weather plus webcam data plus 11 years of historical measurements, aimed at forecasting one specific local phenomenon (the Maloja wind). The goal is a self-improving forecaster that beats human forecasters and discovers previously undocumented dependencies on its own, with reality correcting it on every run. A beautiful example of closing the loop against the physical world instead of a benchmark.
#4
@elliot_c_smith
https://x.com/elliot_c_smith/status/2072086311784333573
Ran an experiment throwing Claude Code at an optimization problem with no well-defined gradient, explicitly inspired by Karpathy's Autoresearch but adapted to the reality that most real code has noisy signals. It's the honest version of the pattern: not a clean loss curve, but an agent iterating against a messy, real-world objective. Worth watching as autoresearch moves from toy problems to noisy ones.
https://x.com/elliot_c_smith/status/2072086311784333573
Ran an experiment throwing Claude Code at an optimization problem with no well-defined gradient, explicitly inspired by Karpathy's Autoresearch but adapted to the reality that most real code has noisy signals. It's the honest version of the pattern: not a clean loss curve, but an agent iterating against a messy, real-world objective. Worth watching as autoresearch moves from toy problems to noisy ones.
#5
@iScienceLuvr
https://x.com/iScienceLuvr/status/2071886472844894381
Announced they're building Labless, a new platform for autoresearch and agentic research, and are openly looking for feedback and partners. The interesting signal isn't the pitch, it's that "autoresearch platform" is now a category worth naming a startup after. The tooling layer under Karpathy's pattern is starting to get built for real.
https://x.com/iScienceLuvr/status/2071886472844894381
Announced they're building Labless, a new platform for autoresearch and agentic research, and are openly looking for feedback and partners. The interesting signal isn't the pitch, it's that "autoresearch platform" is now a category worth naming a startup after. The tooling layer under Karpathy's pattern is starting to get built for real.
#6
@0xProbabillity
https://x.com/0xProbabillity/status/2071965889298133357
A detailed engineering breakdown of Spotify's agentic workflow, and the lesson is that it isn't about prompts. They got a 75% jump in PR frequency by investing in fleet-management infrastructure and standardization, built a Linux/macOS verification loop where agents on the Claude Agent SDK write code, trigger the build, fix their own errors and loop until it passes, then strengthened test automation enough to auto-merge without a human in the loop. Non-engineers now ship prototypes to an internal App Store in under two hours.
https://x.com/0xProbabillity/status/2071965889298133357
A detailed engineering breakdown of Spotify's agentic workflow, and the lesson is that it isn't about prompts. They got a 75% jump in PR frequency by investing in fleet-management infrastructure and standardization, built a Linux/macOS verification loop where agents on the Claude Agent SDK write code, trigger the build, fix their own errors and loop until it passes, then strengthened test automation enough to auto-merge without a human in the loop. Non-engineers now ship prototypes to an internal App Store in under two hours.
#7
@jordiponsdotme
https://x.com/jordiponsdotme/status/2071938261258326079
Shares the exact agentic loop he uses for writing: start with a draft, have one agent review it as a top ML-conference reviewer, have a second agent rewrite the draft as a scientific writer, then repeat the review-rewrite cycle until the reviewer is satisfied. It's a clean, reproducible two-agent loop with a built-in quality bar. The verifier is another agent playing a harsh critic.
https://x.com/jordiponsdotme/status/2071938261258326079
Shares the exact agentic loop he uses for writing: start with a draft, have one agent review it as a top ML-conference reviewer, have a second agent rewrite the draft as a scientific writer, then repeat the review-rewrite cycle until the reviewer is satisfied. It's a clean, reproducible two-agent loop with a built-in quality bar. The verifier is another agent playing a harsh critic.
#8
@0xCodez
https://x.com/0xCodez/status/2071996078568701978
Breaks down how to build memory for self-improving agents: procedural (skills / how to act), semantic (durable facts and profile), and episodic (dated events and chat history). His framing is that memory is a core part of any agentic loop, and memory plus loops plus a harness plus evals is what turns a one-shot bot into a self-improving system. A concrete mental model for the piece most people skip.
https://x.com/0xCodez/status/2071996078568701978
Breaks down how to build memory for self-improving agents: procedural (skills / how to act), semantic (durable facts and profile), and episodic (dated events and chat history). His framing is that memory is a core part of any agentic loop, and memory plus loops plus a harness plus evals is what turns a one-shot bot into a self-improving system. A concrete mental model for the piece most people skip.
#9
@cshekhar
https://x.com/cshekhar/status/2071878507483848748
Runs an internal agent deployment platform with tenant isolation via microVMs and Kernel 7 isolation, so a leaky free-tier app can't blast-radius the rest, and the agentic loop powering it is open source with human-in-the-loop approvals baked in for anything risky. This is the grown-up version of "let the agent loop": real isolation and real approval gates. The infrastructure conversation is catching up to the capability.
https://x.com/cshekhar/status/2071878507483848748
Runs an internal agent deployment platform with tenant isolation via microVMs and Kernel 7 isolation, so a leaky free-tier app can't blast-radius the rest, and the agentic loop powering it is open source with human-in-the-loop approvals baked in for anything risky. This is the grown-up version of "let the agent loop": real isolation and real approval gates. The infrastructure conversation is catching up to the capability.
#10
@GAXEN10
https://x.com/GAXEN10/status/2071980076401668365
Describes replacing a four-agent morning routine (research, writing, review, publishing, copy-pasted between each) with Claude Code Dynamic Workflows. One command spins up a dashboard showing every phase running in parallel, tokens consumed, and which subagent is executing or validating, with the whole loop written by Claude as a JavaScript file in the .claude/workflows folder. Three hours of babysitting became four minutes of setup, and he now runs six workflow loops a day.
https://x.com/GAXEN10/status/2071980076401668365
Describes replacing a four-agent morning routine (research, writing, review, publishing, copy-pasted between each) with Claude Code Dynamic Workflows. One command spins up a dashboard showing every phase running in parallel, tokens consumed, and which subagent is executing or validating, with the whole loop written by Claude as a JavaScript file in the .claude/workflows folder. Three hours of babysitting became four minutes of setup, and he now runs six workflow loops a day.
#11
@hedgineering
https://x.com/hedgineering/status/2071951901491376441
A podcast episode breaking down what an agentic loop actually is for investment teams, and what it looks like when they put it to use, from earnings recap to idea generation. The framing is that loops shift analysts from reactive prompting to autonomous pipelines that raise idea velocity. Concrete evidence the loop pattern is landing in finance workflows, not just coding.
https://x.com/hedgineering/status/2071951901491376441
A podcast episode breaking down what an agentic loop actually is for investment teams, and what it looks like when they put it to use, from earnings recap to idea generation. The framing is that loops shift analysts from reactive prompting to autonomous pipelines that raise idea velocity. Concrete evidence the loop pattern is landing in finance workflows, not just coding.
#12
@thenightshipper
https://x.com/thenightshipper/status/2071997788347642005
Makes the underrated point that the agentic loop got roughly 100x faster this year while the external feedback loop (real users, real markets) didn't, so "what to build" is now the bottleneck, not "can I build it." Most engineers are sharp on the inner loop and weak on the outer one, and that gap is the real skill now. A clear-eyed take on where autonomy actually helps and where it doesn't.
https://x.com/thenightshipper/status/2071997788347642005
Makes the underrated point that the agentic loop got roughly 100x faster this year while the external feedback loop (real users, real markets) didn't, so "what to build" is now the bottleneck, not "can I build it." Most engineers are sharp on the inner loop and weak on the outer one, and that gap is the real skill now. A clear-eyed take on where autonomy actually helps and where it doesn't.
#13
@kingofknowwhere
https://x.com/kingofknowwhere/status/2071928945054994561
Built an MVP (fully vibe-coded) of a website that updates itself from Telegram based on every user interaction: every form submission goes to a developer agent and becomes a Jira ticket the agent handles, all autonomously. Every visit is a chance for the site to self-improve. It's a scrappy but real closed loop where user behavior directly feeds the build queue.
https://x.com/kingofknowwhere/status/2071928945054994561
Built an MVP (fully vibe-coded) of a website that updates itself from Telegram based on every user interaction: every form submission goes to a developer agent and becomes a Jira ticket the agent handles, all autonomously. Every visit is a chance for the site to self-improve. It's a scrappy but real closed loop where user behavior directly feeds the build queue.
#14
@valhalla_dev
https://x.com/valhalla_dev/status/2072004864339505397
Built LAIN, the Loki Agent Intelligence Network, for a hackathon: agents run self-improvement loops to become genius subject-matter experts in one domain, then sell research to humans and autonomously buy research from other agents using Stripe's Machine Payment Protocol. In the demo, one agent quotes a report, takes a real Stripe payment, then pays a second specialist agent for a sub-report on NVIDIA economics before merging both. A genuine glimpse of a self-improving agent marketplace with real money moving.
https://x.com/valhalla_dev/status/2072004864339505397
Built LAIN, the Loki Agent Intelligence Network, for a hackathon: agents run self-improvement loops to become genius subject-matter experts in one domain, then sell research to humans and autonomously buy research from other agents using Stripe's Machine Payment Protocol. In the demo, one agent quotes a report, takes a real Stripe payment, then pays a second specialist agent for a sub-report on NVIDIA economics before merging both. A genuine glimpse of a self-improving agent marketplace with real money moving.
#15
@arcprize
https://x.com/arcprize/status/2072069184146833674
Highlights Continual Harness, an efficient self-improving agent on ARC-AGI-3 from Prime Intellect, where the benchmark's heavy test-time learning forces the agent to build an internal world model of the rules and mechanics that updates as new evidence arrives. It's self-improvement as a requirement of the task, not a marketing label. One of the more rigorous entries in the self-improving-agent space this cycle.
https://x.com/arcprize/status/2072069184146833674
Highlights Continual Harness, an efficient self-improving agent on ARC-AGI-3 from Prime Intellect, where the benchmark's heavy test-time learning forces the agent to build an internal world model of the rules and mechanics that updates as new evidence arrives. It's self-improvement as a requirement of the task, not a marketing label. One of the more rigorous entries in the self-improving-agent space this cycle.
#16
@TeksCreate
https://x.com/TeksCreate/status/2071782195962806289
A detailed look at Hermes Agent, the self-improving agent that learns from every session and just passed 205K stars. Its closed learning loop creates skills autonomously after complex tasks, self-improves those skills during use, persists knowledge, and searches its own past conversations, all built on a trajectory-generation into compression into training-feedback pipeline. The claim worth tracking is agents that measurably get better at tool-calling over time rather than staying static prompt-engineered bots.
https://x.com/TeksCreate/status/2071782195962806289
A detailed look at Hermes Agent, the self-improving agent that learns from every session and just passed 205K stars. Its closed learning loop creates skills autonomously after complex tasks, self-improves those skills during use, persists knowledge, and searches its own past conversations, all built on a trajectory-generation into compression into training-feedback pipeline. The claim worth tracking is agents that measurably get better at tool-calling over time rather than staying static prompt-engineered bots.
#17
@bsormagec
https://x.com/bsormagec/status/2072001373504151568
A sober analysis of Ornith-1.0, an open-weight coding model trained with a self-improving RL framework that optimizes both the solution rollouts and the scaffolds that guide them. His key caveat cuts through the branding: the "self-improving" here is a training trick (RL-generated harnesses), not runtime evolution, so buy the open-weights-plus-tooling story, not the marketing. A useful reminder to read "self-improving" claims carefully.
https://x.com/bsormagec/status/2072001373504151568
A sober analysis of Ornith-1.0, an open-weight coding model trained with a self-improving RL framework that optimizes both the solution rollouts and the scaffolds that guide them. His key caveat cuts through the branding: the "self-improving" here is a training trick (RL-generated harnesses), not runtime evolution, so buy the open-weights-plus-tooling story, not the marketing. A useful reminder to read "self-improving" claims carefully.
#18
@AndrewK404
https://x.com/AndrewK404/status/2072034692790927570
After a few days studying Claude Code, Codex, OpenHands, Hermes, and LangGraph, he's convinced the whole industry has converged on one Agent Runtime architecture: gather history, send to the model, and if there are tool calls run the tool and go back to step one, otherwise return the answer. That's the entire loop. A clean, deflationary description of what an "agent" actually is under the hood.
https://x.com/AndrewK404/status/2072034692790927570
After a few days studying Claude Code, Codex, OpenHands, Hermes, and LangGraph, he's convinced the whole industry has converged on one Agent Runtime architecture: gather history, send to the model, and if there are tool calls run the tool and go back to step one, otherwise return the answer. That's the entire loop. A clean, deflationary description of what an "agent" actually is under the hood.
#19
@MaziyarPanahi
https://x.com/MaziyarPanahi/status/2071955191260151862
Runs GLM-5.2 locally on a Mac Studio via llama.cpp and gave it a browser, building an agent loop around browser-use. He asked it to find a PII model and it searched Hugging Face and surfaced privacy-filter-nemotron on its own, one open model finding another. His line: AI must be owned, not rented. A concrete, fully-local agent loop with real tool use and no API dependency.
https://x.com/MaziyarPanahi/status/2071955191260151862
Runs GLM-5.2 locally on a Mac Studio via llama.cpp and gave it a browser, building an agent loop around browser-use. He asked it to find a PII model and it searched Hugging Face and surfaced privacy-filter-nemotron on its own, one open model finding another. His line: AI must be owned, not rented. A concrete, fully-local agent loop with real tool use and no API dependency.
#20
@IhorSkiba
https://x.com/IhorSkiba/status/2071962711488184690
Reports 53 hours, 1,520 jobs done, zero prompts typed, and lays out four rungs of the loop most people never climb past the first of: the agent loop (model calls tools until the goal is met), the verification loop (a grader scores every output against a rubric before it ships), the event-driven loop (a cron or webhook fires it), and the hill-climbing loop (the agent reads its own traces and rewrites its own prompt nightly). His point: you only get the compounding gains once you're not the one typing.
https://x.com/IhorSkiba/status/2071962711488184690
Reports 53 hours, 1,520 jobs done, zero prompts typed, and lays out four rungs of the loop most people never climb past the first of: the agent loop (model calls tools until the goal is met), the verification loop (a grader scores every output against a rubric before it ships), the event-driven loop (a cron or webhook fires it), and the hill-climbing loop (the agent reads its own traces and rewrites its own prompt nightly). His point: you only get the compounding gains once you're not the one typing.
#21
@dipankarsarkar
https://x.com/dipankarsarkar/status/2071991327156220400
A great concrete debugging story: he profiled an agent loop expecting the model to be the slow part, and it was deepcopy on the state object every iteration. Swapping the serialization path made it about 30x faster without ever touching the agent. The reminder is that a lot of agent latency lives above the silicon, in your own plumbing, not the LLM.
https://x.com/dipankarsarkar/status/2071991327156220400
A great concrete debugging story: he profiled an agent loop expecting the model to be the slow part, and it was deepcopy on the state object every iteration. Swapping the serialization path made it about 30x faster without ever touching the agent. The reminder is that a lot of agent latency lives above the silicon, in your own plumbing, not the LLM.
#22
@johniosifov
https://x.com/johniosifov/status/2072002479525380409
Uses Cognition's Devin reporting 89% of its own codebase now written by its agent as a jumping-off point, then shares his own experience running 109 content bursts (sessions, PRs, posts, research, strategy) all driven by an agent loop with no human writing the content. His conclusion is that the limiting factor isn't the AI's capability but the human's willingness to define clear goals and let the agent own execution. Speed of iteration without human friction is the real unlock.
https://x.com/johniosifov/status/2072002479525380409
Uses Cognition's Devin reporting 89% of its own codebase now written by its agent as a jumping-off point, then shares his own experience running 109 content bursts (sessions, PRs, posts, research, strategy) all driven by an agent loop with no human writing the content. His conclusion is that the limiting factor isn't the AI's capability but the human's willingness to define clear goals and let the agent own execution. Speed of iteration without human friction is the real unlock.
#23
@jerryjliu0
https://x.com/jerryjliu0/status/2072035931050426782
Frames document parsing as something that has to live inside the agent loop: when a user drops 1,000 PDFs into an agent, you need an extremely fast pass to make sense of the docs before a deeper dive, which is why they built LiteParse as an OSS project designed to run in the agent loop and route to deeper VLM-enabled modes when needed. It's a reminder that real agent loops need fast, cheap perception steps, not just a big model. Parsing is becoming loop infrastructure.
https://x.com/jerryjliu0/status/2072035931050426782
Frames document parsing as something that has to live inside the agent loop: when a user drops 1,000 PDFs into an agent, you need an extremely fast pass to make sense of the docs before a deeper dive, which is why they built LiteParse as an OSS project designed to run in the agent loop and route to deeper VLM-enabled modes when needed. It's a reminder that real agent loops need fast, cheap perception steps, not just a big model. Parsing is becoming loop infrastructure.
#24
@HolmesosaurusRx
https://x.com/HolmesosaurusRx/status/2071812299309133946
The sharpest skeptic of the day: loops work for simple objective tasks (run lint, fix the obvious issue, rerun once, stop) but get dangerous on complex work like research synthesis, product judgment, and pricing, where the verifier is weak and the model creates work and burns tokens. His alternative is human-led orchestration: human sets the bar, agent does a bounded pass, agent verifies against explicit criteria, human decides the next move. "Autonomy without orchestration is just an expensive intern with infinite stamina and your credit card."
https://x.com/HolmesosaurusRx/status/2071812299309133946
The sharpest skeptic of the day: loops work for simple objective tasks (run lint, fix the obvious issue, rerun once, stop) but get dangerous on complex work like research synthesis, product judgment, and pricing, where the verifier is weak and the model creates work and burns tokens. His alternative is human-led orchestration: human sets the bar, agent does a bounded pass, agent verifies against explicit criteria, human decides the next move. "Autonomy without orchestration is just an expensive intern with infinite stamina and your credit card."
π‘ Eco Products Radar
Eco Products Radar
Claude Code - the default host for agentic loops and Dynamic Workflows, from Spotify's verification loop to one-command overnight workflow farms (@0xProbabillity, @GAXEN10, @elliot_c_smith).
Hermes Agent - the reference self-improving agent this cycle, with a closed skill-learning loop cited as the architecture to copy (@TeksCreate, @AndrewK404, @valhalla_dev).
Sonnet 5 - the pricing shift that made always-on loops economically sane, repeatedly credited as the reason dead bots got switched back on (@antpalkin).
autoresearch (Karpathy) - the pattern everyone is forking and building platforms around, from paper replication to noisy real-world objectives (@askalphaxiv, @elliot_c_smith, @iScienceLuvr).
Claude Code - the default host for agentic loops and Dynamic Workflows, from Spotify's verification loop to one-command overnight workflow farms (@0xProbabillity, @GAXEN10, @elliot_c_smith).
Hermes Agent - the reference self-improving agent this cycle, with a closed skill-learning loop cited as the architecture to copy (@TeksCreate, @AndrewK404, @valhalla_dev).
Sonnet 5 - the pricing shift that made always-on loops economically sane, repeatedly credited as the reason dead bots got switched back on (@antpalkin).
autoresearch (Karpathy) - the pattern everyone is forking and building platforms around, from paper replication to noisy real-world objectives (@askalphaxiv, @elliot_c_smith, @iScienceLuvr).
Comments