June 28, 2026 · loop

Loop Daily: June 29, 2026

The most striking loop stories today weren't about coding agents at all. People ran autoresearch loops against weather models, ODE/PDE solvers, and local-LLM optimization, with experiments grinding overnight on a Mac mini whose fans you can hear during validation runs. The deeper conversation also matured: a running loop and a learning loop are not the same thing, and the smartest posts argued that experience should be stored in the harness, not the weights. Here's what's actually running.

💡#1

@ggkhzhao
https://x.com/ggkhzhao/status/2070963083778937018
A team pointed an autoresearch loop at real weather forecasting, asking whether it could improve an actual weather dynamical core by making physics-informed changes to it. Their honest admission is the best part: they weren't expecting much, but the early results were surprising enough to share. This is autoresearch escaping the "time to GPT-2" benchmark and touching hard scientific computing, a domain where a measurable improvement is a genuine result, not a demo.

💡#2

@zhaoran_wang
https://x.com/zhaoran_wang/status/2070965528978457030
A sharp one-liner with real weight behind it: "autoresearch makes ODE/PDE great again without the neural part, no backprop, just evolution." It points at the same shift as the weather thread: autoresearch loops as an evolutionary search over scientific code, not a fine-tuning exercise. No gradient descent, no neural surrogate, just an agent proposing changes, running them, and keeping what measurably improves. A reminder that the loop is a general optimizer, not a coding gimmick.
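The propose/run/keep pattern can be sketched as a greedy (1+1) evolutionary loop. This is a minimal illustration, not the weather team's or anyone's actual autoresearch code: the toy ODE (exponential decay integrated with Euler steps), the `decay_error` score, and the Gaussian `jitter` mutation are all hypothetical stand-ins for "edit the solver, run validation, compare a metric".

```python
import random
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def evolve(
    score: Callable[[T], float],
    initial: T,
    mutate: Callable[[T, random.Random], T],
    steps: int,
    seed: int = 0,
) -> Tuple[T, float]:
    """Greedy (1+1) evolutionary loop: propose a change, run it, keep it only if
    the measured score improves (lower is better). No gradients anywhere."""
    rng = random.Random(seed)
    best, best_score = initial, score(initial)
    for _ in range(steps):
        candidate = mutate(best, rng)        # propose
        candidate_score = score(candidate)   # run / validate
        if candidate_score < best_score:     # keep only measurable wins
            best, best_score = candidate, candidate_score
    return best, best_score


# Hypothetical target: recover the rate k in y' = -k*y from an Euler solver's output.
def euler_decay(k: float, y0: float = 1.0, dt: float = 0.01, n: int = 100) -> float:
    y = y0
    for _ in range(n):
        y += dt * (-k * y)
    return y


TARGET = euler_decay(2.0)


def decay_error(k: float) -> float:
    return abs(euler_decay(k) - TARGET)


def jitter(k: float, rng: random.Random) -> float:
    return k + rng.gauss(0.0, 0.1)


best_k, best_err = evolve(decay_error, initial=0.5, mutate=jitter, steps=2000)
print(f"k = {best_k:.3f}, error = {best_err:.2e}")
```

A real loop would mutate code rather than a float and validate on held-out cases, but the acceptance rule is the same: a change survives only if the number improves.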

💡#3

@stretchcloud
https://x.com/stretchcloud/status/2070965390084030743
A clear retelling of the Karpathy workflow shift: his ratio flipped from 80% writing code to 80% delegating, and AutoResearch, the tool he built to show the principle, ran 700 experiments in two days with no human at the keyboard. The agent edited code, tried ideas, learned from failures, and cut the "time to GPT-2" benchmark from 2.02 hours to 1.80. The framing that lands: the durable skills are spec design, diff review, and eval construction, which is judgment work, not keystroke work.

💡#4

@hasantoxr
https://x.com/hasantoxr/status/2070862397997535396
The autoresearch detail worth paying attention to: turn any paper into a graph. Install the CLI, hand your agent an arXiv ID, and it resolves the setup, runs a minimal reproduction, and lays the result out as a graph; when the PDF leaves something out, it asks you instead of guessing. From there you point autoresearch at any node and keep going, pushing an experiment further or running the direction the authors abandoned, with each run landing as a new node backed by managed compute. A hundred experiments can fan out from one paper with no infrastructure on your end.

💡#5

@seanphan
https://x.com/seanphan/status/2071019963025072433
A concrete data point on how long these loops actually run: with /goal plus autoresearch and a verifiable output, his tasks run for hours unattended, and his longest single task clocked in at around 30 hours. That's the real test of an autonomous loop: not whether it can do one step, but whether you can leave it overnight (and then some) and come back to something useful. He calls it underrated, and on this evidence he's right.

💡#6

@kavindpadi
https://x.com/kavindpadi/status/2070919668433584190
The most relatable autoresearch post of the day: regretting the Mac mini purchase because the autoresearch loop is busy finding local LLM optimizations and clearly wants a Mac Studio or a DGX Spark instead. It's a small window into autoresearch as a real, resource-hungry workload running on consumer hardware, with the fans audible during each validation pass. The loop isn't a thought experiment for this person; it's a job that's outgrowing his machine.

💡#7

@christophcsmith
https://x.com/christophcsmith/status/2070946305816367598
A genuinely novel personal application: he's trying to quantify the outcomes he actually cares about, such as "wellness," "impact," and "relationships maintained," so that he and the machine can jointly autoresearch how to maximize them. It's autoresearch pointed not at a benchmark or a codebase but at a life. Vague metrics are the hard part, but the instinct is the interesting one: if you can make a goal measurable, you can put a loop on it.

💡#8

@Veltrxai
https://x.com/Veltrxai/status/2070980138070900797
A real autoresearch feature shipping inside a free Claude Code plugin (Claude Obsidian): run /autoresearch on any topic and it works in rounds. Round one reads 9 sources, round two links entities, and round three writes pages, producing 12 new pages hands-free. It sits on a plain-markdown memory (a hot.md loaded every session, an index, and growing wiki pages) and claims hybrid retrieval lifts accuracy by 32%. This is the autoresearch loop wired directly into a personal knowledge base: the vault researching itself.

💡#9

@Gordey0072
https://x.com/Gordey0072/status/2070843990883160335
Zero humans in the loop: a Hermes orchestrator managing a full team of agents, from researchers to architects to developers to testers, coordinated over a native Kanban board, with a Karpathy-style autoresearch loop plus DSPy for self-improvement. It barely got any impressions, but it's one of the more concrete descriptions of a self-improving multi-agent pipeline today, combining an orchestration layer, an explicit research loop, and a prompt-optimization framework into one running system.

💡#10

@_vmlops
https://x.com/_vmlops/status/2070721644767957275
Santander open-sourced its entire AI lab as 14 production-grade Apache-2.0 tools, and the one that matters here is ralph: a loop that runs an AI coding CLI with a fresh session each iteration. That's agentic loop engineering, shipped by a $100B bank. The rest of the drop (a synthetic fraud-graph generator, an LLM alignment scaffold, a mechanical-governance framework for high-stakes decisions) is notable, but ralph is proof that the "fresh-session loop" is now a pattern serious institutions publish, not just indie devs.
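A fresh-session loop can be sketched with nothing but `subprocess`. This is a hypothetical illustration of the pattern, not ralph's actual interface: `fresh_session_loop`, the prompt, and the `progress.txt` convention are assumptions, and a Python one-liner stands in for the coding CLI so the sketch runs anywhere.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Sequence


def fresh_session_loop(
    agent_cmd: Sequence[str],
    prompt: str,
    is_done: Callable[[], bool],
    max_iters: int = 10,
) -> int:
    """Start the agent CLI as a brand-new process on every iteration.

    Nothing is carried over in memory between iterations; whatever the next
    session needs must live on disk (progress notes, git history, specs).
    Returns the number of iterations that ran.
    """
    for i in range(1, max_iters + 1):
        subprocess.run(list(agent_cmd), input=prompt, text=True, check=True)
        if is_done():  # external, verifiable stop condition
            return i
    return max_iters


# Stand-in "agent": a Python one-liner that appends one line of progress per run.
workdir = Path(tempfile.mkdtemp())
progress = workdir / "progress.txt"
fake_agent = [
    sys.executable,
    "-c",
    f"with open({str(progress)!r}, 'a') as f: f.write('step\\n')",
]


def three_steps_done() -> bool:
    return progress.exists() and len(progress.read_text().splitlines()) >= 3


iterations = fresh_session_loop(fake_agent, "Read progress.txt, do the next task.", three_steps_done)
print(iterations)  # 3
```

The design choice is that the only memory is the filesystem, so each iteration starts with a clean context window and cannot inherit a confused conversation.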

💡#11

@OkhayIea
https://x.com/OkhayIea/status/2070904599788273696
The single most clarifying post of the day: a running loop and a learning loop are not the same thing. Inside one task, an agent loop (act, observe, decide, repeat) just needs to succeed once; learning accumulates experience across many windows. He formalizes the deployed agent as a base model plus a mutable harness, and the harness is the part you can inspect and revise on the timescale of deployment, far more cheaply than retraining weights. The loop runs on the harness, but the harness decides what the loop gets to keep.
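The running-loop/learning-loop split can be made concrete with a toy. In this sketch (every name is hypothetical), `BASE_POLICY` plays the frozen model, `running_loop` only has to succeed once per task, and `learning_loop` writes what worked into the `Harness`, so later tasks of the same kind start from the lesson instead of from scratch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# The frozen "base model": a fixed default order of strategies it will try.
BASE_POLICY: List[str] = ["retry", "search_docs", "write_test_first"]


@dataclass
class Harness:
    """Mutable, inspectable layer around the frozen model (hypothetical shape)."""
    playbook: Dict[str, str] = field(default_factory=dict)  # task kind -> what worked


def running_loop(kind: str, succeeds_with: str, harness: Harness) -> Tuple[int, str]:
    """Inside one task: act, observe, decide, repeat until a single success."""
    hint = harness.playbook.get(kind)
    order = ([hint] if hint else []) + [s for s in BASE_POLICY if s != hint]
    for attempt, strategy in enumerate(order, start=1):
        if strategy == succeeds_with:  # observe the outcome of acting
            return attempt, strategy
    raise RuntimeError(f"no strategy solved {kind!r}")


def learning_loop(episodes: List[Tuple[str, str]], harness: Harness) -> List[int]:
    """Across many tasks: write what worked into the harness, never the weights."""
    attempts = []
    for kind, succeeds_with in episodes:
        n, winner = running_loop(kind, succeeds_with, harness)
        harness.playbook[kind] = winner  # the harness decides what the loop keeps
        attempts.append(n)
    return attempts


harness = Harness()
history = learning_loop([("flaky_build", "write_test_first")] * 3, harness)
print(history)  # [3, 1, 1]
```

`BASE_POLICY` never changes; the drop from three attempts to one lives entirely in `harness.playbook`, which you can print, diff, or roll back.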

💡#12

@phosphenq
https://x.com/phosphenq/status/2070967048193159344
A 50-page paper saying the quiet part out loud: the agent loop is broken, and everyone's building on it anyway. It retries forever, rewrites its own plan, and hides why it broke. The proposed fix is to drop the loop and run a structured graph you can see, control, and actually stop. Whether or not you buy the conclusion, it's the sharpest critique of the naive while-loop agent doing the rounds today, and the demand for inspectable, stoppable execution is the same one showing up across the production-agent crowd.
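One way to read "a structured graph you can see, control, and actually stop" is an explicit node graph with a visible trace and a hard step budget. This is a generic sketch under that assumption, not the paper's proposed system; `run_graph`, the node names, and `max_steps` are illustrative.

```python
from typing import Callable, Dict, List, Optional

# A node reads/updates shared state and names the next node (None = finished).
Node = Callable[[dict], Optional[str]]


def run_graph(nodes: Dict[str, Node], start: str, state: dict, max_steps: int = 20) -> List[str]:
    """Execute an explicit node graph with a visible trace and a hard stop."""
    trace: List[str] = []
    current: Optional[str] = start
    while current is not None:
        if len(trace) >= max_steps:
            raise RuntimeError(f"stopped after {max_steps} steps: {trace}")
        trace.append(current)
        current = nodes[current](state)
    return trace


def plan(state: dict) -> Optional[str]:
    state["tries"] = 0
    return "act"


def act(state: dict) -> Optional[str]:
    state["tries"] += 1
    return "check"


def check(state: dict) -> Optional[str]:
    return None if state["tries"] >= 2 else "act"


graph = {"plan": plan, "act": act, "check": check}
print(run_graph(graph, "plan", {}))  # ['plan', 'act', 'check', 'act', 'check']
```

A node that keeps routing back to `act` hits `max_steps` and raises with the full trace, instead of retrying silently forever.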

💡#13

@thisdudelikesAI
https://x.com/thisdudelikesAI/status/2070790383429394935
A concrete, buildable loop: an "agent loop" in Claude that researches, drafts, and critiques itself, running the same three steps over and over until the output is actually good, not just done. His framing is the useful contrast: most people use Claude like a vending machine (one prompt, one answer, move on) when the real unlock is building the loop instead. It's the simplest possible self-improving loop, and exactly the kind of thing a non-engineer can stand up.
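The research/draft/critique cycle can be sketched as a loop with an explicit quality bar. The three stages are passed in as functions; in practice each would be a model call, but here they are hypothetical stubs (`stub_research`, `stub_draft`, `stub_critique`) so the control flow runs as-is. The 0.8 threshold and five-round cap are arbitrary assumptions.

```python
from typing import Callable, List, Optional, Tuple


def refine(
    task: str,
    research: Callable[[str, Optional[str]], List[str]],
    draft: Callable[[str, List[str], Optional[str]], str],
    critique: Callable[[str], Tuple[float, str]],
    threshold: float = 0.8,
    max_rounds: int = 5,
) -> Tuple[str, float, int]:
    """Research -> draft -> critique, repeated until the critic's score clears the bar."""
    feedback: Optional[str] = None
    text, score = "", 0.0
    for round_no in range(1, max_rounds + 1):
        notes = research(task, feedback)
        text = draft(task, notes, feedback)
        score, feedback = critique(text)
        if score >= threshold:  # "actually good", not just "done"
            return text, score, round_no
    return text, score, max_rounds  # budget exhausted: return the last draft


sources: List[str] = []


def stub_research(task: str, feedback: Optional[str]) -> List[str]:
    sources.append(f"source {len(sources) + 1}")  # each round finds one more source
    return list(sources)


def stub_draft(task: str, notes: List[str], feedback: Optional[str]) -> str:
    return f"{task} ({', '.join(notes)})"


def stub_critique(text: str) -> Tuple[float, str]:
    cited = text.count("source")
    return min(1.0, cited / 3), ("" if cited >= 3 else "cite more sources")


text, score, rounds = refine("Summarize ACE", stub_research, stub_draft, stub_critique)
print(rounds, text)  # 3 Summarize ACE (source 1, source 2, source 3)
```

The cap matters: "until the output is actually good" needs a budget, or a critic that is never satisfied turns the loop into the retry-forever failure mode.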

💡#14

@tonysimons_
https://x.com/tonysimons_/status/2070967383284445226
A hands-on test of Hermes's new MoA 2.0 (Mixture of Agents): rather than betting on one brain, it convenes a panel of advisors. Reference models think first, an aggregator synthesizes, and the whole thing runs inside the normal agent loop with no glue code or custom routing. He spent a day running it through real workflows so you don't have to, and reports that the default preset scores meaningfully higher than a single top model. The point is that ensemble synthesis now lives inside the loop itself, not bolted on around it.
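The MoA shape (reference models first, aggregator second) fits in a few lines. This sketch is not Hermes's implementation: the models are stub lambdas, and the `majority_vote` aggregator is a deliberately simple swap-in for the synthesizing model that MoA actually uses.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Model = Callable[[str], str]


def mixture_of_agents(
    prompt: str,
    references: Sequence[Model],
    aggregate: Callable[[str, List[str]], str],
) -> str:
    """Reference models answer first (in parallel); an aggregator then synthesizes."""
    with ThreadPoolExecutor(max_workers=len(references)) as pool:
        proposals = list(pool.map(lambda model: model(prompt), references))
    return aggregate(prompt, proposals)


def majority_vote(prompt: str, proposals: List[str]) -> str:
    # Stand-in aggregator; in MoA proper this is another model prompted with
    # every proposal and asked to synthesize a single better answer.
    return Counter(proposals).most_common(1)[0][0]


panel = [lambda p: "42", lambda p: "41", lambda p: "42"]
answer = mixture_of_agents("What is 6 * 7?", panel, majority_vote)
print(answer)  # 42
```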

💡#15

@stretchcloud
https://x.com/stretchcloud/status/2070774129825923501
A careful read of Hermes's self-improving loop, the part distinct from its multi-model synthesis: the agent creates skills from experience, revises them during use, and maintains a persistent user model across sessions, all local, with no telemetry. He situates it against LM Studio, mem0, Ell and DSPy, and argues Hermes is trying to be the full runtime rather than one component. The honest caveat is there too: the benchmark needs external validation. But the mechanism (experience compiling into reusable skills) is the recurring theme of the week.

💡#16

@ManuAGI01
https://x.com/ManuAGI01/status/2070723161893851441
A concrete self-improving feature set in MiMoCode: /dream extracts knowledge from session traces, and /distill finds your repeated workflows and packages them into reusable skills. Pair that with cross-session memory (a permanent MEMORY.md, auto checkpoint snapshots, per-task progress logs backed by SQLite FTS5) and you get an agent that turns its own history into procedural skill rather than starting cold. This is the "learning loop" OkhayIea describes, implemented as two slash commands.
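A /distill-style pass has to find workflows that repeat. One simple way, assumed here and not taken from MiMoCode, is to count step n-grams across session traces and keep those that recur in at least two sessions; `find_repeated_workflows` and the example traces are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Workflow = Tuple[str, ...]


def find_repeated_workflows(
    sessions: Sequence[Sequence[str]], length: int = 3, min_sessions: int = 2
) -> List[Tuple[Workflow, int]]:
    """Find step sequences that recur across sessions: candidates to package as skills."""
    counts: Counter = Counter()
    for steps in sessions:
        # count each workflow at most once per session
        seen = {tuple(steps[i : i + length]) for i in range(len(steps) - length + 1)}
        counts.update(seen)
    return [(wf, n) for wf, n in counts.most_common() if n >= min_sessions]


traces = [
    ["git pull", "run tests", "fix lint", "commit"],
    ["open issue", "git pull", "run tests", "fix lint"],
    ["git pull", "run tests", "deploy"],
]
repeated = find_repeated_workflows(traces)
print(repeated)  # [(('git pull', 'run tests', 'fix lint'), 2)]
```

Each surviving sequence is a candidate skill; packaging it (a name, a description, the steps) is the part a model would do.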

💡#17

@Praveen_G07
https://x.com/Praveen_G07/status/2070921224960856312
A clean summary of the ACE paper (Agentic Context Engineering): improve an AI agent by evolving its context, not its model weights. It runs a simple generate, reflect, curate loop: the agent solves a task, analyzes what worked or failed, and updates its contextual knowledge, so the playbook gets richer over time instead of forgetting. It's the academic backbone under all the "skills from experience" posts, and it raises the right open question: can evolving context scale to months or years of accumulated knowledge?
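The generate/reflect/curate loop can be sketched with a playbook of itemized lessons that is merged incrementally rather than rewritten. The helpful/harmful counters and the pruning rule below are simplifying assumptions for illustration, and `generate` and `reflect` are stubs standing in for model calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Bullet:
    text: str
    helpful: int = 0
    harmful: int = 0


@dataclass
class Playbook:
    bullets: Dict[str, Bullet] = field(default_factory=dict)

    def render(self) -> str:
        return "\n".join(f"- {b.text}" for b in self.bullets.values())

    def curate(self, delta: Dict[str, bool]) -> None:
        """Merge a small delta (lesson -> helped?) instead of rewriting the context."""
        for lesson, helped in delta.items():
            bullet = self.bullets.setdefault(lesson, Bullet(lesson))
            if helped:
                bullet.helpful += 1
            else:
                bullet.harmful += 1
        # prune lessons that have hurt more often than they helped
        self.bullets = {k: b for k, b in self.bullets.items() if b.helpful >= b.harmful}


def ace_step(
    task: str,
    playbook: Playbook,
    generate: Callable[[str, str], str],
    reflect: Callable[[str], Dict[str, bool]],
) -> str:
    trajectory = generate(task, playbook.render())  # generate with the current context
    playbook.curate(reflect(trajectory))            # reflect, then curate
    return trajectory


def stub_generate(task: str, context: str) -> str:
    return f"{task} using context {context!r}"


def stub_reflect(trajectory: str) -> Dict[str, bool]:
    return {"check units before comparing": True, "skip the test suite": False}


book = Playbook()
ace_step("task 1", book, stub_generate, stub_reflect)
print(book.render())  # - check units before comparing
```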

💡#18

@DanielSmithDev
https://x.com/DanielSmithDev/status/2070870356605518324
A real build: bundling OpenClaw, Hermes, and Goose into ClawQL as optional agents, and adapting the official Ouroboros library into a multi-agent loop. It's an in-progress engineering post rather than a thread of claims, which is exactly why it's worth flagging: someone is actually wiring multiple harnesses together under one loop, crediting help from a collaborator. The Ouroboros name is apt for a loop that feeds its own output back in.

💡#19

@EverymansAI
https://x.com/EverymansAI/status/2070969433301073961
A useful three-way architecture map: CORAL versus Hermes Agent versus OpenClaw, sorted by abstraction layer. CORAL is the evolution/autonomy layer (agents generate attempts, reflect, consolidate skills, co-evolve), Hermes is the agent runtime (the structured loop, tools, retries, error recovery), and OpenClaw is the model-access layer (unified API, auth, routing). His key takeaway is that they're complementary, not competitors: CORAL could run on Hermes, which could call models through OpenClaw. A clarifying frame for a noisy ecosystem.

💡#20

@AnotherCodingX
https://x.com/AnotherCodingX/status/2070879375579586648
The unglamorous economics of long-running loops: LangChain's Deep Agents prompt-caching work shows that once an agent takes on longer work, every turn drags a lot of old context (system prompt, tool descriptions, skills, history) back through the model. Their approach sets explicit cache breakpoints where supported and structures the prompt so normal agent behavior doesn't blow up the cache. The "cache blast radius" detail is the gem: a single early change can invalidate the cache for everything after it. Reported result: 49-80% lower token cost across real agent trajectories.
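The blast-radius effect follows from how prefix caching works: a cached prefix can only be reused up to the first segment that differs. The toy accounting below assumes made-up segment names and token counts and ignores provider-specific rules such as breakpoint limits and minimum cacheable lengths; it is not LangChain's code.

```python
from typing import List, Tuple

Segment = Tuple[str, int]  # (content, token count)


def cached_prefix_tokens(previous: List[Segment], current: List[Segment]) -> int:
    """Tokens reusable from a prefix cache: only the identical leading segments."""
    reused = 0
    for (prev_text, _), (cur_text, cur_tokens) in zip(previous, current):
        if prev_text != cur_text:
            break  # blast radius: this segment and everything after it is recomputed
        reused += cur_tokens
    return reused


turn1 = [("system prompt", 2000), ("tools v1", 3000), ("skills", 1500), ("history t1", 4000)]
append_only = turn1 + [("history t2", 800)]
tool_edit = [("system prompt", 2000), ("tools v2", 3000)] + turn1[2:] + [("history t2", 800)]

print(cached_prefix_tokens(turn1, append_only))  # 10500
print(cached_prefix_tokens(turn1, tool_edit))    # 2000
```

Appending a turn reuses all 10,500 cached tokens; editing the tool descriptions, the second segment, throws away everything but the 2,000-token system prompt. That is why stable content goes first and volatile content last.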

📡 Eco Products Radar

Tools and frameworks mentioned 3+ times across today's loop posts.

Hermes (Nous Research): the runtime everyone built loops on top of, from MoA 2.0 ensembles to self-improving skill creation to multi-agent orchestration.
autoresearch: the through-line of the day, used against weather models, ODE/PDE solvers, local-LLM tuning, and personal-life metrics.
Claude Code: the default substrate for hand-rolled research/draft/critique loops and the Claude Obsidian /autoresearch plugin.
DSPy: the prompt-optimization framework cited for the "self-improvement" half of multi-agent loops.
Obsidian: the markdown vault that loops write research into and read context from, serving as the persistent memory layer under the learning loop.
MoA (Mixture of Agents): the ensemble-inside-the-loop pattern, now shipped as virtual models in Hermes.
Ralph: the fresh-session-per-iteration agentic loop pattern, now open-sourced by Santander.
LangChain Deep Agents: the batteries-included harness people pointed to for context management and prompt caching in long loops.
