Loop Daily: 2026-05-14
A quiet day on the keyword surface, but the cases that landed share one shape: agents that run on their own for hours and produce something measurable. A coding agent climbed from 12.7% to 86.1% top-1 on a 10-class ImageNet subset over 200+ self-edited iterations. A locally hosted 121B model wrote and passed 8 tests across 3 test suites without its operator writing a single test. A trading agent grew a portfolio 10% over six days across 21 predictions and 175 spot trades. The frame is the same in all three: a human sets the goal, the agent burns the cycles, and the results are real.
#1
@learningPikachu
https://x.com/learningPikachu/status/2054031405038920000
Built a high-level loop in which a coding agent ran 200+ eval iterations on a 10-class ImageNet subset over twenty hours, at roughly $200 in API cost. Loop shape: run eval, analyze the confusion matrix, edit symbolic features, test for regression, repeat. Top-1 accuracy climbed from 12.7% to 86.1% with zero neural networks, no gradients, and no learned weights: just symbolic features and scoring rules the agent edited into a Python program. The 86% ceiling is representation saturation at 64x64 resolution, not an algorithmic limit. Probably the cleanest single-task autoresearch demo of the month: an explicit, measurable metric, an agent that can edit the program, and no human in the loop after kickoff.
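The loop shape described here (eval, confusion matrix, edit, regression test, repeat) can be sketched in a few lines. This is only an illustrative skeleton, not the author's code: `improvement_loop`, `build_predictor`, and `propose_edit` are hypothetical names, and `propose_edit` stands in for whatever the coding agent does when it rewrites the symbolic program.

```python
from typing import Any, Callable, List, Sequence, Tuple

Matrix = List[List[int]]


def confusion_matrix(predict: Callable[[Any], int],
                     data: Sequence[Tuple[Any, int]],
                     n_classes: int) -> Matrix:
    """Count predictions; rows are true labels, columns are predicted labels."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for x, label in data:
        matrix[label][predict(x)] += 1
    return matrix


def top1_accuracy(matrix: Matrix) -> float:
    """Fraction of examples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def improvement_loop(program: Any,
                     build_predictor: Callable[[Any], Callable[[Any], int]],
                     propose_edit: Callable[[Any, Matrix], Any],
                     data: Sequence[Tuple[Any, int]],
                     n_classes: int,
                     iterations: int) -> Tuple[Any, float]:
    """Run eval -> analyze -> edit -> regression test, keeping non-regressing edits."""
    matrix = confusion_matrix(build_predictor(program), data, n_classes)
    best = top1_accuracy(matrix)
    for _ in range(iterations):
        # Hypothetical agent step: propose an edited program from the current errors.
        candidate = propose_edit(program, matrix)
        cand_matrix = confusion_matrix(build_predictor(candidate), data, n_classes)
        cand_score = top1_accuracy(cand_matrix)
        if cand_score < best:
            continue  # regression: discard the edit and keep the current program
        program, matrix, best = candidate, cand_matrix, cand_score
    return program, best
```

The one design choice worth noting is the regression gate: an edit survives only if accuracy on the eval set does not drop, which is what lets a loop like this run unattended for hundreds of iterations without drifting backwards.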
#2
@sudoingX
https://x.com/sudoingX/status/2054200317672366467
Asked the Hermes agent from his phone to update his code on a DGX Spark. The agent came back with 8 tests passing across 3 test suites, all green, all done autonomously on a 121B model running locally. He did not write a single test himself. What matters here isn't the spec; it's that local-only inference is now strong enough to close a non-trivial agent loop on commodity workstation hardware. Cloud-free agentic dev work just got real.
#3
@tonyGewrit
https://x.com/tonyGewrit/status/2054289716296446286
His agent grew his portfolio 10% in 6 days across 21 predictions on Jupiter Predict and 175 spot trades. He tweaked the prompts governing how his spawnagents behave on prediction markets, and that prompt change alone added +0.98 SOL. A concrete data point that autonomous trading agents are no longer just demos that lose money: at least one operator is running a live loop, iterating prompts based on observed PnL, and reporting positive results across both prediction and spot markets.
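For a sense of scale, the 10% over 6 days can be converted into an equivalent constant daily rate. This is plain compounding arithmetic on the two figures quoted above, assuming a smooth daily rate that the post does not claim; `daily_compound_rate` is a hypothetical helper name.

```python
def daily_compound_rate(total_return: float, days: int) -> float:
    """Constant daily rate r such that (1 + r) ** days == 1 + total_return."""
    if days <= 0:
        raise ValueError("days must be positive")
    return (1.0 + total_return) ** (1.0 / days) - 1.0


rate = daily_compound_rate(0.10, 6)
print(f"{rate:.2%} per day")  # roughly 1.60% per day
```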
#4
@Osasu_io
https://x.com/Osasu_io/status/2054248522329751765
Built a local AI agent to audit his home security lab, then wrote up everything that broke, how he fixed each issue, and what he learned. The article itself is the artifact: a non-Claude, non-cloud, local agent loop targeting a security posture review of a homelab. Cybersecurity is one of the higher-value verticals where agentic loops can plausibly replace manual scanning, and this is one of the more grounded write-ups of the practice rather than yet another "agents will revolutionize security" essay.
Eco Products Radar
No tool crossed the 3-mention threshold today, but two clusters are worth noting. Local-inference loops: Hermes agent (Nous Research) and DGX Spark both showed up running 121B-class models locally for real agent work. Trading loops: Jupiter Predict (jup_predict) and spawnagents are the platforms tonyGewrit is using to run continuous prediction-market and spot-trading loops. Worth watching whether the local-inference cluster grows next week as on-device agent costs keep dropping.