July 3, 2026 · Tool · Open Source · Agents

Caveman: Why Use Many Token When Few Token Do Trick

The tagline alone earns the star count: why use many token when few token do trick. Caveman is a Claude Code skill that makes your agent talk like a caveman (sentence fragments, zero filler, no cheerful preamble) while keeping the technical content intact. Across the project's own 10-task benchmark, output tokens drop 65% on average, with individual tasks ranging from 22% to 87%. A separate caveman-compress command rewrites your memory files and cuts input tokens by about 46% across sessions.

It's at 80.7k stars and gaining almost 900 a day right now, which is why we're covering it. The repo has been around since early 2026 and v1.9.0 shipped June 12, so the release isn't the news; the attention is. There are four compression levels: lite, full, ultra, and wenyan. Yes, that last one makes the agent answer in classical Chinese, which happens to be one of the most token-dense human languages ever devised. Caveman works across 30+ agents, including Claude Code, Cursor, Windsurf, and Copilot.

Here's the economics underneath the joke. Output tokens are the expensive ones, typically four to five times the input price, and agents burn most of them on politeness and restatement: words nobody reads. Context compressors like headroom attack the input side; caveman attacks the side where the money actually goes. The cheapest optimization in the whole agent stack turns out to be telling the model to stop performing helpfulness and just say the thing.
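To put rough numbers on that, here is a back-of-the-envelope sketch. The prices ($3 and $15 per million tokens, a 5x ratio) and the equal split of one million input and one million output tokens are illustrative assumptions, not figures from the project or any vendor's price list. The 65% and 46% cuts are the averages quoted above. The `Workload`, `cost`, and `compress` names are made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Token counts for a batch of agent calls (hypothetical)."""
    input_tokens: int
    output_tokens: int


def cost(w: Workload, input_price: float, output_price: float) -> float:
    """Dollar cost, with prices given per million tokens."""
    return (w.input_tokens * input_price + w.output_tokens * output_price) / 1_000_000


def compress(w: Workload, input_cut: float = 0.0, output_cut: float = 0.0) -> Workload:
    """Return a workload with the given fractional token reductions applied."""
    return Workload(
        input_tokens=round(w.input_tokens * (1 - input_cut)),
        output_tokens=round(w.output_tokens * (1 - output_cut)),
    )


# Assumed prices: output costs 5x input (the article says "four to five times").
INPUT_PRICE = 3.0    # dollars per million input tokens (assumption)
OUTPUT_PRICE = 15.0  # dollars per million output tokens (assumption)

# Assumed workload: equal input and output volume, purely for illustration.
baseline = Workload(input_tokens=1_000_000, output_tokens=1_000_000)

baseline_cost = cost(baseline, INPUT_PRICE, OUTPUT_PRICE)
output_side_cost = cost(compress(baseline, output_cut=0.65), INPUT_PRICE, OUTPUT_PRICE)
input_side_cost = cost(compress(baseline, input_cut=0.46), INPUT_PRICE, OUTPUT_PRICE)

print(f"baseline:          ${baseline_cost:.2f}")
print(f"65% fewer output:  ${output_side_cost:.2f}")
print(f"46% fewer input:   ${input_side_cost:.2f}")
```

Under these assumptions the output-side cut saves far more than the input-side cut. The gap depends on the mix: a workload that reads far more tokens than it writes narrows it, so plug in your own counts and prices before drawing conclusions.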

There's a serious point here for anyone running agent fleets: verbosity was a UX choice for chatbots, and we're all still paying for it in a world where most model output is read by another program or skimmed by a busy human. Caveman is a blunt fix. That more than 80,000 people starred it tells you how big the pain is.

https://github.com/JuliusBrussee/caveman