Agents have a half-life

Why smart people create so much AI slop.

Jun 10, 2026

In 1858, two continents were connected by a copper wire for the first time. The transatlantic telegraph cable was the moonshot of its century. Queen Victoria sent President Buchanan a congratulatory message. It took seventeen hours to transmit.

The cable’s chief electrician was a surgeon named Wildman Whitehouse. His theory of long-distance signaling was simple: if the signal arrives weak, push harder. So he hooked up induction coils and pumped roughly two thousand volts into a cable insulated with tree sap.

Three weeks later the cable was dead. Fried. $2.5 million of Victorian money at the bottom of the Atlantic.

The man who’d argued against him the whole time was William Thomson, later Lord Kelvin. Thomson had done the math. He knew the signal decays along the cable. That’s physics; you don’t get to vote on it. You can’t out-shout attenuation. You have to engineer around it: gentle currents, a mirror galvanometer sensitive enough to read a whisper, and eventually, repeaters placed along the line that catch the dying signal and regenerate it before passing it on.

Whitehouse had voltage. Thomson had a harness.

Keep this story in mind, because the AI industry is currently run by Whitehouses, and they want to sell you the voltage.

Hallucination is radioactive

In the tokenmaxxing piece I showed you the economics: consumption is outrunning cost-deflation by four orders of magnitude, and the subscription subsidy is a melting ice cube. Today I want to show you the physics of why most of that spend turns into slop. It fits in one line:

$P(n) = p^n$

Where p is the probability your agent gets a single step right, and n is the number of steps it chains together without anyone checking.

Rewrite it the way a physicist would:

$P(n) = e^{-\lambda n}, \quad \lambda = -\ln p$

That’s the equation for radioactive decay.

Which means correctness, like uranium, has a half-life: the number of autonomous steps after which your output is more likely wrong than right.
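
For anyone who wants the step in between, here is a short derivation of that half-life. It assumes, as the formula does, that every step succeeds independently with the same probability p. The symbol $n_{1/2}$ is my own shorthand for the half-life, introduced here only for illustration.

$p^n = e^{n \ln p} = e^{-\lambda n}, \quad \lambda = -\ln p$

$e^{-\lambda n_{1/2}} = \tfrac{1}{2} \;\Longrightarrow\; n_{1/2} = \frac{\ln 2}{\lambda} = \frac{\ln 2}{-\ln p}$

Because $\ln p$ is close to $-(1 - p)$ when p is near 1, a handy rule of thumb is $n_{1/2} \approx 0.69 / (1 - p)$.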

Simple tasks are pretty hard

A “simple” task like “research the competitor landscape and draft a positioning doc” is easily 50-100 micro-decisions: which sources to trust, what to include, which claim follows from which fact, how to phrase the thing. Run that at 99% per-step reliability, genuinely frontier-grade, and a 100-step chain completes correctly 37% of the time. At 95%, which is closer to honest for messy real-world work, it’s 0.6%.

Not 60 percent. Zero point six.
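
If you want to check the arithmetic yourself, a minimal sketch follows. It assumes the same toy model as the text: every step is independent, every step has the same success probability, and one bad step spoils the whole chain. The function name `chain_success` is made up for this example.

```python
def chain_success(p: float, n: int) -> float:
    """Probability that n unchecked steps all succeed.

    Assumes independent steps with identical per-step reliability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    return p ** n


if __name__ == "__main__":
    # Demo-sized, task-sized and production-sized chains.
    for p in (0.99, 0.95):
        for n in (10, 100, 200):
            print(f"p={p:.2f}  n={n:>3}  success={chain_success(p, n):.2%}")
```

Real steps are not perfectly independent and some errors are recoverable, so treat the output as a rough guide to the shape of the curve, not a forecast.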

This is why your agent feels brilliant in a demo and braindead in production. The demo is 10 steps. Production is 200. The model didn’t get dumber. It decayed. You watched three half-lives go by and then acted surprised the output was radioactive.

Last week I decomposed one of a client’s invoicing SOPs from a simple Google Doc into agentic steps, so they could see all the places where things can go wrong.

The simple one-paragraph instruction decomposed into 48 steps, and that’s just for issuing an invoice. Mechanical, transactional, no judgment or reasoning needed.

This curve is also immune to vibes. A model that goes from 95% to 99% per-step reliability is a massive engineering achievement, and it buys you a half-life of 69 steps instead of 13. Useful! But the singularity crowd needs half-lives in the tens of thousands for “fire and forget an employee” to be real. METR has been measuring exactly this, the task horizon agents can complete at 50% reliability, and it doubles roughly every seven months. That’s a real, steady, impressive curve. It is also a decade-long curve, not an eighteen-months-to-the-permanent-underclass curve.
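
The half-life numbers above can be reproduced in a few lines. This is a sketch under the same independence assumption as the decay formula; `half_life` and `required_p` are hypothetical helper names, not anything METR or a model vendor publishes.

```python
import math


def half_life(p: float) -> float:
    """Number of unchecked steps n at which p**n drops to 0.5."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(0.5) / math.log(p)


def required_p(target_half_life: float) -> float:
    """Per-step reliability needed to reach a given half-life (inverse of half_life)."""
    if target_half_life <= 0:
        raise ValueError("target_half_life must be positive")
    return 0.5 ** (1.0 / target_half_life)


if __name__ == "__main__":
    for p in (0.95, 0.99):
        print(f"p={p:.2f}  half-life={half_life(p):.1f} steps")
    # What per-step reliability would a 10,000-step half-life demand?
    print(f"half-life=10000  needs p={required_p(10_000):.5%}")
```

Running the inverse is the sobering part: a half-life in the tens of thousands needs per-step reliability well past four nines.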

Karpathy called it the decade of agents.

The decay equation is why it’s a decade.

Workslop is what happens when you pretend decay doesn’t exist

Last September, Stanford’s Social Media Lab and BetterUp put a name on the smell: workslop. AI-generated content that masquerades as good work but lacks the substance to advance the task. In their survey of 1,150 US desk workers, 40% had received workslop in the past month, and each instance took nearly two hours to deal with: an invisible tax of $186 per month for each affected employee. At that 40% prevalence, it comes to roughly nine million dollars a year for a 10,000-person org.

Everyone read that study as a story about lazy coworkers. Wrong lesson.

Workslop is output shipped past its half-life. Someone ran a long chain, did zero regeneration along the way, and exported the decay into your inbox. The two hours the recipient burns “dealing with it”? That’s not overhead. That’s the regeneration step, the repeater, being performed manually, by the most expensive component in the building, at the worst possible point in the pipeline.

And it gets worse, because decay doesn’t stop at the handoff. The recipient skims the slop with AI. Summarizes it with AI. Builds a deck on top of it with AI. Every handoff appends more unverified steps to the same chain. pⁿ doesn’t care about org charts. The organization becomes one long transatlantic cable with no repeaters and a Whitehouse in every department pumping voltage.

The Jellyfish data I quoted last time (two times the throughput at ten times the token cost) is this exact phenomenon.

Throughput of what? Of artifacts.

Decayed artifacts, produced faster, verified never, regenerated downstream by humans who don’t show up in the token dashboard.

Tokenmaxxing an unharnessed agent doesn’t just waste money. It buys decay at scale. More tokens per dollar means more autonomous steps means more half-lives elapsed before a human ever looks.

You are paying premium prices to make your output wrong faster.

Jensen wants to be alarmed if your engineers don’t burn $250k in tokens.

I want to be alarmed about what those tokens decayed into.

The repeater is a harness

The thing that eventually made undersea cables work wasn’t more voltage and it wasn’t a magically lossless wire. It was the regenerative repeater: a station partway down the line that reads the weakened signal, decides what it was supposed to be, and re-emits it clean. The decay still happens. It just never gets to compound.

A harness is verification placed at intervals shorter than the half-life, with the authority to reject and retry.
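
To make that definition concrete, here is a minimal sketch of a gate loop. Everything in it is illustrative: `run_with_gates`, `GateRejected`, and the shape of `segments` and `gate` are names I made up, and it assumes each segment returns a new state and does not mutate the last verified one.

```python
from typing import Callable, Sequence, TypeVar

S = TypeVar("S")


class GateRejected(RuntimeError):
    """Raised when a segment fails its gate on every allowed attempt."""


def run_with_gates(
    state: S,
    segments: Sequence[Callable[[S], S]],
    gate: Callable[[S], bool],
    max_attempts: int = 3,
) -> S:
    """Run short segments in order, verifying each one before moving on.

    A rejected attempt is discarded and the segment is rerun from the
    last verified state, so an error never carries into the next segment.
    """
    for index, segment in enumerate(segments):
        for _ in range(max_attempts):
            candidate = segment(state)
            if gate(candidate):
                state = candidate  # accept: this is the regenerated signal
                break
        else:
            # Authority to reject: stop and report, never pass decay along.
            raise GateRejected(f"segment {index} failed {max_attempts} attempts")
    return state
```

The design choice that matters is the `else` branch: when the retries run out, the harness stops and reports, and nothing unverified gets handed downstream.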

The math flips completely. Take the same 100-step task at 99% per-step reliability:

No harness: 0.99¹⁰⁰ ≈ 37% success.
Harness: cut it into 10 segments of 10 steps. Each segment ends at a gate: a test suite, a schema validator, a checksum against source data, or a second model with one job (is this true and complete?). Each segment passes raw about 90% of the time. The gate catches failures and reruns the segment, allowing up to 3 attempts in total. Segment reliability: ~99.9%. Full task: ~99%, assuming the gate itself catches every failure.

Cost of the retries? About 11% more tokens on average.

Eleven percent more tokens. Sixty-two points more reliability. That is the single highest-ROI trade available in this entire industry right now, and it’s not close.
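
Here is the harness arithmetic as a sketch you can rerun with your own numbers. The assumptions are mine and they are generous: the gate catches every failure, the gate itself costs no tokens, a failed attempt costs a full segment, and attempts are independent. The function names are hypothetical.

```python
def segment_reliability(p: float, steps: int, max_attempts: int) -> float:
    """Chance that at least one of max_attempts tries at a segment passes the gate."""
    raw_fail = 1.0 - p ** steps
    return 1.0 - raw_fail ** max_attempts


def expected_attempts(p: float, steps: int, max_attempts: int) -> float:
    """Average attempts per segment: attempt k+1 only runs if the first k failed."""
    raw_fail = 1.0 - p ** steps
    return sum(raw_fail ** k for k in range(max_attempts))


def harnessed_task(p: float, segments: int, steps: int, max_attempts: int):
    """Return (task success probability, fractional token overhead from retries)."""
    success = segment_reliability(p, steps, max_attempts) ** segments
    overhead = expected_attempts(p, steps, max_attempts) - 1.0
    return success, overhead


if __name__ == "__main__":
    success, overhead = harnessed_task(p=0.99, segments=10, steps=10, max_attempts=3)
    print(f"no harness: {0.99 ** 100:.1%}")
    print(f"harness:    {success:.1%} at {overhead:.1%} extra tokens")
```

Under these idealized assumptions the retry overhead lands at roughly 10.5%, in line with the figure quoted above. A real gate costs tokens too and misses some failures, so measure your own.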

It’s also why I keep repeating Mitchell Hashimoto’s definition like a prayer: Agent = Model + Harness.

The model sets p.

The harness sets n, the distance between repeaters.

Big AI sells you p improvements at frontier prices, 2× hikes, and tokenizer “upgrades” that eat 35% more tokens. n is free. n is yours. n is engineering.

You can see it in everyone who’s actually winning with agents.

Cherny’s parallel-PR workflow: PRs are gates, CI is a repeater.
Geoff Huntley’s Ralph loops: regeneration as a lifestyle.
Alfred, my own butler, ran ~1B tokens last month, including 11k tool calls and over 20k messages. The only reason that isn’t a slop volcano is that every worker writes into a vault that is checked, schema’d, and diffed before anything is treated as truth. The vault is the galvanometer.

Meanwhile the slop producers are doing the opposite: maximizing n (longer autonomous runs! more agentic! lights-out!) while p is whatever the model gods provided that quarter.

The 21st-century equivalent of two thousand volts into tree sap, where the org chart is the insulation.

The playbook

Find your half-life empirically. Run your real task, log the step at which output stops surviving scrutiny. That number, not the demo, not the benchmark, is your system’s actual capability. Most teams have never measured it. Most teams would be horrified.
Place gates inside the half-life, always. If correctness halves every ~15 steps, a checkpoint every 25 steps is theater. Gates must be deterministic where possible (tests, schemas, diffs against ground truth) and adversarial where not (a verifier model that gets paid to say no).
Never let unregenerated output cross a human boundary. The handoff is sacred. If it leaves your agent and lands in a colleague’s lap, it passes a gate first or it doesn’t leave. This single rule eliminates workslop. Not reduces. Eliminates. Workslop is, by definition, unregenerated signal that crossed the boundary.
Spend tokens on regeneration before generation. The 11% retry overhead beats every other use of marginal budget. If your token spend is growing and your cost-per-verified-task isn’t falling, you’re not adopting AI. You’re funding decay.
Treat “more autonomy” as a cost, not a feature. Every step of autonomy you grant is distance between repeaters. Grant it when measurement says the half-life supports it, not when a keynote does.
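
For the first item in the playbook, here is one way to turn logged runs into a half-life estimate. It is a sketch, not a standard tool: the log format (`Run`) and the function name are assumptions, and the estimator is the plain maximum-likelihood fit for a constant per-step failure rate, which real tasks only roughly obey.

```python
import math
from typing import Optional, Sequence, Tuple

# Hypothetical log record: (steps_run, first_bad_step).
# first_bad_step is 1-indexed, or None if the run survived review end to end.
Run = Tuple[int, Optional[int]]


def estimate_half_life(runs: Sequence[Run]) -> float:
    """Estimate the half-life in steps from reviewed runs.

    Counts every step observed up to and including the first bad one,
    then fits a constant per-step failure rate (failures / steps observed).
    """
    steps_observed = 0
    failures = 0
    for steps_run, first_bad_step in runs:
        if first_bad_step is None:
            steps_observed += steps_run  # survived: all steps count as good
        else:
            steps_observed += first_bad_step  # good steps plus the one that failed
            failures += 1
    if steps_observed == 0:
        raise ValueError("no steps logged")
    if failures == 0:
        return math.inf  # no decay observed yet; log more or longer runs
    p_hat = 1.0 - failures / steps_observed
    if p_hat == 0.0:
        return 0.0  # every run failed on its first step
    return math.log(0.5) / math.log(p_hat)


if __name__ == "__main__":
    logged = [(50, 10), (50, 20), (50, None), (50, 20)]
    print(f"estimated half-life: {estimate_half_life(logged):.1f} steps")
```

Whatever number comes out, the second item in the playbook applies: the distance between gates should sit comfortably below it.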

Finishing thoughts

The industry’s pitch is that the models will get so good the decay stops mattering. Maybe, on a long enough timeline; METR’s doubling curve is real. But Whitehouse also wasn’t wrong that more voltage moves a signal further. He was wrong that it was the binding constraint, and the ocean floor got an expensive lesson in the difference.

In 2026 the binding constraint is not intelligence. It’s compounding. pⁿ is undefeated. The teams drowning in workslop and the teams quietly shipping 10× are using the same models, paying the same rate cards, reading the same breathless launch threads. The difference is that one group is buying voltage and the other one built repeaters.

Slop is the signature of exponential decay running through an organization that hasn’t done the engineering.

The decay is physics; slop is a choice.

Lumberjack
