By Mark Pesce — 15 Apr 2026

𝜶 and Harnesses

Mark Pesce · University of Sydney · April 2026

Abstract

Post-Watershed, every business that does not mint tokens must spend them. This places harness engineering, the design of processes that direct token expenditure toward 𝜶, at the centre of economic activity for nearly all organisations. This paper examines four harnesses of increasing sophistication (a commodity router, a developer copilot, an autonomous agentic factory, and a meta-harness that uses tokens to optimise the harness itself) to show that 𝜶 scales with the quality of the process, not merely the volume of tokens consumed. When tokens are turned on the process that directs tokens, 𝜶 compounds: the flywheel described in Foundations of Post-Watershed Economics turns out to be an observable, measurable phenomenon already operating in the wild.

Introduction

Foundations of Post-Watershed Economics divides the post-Watershed economy into two sides. Infrastructure mints tokens. Everything else spends them. If you are not part of the mint — and nearly all businesses are not — then your entire economic existence sits on the spending side of the equation: Tokens + Process → Value.

That means generating 𝜶 from token expenditure is the principal business activity of the post-Watershed era. How well you direct tokens — how much value your process extracts from each unit of cognition you purchase — determines whether you thrive, survive, or take a sudden Wile E. Coyote gravity plunge.

The process that directs tokens is the harness. The quality of the harness is the quality of your business.

This paper examines four harnesses. Each directs tokens through a different process, at a different level of sophistication, generating a different quantity of 𝜶. Together they form a hierarchy mapping where value lives on the spending side of the post-Watershed economy.

The Router: OpenRouter

OpenRouter gets you the cheapest token. You pick the model — if you care — and you get the lowest price on the market at whatever service quality you are prepared to tolerate. OpenRouter is a pure commodity play: it does not touch the process, does not shape the output, does not add intelligence. It routes.
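The routing rule described above can be sketched in a few lines. This is a minimal illustration and not OpenRouter's actual implementation or API: the `Offer` type, the provider names, the prices, and the use of uptime as the measure of service quality are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    provider: str          # hypothetical provider name
    price_per_mtok: float  # assumed price in dollars per million tokens
    uptime: float          # assumed service-quality measure, 0.0 to 1.0


def route(offers: list[Offer], min_uptime: float) -> Offer:
    """Return the cheapest offer that meets the caller's quality floor."""
    eligible = [o for o in offers if o.uptime >= min_uptime]
    if not eligible:
        raise ValueError("no offer meets the requested service quality")
    return min(eligible, key=lambda o: o.price_per_mtok)


# Illustrative figures only; none of these come from a real price list.
offers = [
    Offer("provider-a", 3.00, 0.999),
    Offer("provider-b", 2.40, 0.990),
    Offer("provider-c", 1.90, 0.950),
]

print(route(offers, min_uptime=0.99).provider)  # provider-b
```

Nothing in `route` inspects the prompt or the response. The only inputs are price and a quality floor, which is what makes this the thinnest possible harness.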

There is not much margin in this business, but OpenRouter makes it up in volume, and in the valuable insights generated by watching who is using tokens for what. The data exhaust of a commodity exchange is itself a source of 𝜶, though not a large one per transaction. OpenRouter's 𝜶 is informational, not productive. OpenRouter knows where the demand is flowing before anyone else does.

This is the thinnest possible harness: it adds almost nothing to the token and captures almost nothing from it. Even so, given sufficient volume, this simple harness yields enough 𝜶 to sustain a business.

The Copilot: Claude Code

Claude Code is perhaps the most well-known harness in current use: an interactive coding agent that takes direction from a developer and executes. It does the job "good enough" to help software developers reach 2x productivity easily and 10x with an appropriate amount of planning and skill on the part of the operator.

The 𝜶 here is real and measurable. A developer spending $200 a month on Claude Code who doubles their output has generated substantial 𝜶 — the value of the additional output minus the token cost. At 10x, the 𝜶 is extraordinary. But it depends on the human in the loop. The developer provides direction, judgment, correction, and taste. The harness amplifies the human; it does not replace them.
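As a worked example of that arithmetic, the sketch below computes 𝜶 as the value of the additional output minus the token cost. The $200 monthly spend and the 2x and 10x multipliers come from the text; the $10,000 baseline value of a developer's unaided monthly output is a purely hypothetical figure chosen to make the numbers concrete.

```python
def alpha(baseline_value: float, multiplier: float, token_cost: float) -> float:
    """Value of the additional output minus the token cost."""
    additional_value = baseline_value * (multiplier - 1.0)
    return additional_value - token_cost


BASELINE = 10_000.0  # assumed monthly value of unaided output (hypothetical)
TOKEN_COST = 200.0   # monthly token spend, from the text

print(alpha(BASELINE, 2.0, TOKEN_COST))   # 9800.0
print(alpha(BASELINE, 10.0, TOKEN_COST))  # 89800.0
```

Under these assumptions the token cost is small next to the additional value, so 𝜶 is driven almost entirely by the multiplier, which in turn depends on the human directing the harness.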

Claude Code's process is interactive: prompt, respond, review, redirect. The human is an essential part of the process. This means the 𝜶 is partially human-generated — the harness and the human are co-producing value, and you cannot cleanly separate their contributions. The ceiling on 𝜶 is set by the human’s capacity to direct and absorb the output.

As a result, we’re seeing the emergence of second-order phenomena: the harness gets a harness. Practitioners build CLAUDE.md files, custom skills, prompt libraries, workflow templates — process layers that sit on top of Claude Code and shape how it spends tokens before the human even intervenes. The developer who has spent weeks refining a CLAUDE.md that encodes their project's architecture, conventions, and decision history is running a different harness from the developer who opens a blank session and types "fix the bug." Same model. Same tokens. Radically different 𝜶 — because the process wrapping the process extracts more value from every token spent.

Projects like Claudraband take this further — wrapping Claude Code's terminal interface in a layer that adds session persistence, daemon mode, and remote control. The practitioner who builds or adopts these wrappers is compounding process on process: each layer potentially squeezes more 𝜶 from the same token expenditure. This is where copilot-level practitioners actually differentiate: not in how they use the harness, but in how they harness the harness.

The Dark Factory: Agentic Armies

Figure: A partial map of the massively agentic 'dark factory' that is Gas Town.

Then there is something else entirely.

Steve Yegge's Gas Town is a nearly autonomous agentic "dark factory" for making software. Set it up, give it the tokens it needs — and it needs a lot of tokens — and it will create the software, check it for flaws, check it in to the repo, and keep going until the job is complete. Forget interactive. Forget the human in the loop. Gas Town is an army of coordinated agents executing a plan with minimal human intervention.

Multiple coordinated agents deliver a multiple of the cost-benefit of a single agent, and the 𝜶 is correspondingly greater. Where Claude Code amplifies one developer, Gas Town replaces the need for a development team entirely. The token expenditure is much higher. The process is more sophisticated. And the 𝜶 per unit of human attention is an order of magnitude greater, because the human attention required approaches zero.

Recent work confirms this pattern. SkyPilot demonstrated that pointing Claude Code at llama.cpp with four AWS VMs and a research-first methodology, in which the agent read arXiv papers and studied competing implementations before writing any code, produced five kernel fusions that made CPU inference 15% faster on x86 and 5% faster on ARM, in roughly three hours. A code-only agent would not have found these optimisations. The research phase, spending tokens on reading before spending tokens on writing, generated 𝜶 that a simpler process could not.

This is the critical insight: it is not just the volume of tokens that determines 𝜶. It is the sophistication of the process that directs them. An agent that reads before it codes finds optimisations that an agent that only codes will miss. An army of coordinated agents that divide labour, cross-check each other's work, and synthesise across specialisations generates 𝜶 that a single agent cannot — regardless of how many tokens the single agent consumes.

The Flywheel: Meta-Harness

Lastly, the step that closes the loop entirely.

Lee et al. at Stanford published “Meta-Harness: End-to-End Optimization of Model Harnesses” in March 2026 — a system that uses tokens to optimise the process that directs tokens. Meta-Harness is a harness for improving harnesses. It takes a coding agent (Claude Code with Opus 4.6), points it at a filesystem containing the source code, scores, and execution traces of every previous harness candidate, and lets it propose, evaluate, and iterate on new harnesses autonomously.
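The shape of that propose, evaluate, iterate loop can be sketched with a toy stand-in. This is not the paper's method: the proposer in Meta-Harness is a coding agent reading source code, scores, and execution traces, whereas `propose` here is a trivial mutation rule, `evaluate` is an invented scoring function, and the `examples` and `retries` parameters are hypothetical harness settings. What the sketch does preserve is the structure: every candidate and its score is kept in a history that the proposer consults before suggesting the next candidates.

```python
Harness = dict[str, int]               # hypothetical harness configuration
History = list[tuple[Harness, float]]  # every candidate tried, with its score


def evaluate(harness: Harness) -> float:
    """Toy benchmark (assumed): the score peaks at 3 examples and 2 retries."""
    return (100.0
            - 5 * abs(harness["examples"] - 3)
            - 10 * abs(harness["retries"] - 2))


def propose(history: History) -> list[Harness]:
    """Stand-in for the coding agent: mutate the best candidate seen so far."""
    best, _ = max(history, key=lambda item: item[1])
    candidates = []
    for key in best:
        for delta in (-1, 1):
            candidate = dict(best)
            candidate[key] = max(0, candidate[key] + delta)
            candidates.append(candidate)
    return candidates


def search(seed: Harness, rounds: int) -> tuple[Harness, float]:
    """Run the loop and return the best harness found with its score."""
    history: History = [(seed, evaluate(seed))]
    for _ in range(rounds):
        for candidate in propose(history):
            # Skip candidates that have already been scored.
            if all(candidate != seen for seen, _ in history):
                history.append((candidate, evaluate(candidate)))
    return max(history, key=lambda item: item[1])


best, score = search({"examples": 0, "retries": 0}, rounds=6)
print(best, score)  # {'examples': 3, 'retries': 2} 100.0
```

In this toy the seed scores 65 and the search reaches the optimum within six rounds without any human adjusting the parameters. The real system replaces the mutation rule with an agent that can rewrite the harness code itself.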

The results are not incremental. Changing the harness around a fixed model produces up to a 6x performance gap on the same benchmark. On text classification, Meta-Harness outperforms the best hand-designed harness by 7.7 points while using 4x fewer context tokens. On retrieval-augmented mathematical reasoning, a single discovered harness improves accuracy on 200 IMO-level problems by 4.7 points — averaged across five held-out models it had never seen. On TerminalBench-2, the discovered harnesses surpass every hand-engineered baseline and rank first among all Haiku 4.5 agents.

The same model, with a better harness discovered by another model, outperforms every human-engineered alternative. The process matters more than the token.

This is the flywheel from Foundations of Post-Watershed Economics made concrete. Tokens + Process → Value, but also: Tokens + Process → Better Process → More Value. The system spends tokens to improve the thing that directs tokens, which generates more 𝜶, which funds more token expenditure on further process improvement. The paper's own footnote is telling: "this workflow only became practical recently, following major improvements in coding-agent capabilities around early 2026." They describe the Watershed without naming it.
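A small simulation makes the difference between the two equations visible. It is a sketch under invented assumptions: the budget, the starting efficiency (value per token dollar), the 20% share diverted to process improvement, and the rule that this share raises efficiency by 20% per cycle are all hypothetical numbers, chosen only to show the shape of the curve.

```python
def simulate(cycles: int, budget: float, efficiency: float,
             improve_share: float, improve_rate: float) -> list[float]:
    """Return the value produced in each cycle.

    Each cycle, `improve_share` of the token budget is spent on the process
    itself, which multiplies efficiency by (1 + improve_share * improve_rate)
    for all later cycles. The rest of the budget is spent productively.
    """
    values = []
    for _ in range(cycles):
        productive = budget * (1.0 - improve_share)
        values.append(productive * efficiency)
        efficiency *= 1.0 + improve_share * improve_rate
    return values


# Tokens + Process -> Value: a fixed process, nothing reinvested.
static = simulate(5, budget=1000.0, efficiency=2.0,
                  improve_share=0.0, improve_rate=1.0)

# Tokens + Process -> Better Process -> More Value: 20% turned on the process.
flywheel = simulate(5, budget=1000.0, efficiency=2.0,
                    improve_share=0.2, improve_rate=1.0)

for cycle, (s, f) in enumerate(zip(static, flywheel), start=1):
    print(f"cycle {cycle}: static {s:8.2f}  flywheel {f:8.2f}")
```

With these numbers the flywheel starts behind, because a fifth of its budget produces no direct output, then overtakes the static process in the third cycle and keeps pulling away. The crossover point depends entirely on the assumed improvement rate.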

Meta-Harness is more than an illustration. The feedback loop is real and already operating. The harness has become a product of token expenditure, an output of the system it governs. Process improving process, 𝜶 compounding 𝜶. What are the limits to improvement?

The Hierarchy of Harnesses

These four examples form a hierarchy:

The router adds no process. It captures informational 𝜶 only, the data exhaust of commodity flow. Margin is thin. Volume is the business.

The copilot adds an interactive process with a human in the loop. It captures real 𝜶, bounded by human capacity to direct and absorb. Margin is good. The constraint is attention.

The dark factory adds autonomous, multi-agent processes, with minimal human involvement. It captures high 𝜶 per unit of human attention. Margin is excellent where you can engineer the process for efficiency. The constraint is the sophistication of orchestration.

The flywheel turns tokens on the process itself, using agents to discover and refine the harnesses that direct other agents. It captures compounding 𝜶: each iteration of process improvement generates more 𝜶 than the last. The constraint is whether the feedback loop converges on genuine improvements or overfits to its own evaluation criteria.

The pattern is clear: 𝜶 scales with the sophistication of the process, not merely the volume of tokens consumed. More tokens through a dumb pipe generate less 𝜶 than fewer tokens through an intelligent process. When the process itself becomes the target of token expenditure, 𝜶 compounds. The harness is where the value lives — and the meta-harness is where the value accelerates.

The Bitter Lesson, Again

No harness will remain a stable generator of 𝜶.

"The Bitter Lesson" predicts that general capability plus compute absorbs specialised engineering over time. The harness layer, the orchestration that generates 𝜶, will be progressively absorbed into the models themselves. The research-first methodology that SkyPilot had to engineer explicitly will become something the model does by default. The multi-agent coordination that Gas Town orchestrates through careful architecture will become native model behaviour.

Today, these harnesses are where 𝜶 lives. Tomorrow, those harnesses are another set of spoons.

The Meta-Harness reveals something more interesting than simple absorption. When models can optimise their own harnesses, when the flywheel turns on itself, the distinction between model and harness begins to dissolve. A model that improves its own process gets better at getting better. Compound interest applied to capability itself.

The Meta-Harness accelerates the obsolescence of harness engineering. If the model can discover harnesses that outperform every hand-engineered alternative (and Meta-Harness demonstrates that it can, today, on real benchmarks), then harness engineering as a human specialisation has a visible expiry date. The flywheel does not need a human hand on it to keep running.

The equation does not change: Tokens + Process → Value. But the process that generates the most 𝜶 is always moving upward in sophistication — and the models are always chasing it from below. Meta-Harness shows they have already caught it.

Conclusion

This hierarchy of router, copilot, dark factory and flywheel reads as a menu but behaves as a sequence. Each tier generates more 𝜶 than the last, and each tier has a shorter shelf life than the one above it, because the models are absorbing capability from the bottom up.

The router is already a commodity. The copilot's advantage is measured in months: as models improve at self-direction, the human in the loop becomes less necessary for routine tasks. The dark factory has a longer runway, but its orchestration logic will be absorbed into the models that execute it. The Meta-Harness, the flywheel, is the most durable precisely because it is the most general: a system that improves the system. But "The Bitter Lesson" does not exempt generality. It rewards it. The flywheel will be absorbed too, and when it is, the model will improve itself without external harness engineering at all.

Harness 𝜶 is real, and right now it is the source of value on the demand side of the token economy. Yet it is continuously depreciating. The businesses that thrive will ride the highest tier of harness 𝜶 they can reach, extract maximum value while the window is open, and invest that value in the things the flywheel cannot mint: physical infrastructure, relationships, trust, regulatory position. Building the best harness is a means, never an end.

The harness is where the 𝜶 is today. Tomorrow is another story.

—

Acknowledgements

This paper was deeply informed by my continuing conversations with John Allsopp, and drafted from my notes using Claude Cowork. I remain responsible for any errors that may have crept in.