Harness Engineering

Name: Harness Engineering — From Using AI to Controlling AI
Author: Ken Imoto

From Using AI to Controlling AI

Five interpretations from OpenAI, Anthropic, LangChain, Martin Fowler, and academia — merged into one system for engineers running AI agents in production

Your AI agent runs. Does it obey? OpenAI, Anthropic, and LangChain each define “harness” differently. This book merges all 5 interpretations into one system.

Harness Trilogy [Architecture]. Defining what a harness is, across 5 interpretations.

30+ technical books across 4 languages · Sold on Kindle in 6 countries · From a year of real production use

Included with Kindle Unlimited · Published: 2026-03-15 · Updated: 2026-04-20

Ken Imoto, author of the Practical Claude Code & Harness Engineering series. 30+ technical books across JA/EN/PT/ES. · 7-day return window via Amazon

📖 Read for free

Read three full chapters right here before you buy. Liked it? Continue on Kindle.

01 Preface — Why 'Harness,' and Why Now

A harness — the tack that controls the power of AI

A Tuesday at 3 a.m.

3 a.m. on a Tuesday. The on-call engineer at one team gets jolted awake by a PagerDuty alert.

API costs have spiked. They check the dashboard: over $400 burned in the past hour. Digging in, they find that an AI agent deployed the day before has been hammering an unstable API with retries. Every error sends it back into the “let me try again” loop, and it ran like that until morning.

The agent wasn’t the problem. The model was fine. The prompt was carefully written. What was missing was a harness. They told the agent “run,” but never gave it brakes or a steering wheel.
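What “brakes” might look like in code: a minimal Python sketch, assuming the agent’s API calls can be wrapped in a function. The name `call_with_brakes` and its defaults are hypothetical, not details from the incident. The point is the shape: a hard cap on attempts and a growing wait between them, so an unstable API produces a handful of calls instead of a night of them.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RetryBudgetExceeded(Exception):
    """Raised when the harness refuses to retry any further."""


def call_with_brakes(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn; on failure, wait with exponential backoff and try again,
    but never make more than max_attempts calls in total."""
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as err:  # illustration only: catch narrower errors in real code
            last_error = err
            if attempt < max_attempts - 1:
                # Waits 1s, 2s, 4s, ... with the default base delay.
                sleep(base_delay_s * 2 ** attempt)
    raise RetryBudgetExceeded(f"gave up after {max_attempts} attempts") from last_error
```

Injecting `sleep` keeps the sketch testable without real waiting; a production harness would also catch narrower exception types than `Exception`.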

This story isn’t unusual. There’s a phrase that gets passed around the field:

“The model is commodity. The harness is moat.”

When an agent that worked perfectly in a demo breaks in production, it’s almost always a harness problem.

In February 2026, OpenAI published a blog post: “Harness engineering: leveraging Codex in an agent-first world.”

Here’s what it said: for five months, an engineering team didn’t write a single line of code by hand. Using Codex agents alone, they built a production application of over a million lines in roughly one-tenth of the time it would have taken to write by hand.

“Humans steer. Agents execute.”

Engineers didn’t get their jobs taken. The definition of the job changed.

That post lit the fuse. Then came the “$47,000 retry storm,” reported from a weekend in February 2026. A data-enrichment agent misread an API error code as an instruction to “retry with different parameters” and made 2.3 million API calls. On Monday morning, the engineers came back to a $47,000 bill. It’s nice that the agent worked through the weekend, less so when the deliverable is zero and the invoice still arrives.

A few days later, Anthropic published two harness-design guides. LangChain defined “Agent = Model + Harness.” Martin Fowler wrote a commentary. An academic paper went up on arXiv.
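The retry storm combines two gaps: the agent treated a non-transient error as retryable, and nothing capped the total number of calls. A minimal Python sketch of both guards follows. The status-code split is a common HTTP convention, assumed here rather than taken from the report, and `CallBudget` is a hypothetical name.

```python
from dataclasses import dataclass

# Assumption: a common HTTP convention, not a detail from the incident report.
# 429 (rate limited) and 5xx (server trouble) are transient; other 4xx errors
# mean the request itself is wrong, and resending variations won't fix it.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def should_retry(status: int) -> bool:
    return status in RETRYABLE_STATUS


class BudgetExhausted(Exception):
    """Raised once an agent run has used up its call allowance."""


@dataclass
class CallBudget:
    """Hard ceiling on API calls for one agent run: a crude circuit breaker."""
    max_calls: int
    used: int = 0

    def charge(self) -> None:
        # Check before incrementing, so the ceiling is never exceeded.
        if self.used >= self.max_calls:
            raise BudgetExhausted(f"call budget of {self.max_calls} exhausted")
        self.used += 1
```

In a harness, `budget.charge()` would run before every outbound request, and a failed request would be retried only when `should_retry` returns True.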

2024 was the year of Prompt Engineering. The era of polishing “what to ask AI.”

2025 was the year of Context Engineering. Echoing Andrej Karpathy’s line that “the hottest new programming language is English,” the work shifted to designing “what to show the AI.”

In 2026, the scope widens to Harness Engineering. “How do you design the entire environment the agent operates in?”

But the term gets interpreted slightly differently depending on who’s writing. OpenAI and Anthropic emphasize different things. LangChain and Martin Fowler approach it from different angles. The academic papers come at it from yet another direction.

This book gives a structured overview of Harness Engineering.

The relationship between the three engineering practices (Prompt / Context / Harness)
How the major players (OpenAI / Anthropic / LangChain / Martin Fowler / academics) interpret it differently
The anatomy of the six building blocks
How it sits next to related ideas (Vibe Coding / Spec Coding / Agent Frameworks)
Practical case studies from the Japanese-speaking community
Where it’s all going

It’s both a concept-organization book and a hands-on guide you can use tomorrow. My goal is simple: when someone asks “okay, but what is a harness?”, you can hand them this book as a clear answer.

Who this book is for

Engineers who have started using AI agents (Claude Code, GitHub Copilot, Cursor, etc.)
People who have written an AGENTS.md or CLAUDE.md but aren’t sure if they got it right
People who know Prompt Engineering but are hearing “Harness Engineering” for the first time
Managers and tech leads who want to bring AI agents into their team

The only prerequisite is the basics of Prompt Engineering. Having heard of Few-shot and Chain-of-Thought is enough.

How to read this book

You can read it cover to cover, or jump to the chapters you find interesting. That said, three chapters are worth reading no matter what:

Chapter 1, “The Three Engineering Evolutions”: understand how the three engineering practices relate (the map of the territory)
Chapter 8, “The Six Building Blocks”: learn the six building blocks (the skeleton of practice)
Chapter 11, “AGENTS.md / CLAUDE.md Practical Design”: learn how to write AGENTS.md (something you can use tomorrow)

02 The Three Engineering Evolutions — Prompt → Context → Harness

Why 40% fail

In 2026, 40% of AI agent projects fail (Company of Agents survey).

What’s behind the failures? Wrong model? Bad prompts?

Neither. “The difference between success and failure isn’t the model.” That’s the consensus from the field.

A survey at Y Combinator DevTool Day (March 2026) interviewed CTOs and CPOs and found a common factor across failed projects: no harness. They never designed the environment the agent operates in.

75% of YC’s enterprise companies already have coding agents deployed. Yet many of them hit the same wall: “works in the demo, collapses in production.”

In March 2026, Linear declared “issue tracking is dead.” The reasoning: feed issue context straight to a coding agent and humans no longer need to manage tickets. Enterprise workflows are getting redesigned with agents as the default assumption.

Putting agents into production at this inflection point without understanding the harness is like driving on the highway without a seatbelt. You can go fast, but you’ll fly off the road at the first curve.

Timeline

Timeline of the three engineering evolutions

What makes the three different

Prompt Engineering

Subject: A single prompt (input text)

Optimizing “what to ask AI and how.” Few-shot, Chain-of-Thought, ReAct. The art of maximizing accuracy in one exchange.
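As a concrete illustration, here is a minimal Python sketch of assembling a few-shot prompt. The `Input:`/`Output:` layout is one common convention, assumed here; nothing about it is specific to any model.

```python
from typing import List, Tuple


def build_few_shot_prompt(task: str, examples: List[Tuple[str, str]], query: str) -> str:
    """Assemble one few-shot prompt: the instruction, worked examples, then the new input."""
    parts = [task, ""]
    for given, expected in examples:
        parts.append(f"Input: {given}")
        parts.append(f"Output: {expected}")
        parts.append("")  # blank line between examples
    # End on an open "Output:" so the model completes the pattern.
    parts.append(f"Input: {query}")
    parts.append("Output:")
    return "\n".join(parts)
```

Everything the model sees here lives in one string sent in one exchange, which is exactly the scope Prompt Engineering optimizes.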

Context Engineering

Subject: Everything you feed the AI (system prompt + RAG + tool definitions + memory)

In Andrej Karpathy’s words, “it is a lot more than just the prompt itself.” As single-prompt approaches became insufficient in more cases, teams had to design the entire dynamically constructed context window.

Philipp Schmid (formerly of Hugging Face, now at Google DeepMind) argues that “the new skill for using AI isn’t prompting. It’s context engineering.”
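To make “dynamically constructed” concrete, here is a minimal Python sketch that assembles a context window from the four ingredients listed above under a token budget. Everything here is an assumption for illustration: word count stands in for a real tokenizer, and the names `ContextParts` and `assemble_context` are hypothetical. The design choice worth noticing is that the system prompt and tool definitions are mandatory, while retrieved documents and memory compete for whatever budget remains.

```python
from dataclasses import dataclass
from typing import List


def approx_tokens(text: str) -> int:
    # Crude assumption: roughly one token per whitespace-separated word.
    return len(text.split())


@dataclass
class ContextParts:
    system_prompt: str
    tool_definitions: List[str]
    retrieved_docs: List[str]  # e.g. RAG results, most relevant first
    memory: List[str]          # e.g. earlier conversation turns, oldest first


def assemble_context(parts: ContextParts, budget: int) -> List[str]:
    """Always include the system prompt and tools, then add retrieved docs in
    relevance order (skipping any that don't fit), then the most recent memory."""
    window = [parts.system_prompt, *parts.tool_definitions]
    remaining = budget - sum(approx_tokens(p) for p in window)
    if remaining < 0:
        raise ValueError("system prompt and tool definitions alone exceed the budget")
    for doc in parts.retrieved_docs:
        cost = approx_tokens(doc)
        if cost <= remaining:
            window.append(doc)
            remaining -= cost
    recent: List[str] = []
    for turn in reversed(parts.memory):  # newest first
        cost = approx_tokens(turn)
        if cost > remaining:
            break
        recent.append(turn)
        remaining -= cost
    window.extend(reversed(recent))  # restore chronological order
    return window
```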

Harness Engineering

Subject: The entire operating environment (context + constraints + tools + lifecycle + feedback + monitoring)

Louis Bouchard’s definition is the most concise:

Context Engineering is “what you send to the model.” Harness Engineering is “how the whole thing runs.”

Not the prompt, not the context. The environment around the model. If cooking is the analogy, the prompt is the recipe, the context is the ingredients, and the harness is the kitchen itself.

A nesting structure

These three aren’t competing concepts. They nest inside each other.

Nesting: Harness ⊇ Context ⊇ Prompt

SmartScope’s article puts it cleanly:

Harness ⊇ Context ⊇ Prompt

Elephancube’s Japanese article uses an apt metaphor:

When you build a house, walls need a foundation, and a roof needs walls. Good prompts let context design work, and good context design lets the harness function.
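The nesting can be read almost literally as a data structure. In this minimal Python sketch, each layer holds the one inside it and adds its own concerns; the field names are illustrative, not taken from any of the sources cited here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Prompt:
    text: str  # the single instruction: Prompt Engineering's subject


@dataclass
class Context:
    prompt: Prompt  # a context contains the prompt...
    system: str = ""
    retrieved: List[str] = field(default_factory=list)
    memory: List[str] = field(default_factory=list)


@dataclass
class Harness:
    context: Context  # ...and a harness contains the context
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    constraints: List[Callable[[str], bool]] = field(default_factory=list)  # output checks
    max_steps: int = 20  # a lifecycle limit
```

Improving `Prompt.text` still matters, but only `Harness` can express tools, checks, and limits.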

“Replaced” or “layered”?

Here’s where interpretations diverge.

The “replaced” camp:

Data Science Dojo titled an article “Why Harness Engineering Is Replacing Prompt Engineering.” Their argument: agents in 2025–2026 operate in environments that prompts and context were never designed for.

The “layered” camp:

AnyTech (Medium) writes: “There’s no essential difference among the three; the terminology is shifting because LLMs and agents now handle a broader scope of work.” A reassuring take. You don’t have to throw out everything you knew each time a new buzzword arrives.

This book’s position: the layered camp. Prompt Engineering is still important. It’s just no longer sufficient on its own in a growing number of cases. Harness Engineering subsumes prompt and context, then adds an outer layer of constraints, lifecycle management, and feedback loops.

Why now?

A piece by WonderLab on DEV.to puts it well:

The timing isn’t a coincidence. In 2025, AI agents went from “cool demos” to “actual productivity tools.”

Once agents run autonomously for long stretches, optimizing one prompt can’t keep them under control. Context design alone is also insufficient. You have to design the whole environment.

That urgency is what gave birth to Harness Engineering.

03 Defining Harness Engineering

What “works in the demo, breaks in production” really means

harnessengineering.academy puts it this way:

Don’t deploy AI agents without a harness, the same way you wouldn’t run software directly on a CPU without an OS.

A CPU can compute. But without an OS, you can’t manage memory, schedule processes, or control I/O. Same for models: a model can generate text. But without a harness, you can’t manage context, control tools, or handle failures.

Nine out of ten agents that “work in the demo, break in production” have a harness problem. Specifically:

Demo: A controlled environment. Questions arrive in the expected flow. APIs work. Context is short.
Production: Chaos. Unexpected inputs. APIs go down. Context blows up. Race conditions from parallel execution.

A harness is the cushion that absorbs production chaos. The demo is the showroom; production is the open road. Whether a car that runs perfectly in the showroom can survive the road is a separate question.

Where the word “harness” comes from

NxCode spells out the etymology:

The term is borrowed from equestrian equipment. A horse is powerful and fast, but without reins, a saddle, and a bridle, it goes wherever it wants. The AI model is the horse. The harness is everything that channels that power into productive work.

A post by kazu_t on note uses an analogy of the OS versus application code:

If the prompt is application code, the harness is the OS.

Aakash Gupta (Medium) puts it even more simply:

The model is the engine. The harness is the car. The best engine in the world goes nowhere without steering and brakes.

Distinguishing it from “test harness”

Parallel.ai raises an important caveat:

Don’t confuse it with a test harness (an old term in software engineering). A test harness is a framework that feeds inputs and auto-checks outputs. An agent harness is the entire operating environment of an AI.

Search the term and you’ll get hits about electrical wiring and the CI/CD platform Harness.io. The harness in this book refers to the control environment for AI agents.
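For contrast, a classic test harness fits in a few lines. This minimal Python sketch (the function name is hypothetical) feeds fixed inputs to one function and checks the outputs; it does nothing about context, tools, or the lifecycle of a long-running agent, which is what an agent harness has to handle.

```python
from typing import Callable, List, Tuple


def run_test_harness(fn: Callable[[int], int], cases: List[Tuple[int, int]]) -> List[str]:
    """Feed each input to fn and compare against the expected output.
    Returns a description of every mismatch; an empty list means all passed."""
    failures = []
    for given, expected in cases:
        actual = fn(given)
        if actual != expected:
            failures.append(f"fn({given}) = {actual}, expected {expected}")
    return failures
```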

Comparing the definitions

Here are the definitions side by side.

OpenAI

“Humans steer. Agents execute. By deliberately imposing this constraint, we built what was needed to lift engineering speed by orders of magnitude.”

Harness = the environment in which agents reliably write code.

Anthropic

“Multi-context-window support, environment setup in the initial context, context management, sub-agent composition.”

Harness = a stable control system for long-running agents.

LangChain

“Agent = Model + Harness. The model has the intelligence; the harness makes that intelligence useful.”

Harness = the outer shell that converts model intelligence into useful work.
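LangChain’s equation can be sketched as a loop. In this minimal Python version, the model only proposes the next action; the harness decides whether the tool is allowed, runs it, records the result, and stops after a fixed number of steps. The `Action` type, `run_agent`, and the step limit are hypothetical, not LangChain’s API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Action:
    tool: Optional[str]  # tool to call, or None when the model is done
    argument: str = ""
    final_answer: str = ""


# The model: takes the transcript so far, proposes the next action.
Model = Callable[[List[str]], Action]


def run_agent(model: Model, tools: Dict[str, Callable[[str], str]],
              task: str, max_steps: int = 10) -> str:
    """Agent = Model + Harness: the model decides, the harness executes and enforces limits."""
    transcript = [f"task: {task}"]
    for _ in range(max_steps):
        action = model(transcript)
        if action.tool is None:
            return action.final_answer
        if action.tool not in tools:
            # Report the violation back instead of crashing, so the model can correct course.
            transcript.append(f"error: tool '{action.tool}' is not allowed")
            continue
        result = tools[action.tool](action.argument)
        transcript.append(f"{action.tool}({action.argument}) -> {result}")
    raise RuntimeError(f"stopped after {max_steps} steps without a final answer")
```

Swapping in a smarter model changes only `model`; the allowlist, the transcript, and the step limit stay the harness’s job.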

Martin Fowler

“Strongly typed languages turn type checks into sensors. Module boundaries provide architectural constraint rules. Frameworks like Spring abstract away details the agent doesn’t need to think about, implicitly raising the agent’s success rate.”

Harness = the total set of implicit and explicit constraints embedded in a codebase.
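Fowler’s module-boundary point can be turned into a sensor the agent cannot talk its way past. A minimal Python sketch, assuming a hypothetical layering rule in which code in `domain` must not import `infrastructure` or `web`; it uses the standard-library `ast` module to read imports without running the code.

```python
import ast
from typing import Dict, List, Set

# Hypothetical layering rule: modules in a layer must not import the listed packages.
FORBIDDEN: Dict[str, Set[str]] = {
    "domain": {"infrastructure", "web"},
}


def imported_top_level_packages(source: str) -> Set[str]:
    """Return the top-level package names that a Python source file imports."""
    found: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import; relative imports stay inside the package.
            found.add(node.module.split(".")[0])
    return found


def boundary_violations(layer: str, source: str) -> List[str]:
    bad = imported_top_level_packages(source) & FORBIDDEN.get(layer, set())
    return sorted(f"{layer} must not import {pkg}" for pkg in bad)
```

Run in CI or a hook, a non-empty result becomes a failing check, which is the kind of explicit constraint Fowler describes.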

Louis Bouchard

“Stop saying ‘the model is dumb.’ Say instead, ‘my system tolerated this failure mode.’”

Harness = environment design that doesn’t tolerate failure modes.

What they all agree on

The wording differs, but everyone agrees on a few points.

The harness is outside the model: this isn’t about tweaking model parameters
Constraints are enforced, not requested: the system doesn’t move forward unless they’re satisfied
Feedback loops are mandatory: evaluate outputs, keep improving the environment
The human role changes: from writing code to designing the environment

This book’s working definition of Harness Engineering

Combining the definitions above, this book uses:

Harness Engineering is the discipline of designing the entire environment in which AI agents operate autonomously over long periods of time. It includes context management, constraint enforcement, lifecycle management, feedback loops, monitoring, and security boundaries.

What goes wrong without a harness

The value of a harness becomes clear when you look at what fails without one.

Problem	Without a harness	With a harness
Code style consistency	Agent writes in a different style every time	Linter hook auto-unifies
Test creation	Have to ask “please write a test” every time	Pre-commit blocks untested commits
Handling secrets	Agent embeds API keys in code	Security boundary detects and rejects
Long-running tasks	Context bloats, quality drops	Context resets + progress files
Reproducible quality	Depends on whoever’s working (human or AI)	Guaranteed by the environment

A harness turns “asks” into “mechanisms.” Saying “please write tests” 100 times is less reliable than building the system once so that commits without tests can’t happen. The same lesson applies to onboarding junior team members.

The decisive difference from Prompt Engineering

Prompt Engineering optimizes “one exchange.” Harness Engineering optimizes “100 exchanges.”

For a single exchange, a good prompt is enough. But when an agent codes all day, the effect of the first prompt has faded by the 50th. Context bloats, the original instructions slip into the distant past, and the agent starts behaving differently than it did at the start.

A harness solves that. If a prompt is “the first push,” a harness is “the gravity that’s always pulling.”
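One way to keep that pull constant is to rebuild the context on every turn instead of letting it grow. In this minimal Python sketch (all names hypothetical), the standing rules are re-sent every time, a progress file carries the long-term state, and only the most recent turns of history survive, mirroring the “Context resets + progress files” row in the table above.

```python
from typing import List


def build_turn_context(pinned_rules: str, progress_file: str,
                       history: List[str], keep_last: int = 6) -> List[str]:
    """Rebuild the context for every turn instead of letting it grow forever.

    - pinned_rules: standing instructions (e.g. the contents of AGENTS.md),
      sent on turn 50 exactly as on turn 1.
    - progress_file: a short summary of work done so far, kept outside the
      context window and rewritten as the task advances.
    - history: the full transcript; only the last keep_last turns are kept.
    """
    # Guard needed because history[-0:] would return the whole list.
    recent = history[-keep_last:] if keep_last > 0 else []
    return [pinned_rules, f"Progress so far:\n{progress_file}", *recent]
```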

From the next chapter, we examine each player’s interpretation one by one.

Other editions: 日本語 Português Español

Overview

Harness Engineering, mapped across the 5 interpretations from OpenAI, Anthropic, LangChain, Martin Fowler, and academia. The first systematic guide that distills the 6 building blocks, the AGENTS.md/CLAUDE.md/hooks implementation patterns, and Self-Evolving Agents into a practical reference for the defining keyword of 2026.

What you will be able to do

Decompose any harness into the 6 building blocks framework
Choose between AGENTS.md, CLAUDE.md, and hooks for each task
Compare interpretations from OpenAI Codex, Anthropic, LangChain, Martin Fowler, and academia in one place
Implement Self-Evolving Agent patterns (self-improving harness)
Place tools like Vibe Coding, Spec Coding, and Agent Frameworks on a clear technology map

Who is this book for

[AI Agent Developer] Needs a systematic view of harness, the defining keyword of 2026
[Claude Code User] Ready for the layer above CLAUDE.md
[Tech Lead] Designing AI agent ops across an entire team
[Researcher] Comparing OpenAI, Anthropic, and LangChain interpretations side-by-side
[Self-Evolving Curious] Looking to build self-improving agents
[Tool Picker] Mapping Vibe Coding, Spec Coding, and Agent Frameworks

Problems this book solves

I hear 'Harness Engineering' a lot but can't actually explain what it is
OpenAI and Anthropic seem to define it differently
The line between AGENTS.md and CLAUDE.md feels blurry
I don't know when to reach for hooks
Self-Evolving Agent design patterns aren't clear to me
The boundary between harness and Agent Frameworks (LangChain etc.) is murky

Where this book stands

Cross-vendor (5 interpretations compared in one book — first of its kind)
Implementation-focused (not just theory — concrete AGENTS.md / hooks examples)
Intermediate to advanced (Claude Code / CLAUDE.md basics assumed)
Harness-specific (single topic, 19 chapters of depth)

Why this book

First book to integrate the 5 interpretations from OpenAI, Anthropic, LangChain, Martin Fowler, and academia
Six-building-block framework for systematizing 'what is harness?'
Goes all the way to Self-Evolving Agents (self-improving harness) and future predictions
Real implementation patterns for AGENTS.md / CLAUDE.md / hooks with concrete examples
Built on a Zenn article that drew 12,000 views — this is the full-fledged version

How this differs from other AI books

Compared to	This book's difference
Vendor docs (OpenAI / Anthropic / LangChain)	Not single-vendor view. This integrates 5 interpretations and explains why they disagree.
Prompt / Context Engineering books	Tackles the layer above prompt and context — the third tier of the stack.
Agent Framework guides (LangChain Agents etc.)	Not framework-specific. Maps the boundary between harness and Agent Frameworks.

Table of contents

01 Preface — Why 'Harness' now Free preview
- 1-1 A Tuesday at 3 a.m.
- 1-2 Who this book is for
- 1-3 How to read this book
02 The Three Engineerings (Prompt → Context → Harness) Free preview
- 2-1 Why 40% fail
- 2-2 Timeline
- 2-3 What makes the three different
- 2-4 A nesting structure
- 2-5 "Replaced" or "layered"?
- 2-6 Why now?
03 Harness Engineering: Definition and Big Picture Free preview
- 3-1 What "works in the demo, breaks in production" really means
- 3-2 Where the word "harness" comes from
- 3-3 Distinguishing it from "test harness"
- 3-4 Comparing the definitions
- 3-5 What they all agree on
- 3-6 This book's working definition of Harness Engineering
- 3-7 What goes wrong without a harness
- 3-8 The decisive difference from Prompt Engineering
04 OpenAI's Take — Codex and the million-line experiment
05 Anthropic's Take — Harness for long-running agents
06 LangChain's Take — Agent = Model + Harness
07 Martin Fowler's View — The implicit harness in every codebase
08 The Academic View — arXiv papers and formal specification
09 The Six Building Blocks — Anatomy of a harness Free preview
10 Technology Map — Vibe Coding / Spec Coding / Agent Framework
11 Reconciling the Differences — What everyone agrees and disagrees on
12 AGENTS.md / CLAUDE.md Practical Design
13 Hooks / Lifecycle / Feedback Loops
14 Self-Evolving Agent — A harness that improves itself
15 The Future of Harness Engineering
16 Afterword
17 References Free preview
18 About the Author Free preview
19 Colophon Free preview

The phrase Harness Engineering is everywhere, and means something different to everyone. OpenAI talks about scaling Codex. Anthropic talks about long-running agents. LangChain frames it as Agent = Model + Harness. Martin Fowler points out that every codebase already has an implicit harness.

Each of them is right. But until now, no book has stitched these views into a single system.

This book maps what a harness is, how to design one, and how to operate it. It synthesizes the 5 interpretations into 6 building blocks, then walks through implementation with AGENTS.md, CLAUDE.md, and hooks, all the way to Self-Evolving Agents.

“Prompt was 2024. Context was 2025. Harness is 2026.”

Topics: Harness Engineering, AI Agent, AGENTS.md, CLAUDE.md, Self-Evolving Agent

* This page contains Amazon Associates links. Purchases may earn the author a referral fee.
