An empirically grounded taxonomy for understanding which tasks can be automated — and which cannot. Built on 4M AI conversations, the Anthropic Economic Index, and a sector mix that shows why the same technology touches 90% of tasks in software but only 25% in therapy.
In spring 2026, Anthropic measured what actually happens on its platform. Four million conversations, classified against the tasks in O*NET — the US Department of Labor's occupational database. The result is not what decks claim, and not what headlines suggest. It is more precise and more uncomfortable.
57% of conversations are Augmentation: humans iterating with the model, learning, validating, refining. 43% are Automation: humans delegating complete tasks. Over time, the ratio shifts: directive automation has risen from 27% to 39% in just eight months. At the same time, real-world tests of autonomous agents show that fewer than 2.5% of submitted tasks are completed end-to-end.¹
The thesis that "AI automates knowledge work" breaks down against this data. It isn't wrong; it's too coarse. AI automates some tasks very well, some partially, and some not at all. And the share in which it does so varies radically across sectors: 14% of all AI conversations are about software development, but among construction workers, anesthesiologists, and therapists, the technology barely registers.²
The right question isn't "how much are you automating?" but rather "which of your tasks belong in which class — and who in your organization is making that call consciously?"
We call the classes A, B, and C. They are measurable in the Anthropic data, established in academic literature since Autor (2003, 2015), and codified as a standard in the NIST taxonomy for Human-AI Teaming.³ What we add: a pragmatic translation into org, hiring, and tech-stack decisions you need to make in 2026.
This article is structured in six parts. First, the model itself (section 01), then the underlying data and method (02), then the sector mix with concrete figures from the Anthropic O*NET mapping (03). Section 04 lists four uncomfortable truths the data forces us to confront. Section 05 translates these into implications for org structure, hiring, and technology stack. Section 06 is a five-question self-assessment.
The classification is not a spectrum. Treating A, B, and C as a gradient means building the wrong tooling and the wrong roles. Each class demands its own workflow, trust, and skill setup. Here are the definitions we work with.
Judgment, accountability, empathy. Humans decide; AI stays out of the loop. These tasks carry consequences — legal, ethical, human — that no one but a human can own. Even where execution is technically possible, delegation itself is the error.
Hybrid. Co-pilot zone. AI makes the proposal; humans validate, correct, sign off. In Anthropic's terminology: "Task Iteration", "Validation", "Learning". This is where the majority of productive AI usage sits today — and exactly where organizations are won or lost in 2026.
Full automation. Humans involved only in edge cases or as auditors. In Anthropic's terminology: "Directive". Growing (27% → 39% in 8 months), but rarely as "pure" as the hype suggests. Real C-tasks need clear inputs, clear outputs, clear failure containment — and are more expensive to build than the pitch promises.
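The three definitions above can be read as a decision rule. The following is a minimal Python sketch, not something taken from the Anthropic data or the NIST taxonomy: the `Task` fields and the `classify` function are hypothetical names chosen for illustration, and the order of the checks (accountability first, then the three Class C preconditions) is an assumption drawn from the wording of the definitions.

```python
from dataclasses import dataclass
from enum import Enum


class TaskClass(Enum):
    A = "human decides, AI stays out of the loop"
    B = "AI proposes, human validates and signs off"
    C = "full automation, human audits edge cases"


@dataclass(frozen=True)
class Task:
    # All fields are hypothetical illustration attributes.
    name: str
    human_must_own_consequences: bool  # legal, ethical, or human consequences
    clear_inputs: bool
    clear_outputs: bool
    failure_contained: bool


def classify(task: Task) -> TaskClass:
    """Assign a class by checking accountability before automatability."""
    if task.human_must_own_consequences:
        # Even where execution is technically possible, delegation is the error.
        return TaskClass.A
    if task.clear_inputs and task.clear_outputs and task.failure_contained:
        return TaskClass.C
    # Anything else still needs a human validation step.
    return TaskClass.B
```

Checking accountability first is deliberate in this sketch: a task with perfectly clear inputs and outputs still lands in Class A if a human has to own the consequences. Landing in Class B is only a starting point, since the validation step itself still has to be designed.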
Three common misconceptions, before we get into the data:
Four numbers that underpin the ABC distribution. Each comes from 2025 or 2026, each is cross-checked against at least one independent source, and for each we also say what it does not prove.
The most important source for this study is the Anthropic Economic Index, an ongoing report series since 2024. The method in one sentence: Anthropic uses an internal system called Clio that summarizes conversations in a privacy-preserving way and classifies them against the tasks in O*NET, the official US Department of Labor occupational database. This turns 4M anonymized chats into a map where every point corresponds to a real occupational activity.
Three methodological points are important to understand before using these numbers:
"Augmentation" doesn't mean "successful". It means only that a human is in the loop. Anthropic itself cautions: an output can be augmentative and still be garbage. The ABC class describes the form of collaboration, not its quality.
Claude users skew technical, young, English-speaking, and US-based. Anthropic attempts to correct for this through weighting, but sectors like construction, healthcare, and logistics are underrepresented — not because tasks don't exist there, but because the tools haven't reached them. The ABC classes exist there regardless; they're just empirically thinner.
Anthropic writes verbatim in the V3 report:
"Whether the growth in directive usage is attributable to improving model capabilities or learning-by-doing could signal very different labor market implications."
Anthropic Economic Index Report V3, 2026
If it's capability: more tasks are genuinely handled at Class C level — job erosion risk rises. If it's confidence: people are learning to delegate better — and the class is still B, just dressed up as C. The answer is still open today, but the difference determines what you need to do differently in 2026.
Anyone working with the ABC model should hold both readings simultaneously. Classify by outcome, not by marketing.
In AEI V3, Anthropic shows a notable inversion: in early, low-adoption markets, directive automation dominates; in mature markets, augmentation dominates. The reading: people new to AI let it do everything ("just make it happen"). People who've lived with it longer use it more collaboratively. That's a reason for hope — and a learning path. Augmentation is not the starting point; it's the mature stage.
Three caveats we make explicit in every ABC discussion before giving recommendations:
From the Anthropic O*NET mapping (Handa et al. 2025, 4M conversations), typical ABC distributions can be derived for selected sectors. The figures are order-of-magnitude anchors, not exact quotas — but they show: the same AI touches a software role very differently from a therapy role.
| Sector / Role | Current dominant class | Evidence | What it means |
|---|---|---|---|
| Software Development | B → C | ~14% of all Claude conversations are code/debugging. Highest penetration of any O*NET occupational group. Feedback loop dominates. | Realistic Class C in narrowly scoped sub-tasks (boilerplate, test generation). Architecture decisions stay Class A. |
| Technical Writing & Content | B | Directive drafts dominate in writing tasks; iteration and refinement follow. Second-largest cluster after software. | Full automation works for standard formats (release notes, FAQ). Voice and brand stay Class B. |
| Marketing Management | B | ~50% of O*NET tasks show Claude usage — but only in research and strategy drafts. Trade show coordination, product specs, etc. remain human. | Discovery teams (CX × Data × Product) win; classic mid-level execution roles lose. |
| Legal & Compliance | A → B | Research, clauses, and standard drafts are Class B. Mandate, final risk assessment, and strategy stay Class A. Regulatory pressure (AI Act) forces auditability. | A Class C promise here is usually marketing. Real Class C stays confined to standard boilerplate. |
| Customer Experience / Support | C-pressure | Tier 1 with standard issues: realistic Class C. Tier 2 / complaints / escalation: Class B. Empathy cases: Class A. | Build everything as C and escalation rates climb. Protect Class A consciously and you keep NPS up. |
| Education / Tutoring | B | Foreign language teachers have the highest task coverage (~75%), but teaching and assessment responsibility remains human. Augmentation pattern dominates. | Co-pilot in preparation, materials, and practice. Assessment and relationship stay Class A. |
| Therapy / Care | A | Physical therapists ~25% task coverage — mostly research and patient education. Hands-on treatment virtually 0%. | Class A in the relationship and treatment. Class B in documentation and education. Class C only in administrative back-office steps. |
| Construction / Anesthesia / Physical work | A | Minimal Claude usage empirically. Not because AI couldn't — because tasks are physical or heavily regulated. | Class B only in documentation and planning. Everything operational stays Class A. Class C promises here are purely hypothetical. |
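
The Customer Experience row of the table can be sketched as a routing rule. This is a minimal Python illustration under stated assumptions: the `Ticket` fields and the `route` function are hypothetical, and how a ticket gets flagged as an empathy case is left open because the source does not specify it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ticket:
    # Hypothetical fields; the source only names the tiers and case types.
    tier: int                # 1 = standard issue, 2 = complaint or escalation
    is_standard_issue: bool
    needs_empathy: bool


def route(ticket: Ticket) -> str:
    """Return the class ("A", "B", or "C") that should handle the ticket."""
    if ticket.needs_empathy:
        return "A"  # protected: a human handles it, AI stays out
    if ticket.tier == 1 and ticket.is_standard_issue:
        return "C"  # fully automated, audited after the fact
    return "B"      # AI drafts, a human validates and signs off
```

The empathy check comes first so that a Tier 1 ticket with an upset customer is never routed to full automation, which is the failure mode the row warns about.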
First: software is the exception, not the rule. When decks claim "AI is transforming all knowledge work", that's often implicitly based on the software experience. But even within the Anthropic data, no other occupational group comes close to that penetration. An ABC strategy for a hospital, a law firm, or a steel manufacturer doesn't follow the software playbook.
Second: Class A shrinks more slowly than everyone expects. In every heavily regulated or physically present profession, the A-share stays high. That's not a technical limit — it's an institutional limit. Better models won't make it go away.
Third: Class B systematically shifts toward C — but not linearly. Nearly completely in software, partially in content, only in Tier 1 in CX. Anyone who doesn't think through this class by class will build the wrong stack.
Your organization's ABC distribution is your actual value-creation map — not your org chart, not your tool landscape.
These four statements are directly derivable from the evidence. They're uncomfortable because they don't fit the dominant story — but they are substantiated.
If you measure the ROI of your AI projects, you're statistically in the 5% minority. Everyone else claims productivity gains without proving them. When 95% see no measurable return, "we're working on it" isn't a strategy — it's an omission.
Most "AI strategies" skip Class B tasks entirely. They go straight for Class C full automation, because that makes for a simpler board narrative. This is precisely why 95% see no return — they're automating tasks without ever understanding the underlying workflow, instead of redesigning the workflow so AI actually helps within it.
An organization with no Class A tasks has no accountability left. Without accountability, there's no brand — only process. Augmentation studies show a measurable deskilling effect: judgment skills atrophy when they aren't actively exercised. Class A tasks aren't "what AI can't do yet" — they're a deliberate design decision.
Most "AI platform" investments apply Class A logic with Class C promises. They sell safety (A) while promising full automation (C) — but nobody orchestrates Class B, which is where value is actually created. An honest platform strategy needs three loops, not one.
The ABC distribution is a design decision. It shows up in three levers you can actively shape — and which will determine in 2026 whether you're a productive AI organization or one that just loudly claims to be.
When we map the ABC distribution with organizations, three patterns show up reliably:
Tasks that were clearly Class A — case acceptance, hiring decisions, pricing authority — get absorbed into B or C tools without the consequence ever being named. Nobody makes that call actively; it happens over years, through tool purchases and workflow updates. Only a deliberate classification makes visible that accountability was delegated without ever being consciously delegated.
"Hybrid" becomes a catch-all for everything that isn't clearly A or C. The problem: without trust boundaries, without a verification step, without rollback, that's not Class B — it's chaos with AI involvement. Real Class B requires design, not default.
Tier-1 support, standard classification, and routine reporting get declared "fully automated". At the first real edge case (a regulatory question, a complaint, an unusual format) there's no escalation path. The escalation rate eats the efficiency gain.
The fix is the same across all three patterns: the classes have to be named before they can be designed.
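What a designed Class B step could look like, with an explicit verification step and a rollback, can be sketched in a few lines. This is a minimal illustration, assuming a workflow where the AI proposal is just a callable: `run_class_b`, `propose`, `verify`, `apply`, and `rollback` are hypothetical names, not an existing library API.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_class_b(
    propose: Callable[[], T],
    verify: Callable[[T], bool],
    apply: Callable[[T], None],
    rollback: Callable[[], None],
) -> bool:
    """Run one Class B step: AI proposes, a human verifies, changes are reversible.

    Returns True if the proposal was applied, False if it was rejected
    or rolled back.
    """
    proposal = propose()      # AI makes the proposal
    if not verify(proposal):  # human sign-off is the trust boundary
        return False
    try:
        apply(proposal)
    except Exception:
        rollback()            # undo partial effects on failure
        return False
    return True
```

A step that cannot supply a `verify` and a `rollback` is, in the terms used above, not yet a Class B step. The same shape extends to Class C by replacing `verify` with an automated check and routing failures to a named escalation path.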
Five questions. If you can't answer three or more clearly, your ABC distribution hasn't been deliberately designed yet — it's happening to you.
We don't sell tools.
We help you design your ABC mix deliberately — before your organization does it by accident.
We already work the way your organization will work tomorrow: in a team of humans and agents, with clearly defined ABC classes, measured transitions, and a language that doesn't oscillate between hype and hearsay. Our standard sequence with clients:
If you want to set this up in your organization — without 18 months of pilot chaos and without falling into the 95%-no-ROI statistic — talk to us. Write to hello@craid.de or respond to this article at craid.de.
Every core figure in this article is supported by at least two independent sources. Here are the most important, briefly annotated.
Source heuristic: A = primary empirical (Anthropic, MIT) — B = academic (NIST, SSRN, Autor) — C = industry (WEF, Goldman, Stonebranch, Cisco) — D = opinion (LinkedIn posts, vendor whitepapers). Only A and B sources provide core figures; C adds context; D is used for subject matter, not as authority. Every claim in the stats and sector sections is supported by at least two independent sources.
Full annotated source library available on request. This research is part of the CRAiD series "Reports from the agentic frontier". Last updated: May 2026.