Research · May 2026 · ~22 min read

The ABC Task Model –
what humans, machines, and hybrids actually do.

An empirically grounded taxonomy for understanding which tasks can be automated — and which cannot. Built on 4M AI conversations, the Anthropic Economic Index, and a sector mix that shows why the same technology touches 90% of tasks in software but only 25% in therapy.

CRAiD Design Research · Carlo · May 2026 · Based on Anthropic Economic Index Reports V1–V3, Handa et al. (Anthropic, 2025, arXiv:2503.04761), MIT Sloan, NIST AI 200-1, WEF Future of Jobs 2025, Goldman Sachs, Stonebranch, Cisco Workplace Index, Sana Labs, Eightfold, and 11 further sources.


00 Introduction

Why "degree of automation" is the wrong question.

In spring 2026, Anthropic measured what actually happens on its platform. Four million conversations, classified against the tasks in O*NET — the US Department of Labor's occupational database. The result is not what decks claim, and not what headlines suggest. It is more precise and more uncomfortable.

57% of conversations are Augmentation — humans iterating with the model, learning, validating, refining. 43% are Automation — humans delegating complete tasks. Over time, the ratio shifts: directive automation has risen from 27% to 39% in just eight months. And simultaneously, real-world tests of autonomous agents show that fewer than 2.5% of submitted tasks are completed end-to-end.¹

The thesis that "AI automates knowledge work" breaks down against this data. It isn't wrong; it's too coarse. AI automates some tasks very well, some partially, and some not at all. And the share in which it does so varies radically across sectors: roughly 14% of all analyzed Claude conversations are about software development, but among construction workers, anesthesiologists, and therapists, the technology barely registers.²

The right question isn't "how much are you automating?" but rather "which of your tasks belong in which class — and who in your organization is making that call consciously?"

We call the classes A, B, and C. They are measurable in the Anthropic data, established in academic literature since Autor (2003, 2015), and codified as a standard in the NIST taxonomy for Human-AI Teaming.³ What we add: a pragmatic translation into org, hiring, and tech-stack decisions you need to make in 2026.

This article is structured in six parts. First, the model itself (section 01), then the underlying data and method (02), then the sector mix with concrete figures from the Anthropic O*NET mapping (03). Section 04 lists four uncomfortable truths the data forces us to confront. Section 05 translates these into implications for org structure, hiring, and technology stack. Section 06 is a five-question self-assessment.

¹ Anthropic Economic Index Report V3 (March 2026), Handa et al. (2025), and industry reports on agent performance. Full source list at the end.
² Handa et al. (2025), Figure 11/12, based on 4M Claude.ai conversations.
³ Autor, D. (2015) "Why Are There Still So Many Jobs?". NIST AI 200-1 "Taxonomy for Human-AI Teaming" (2025).

01 The Model

Three Classes.
Three very different playing fields.

The classification is not a spectrum. Treating A, B, and C as a gradient means building the wrong tooling and the wrong roles. Each class demands its own workflow, trust, and skill setup. Here are the definitions we work with.

Class A

0% Automation

Judgment, accountability, empathy. Humans decide; AI stays out of the loop. These tasks carry consequences — legal, ethical, human — that no one but a human can own. Even where execution is technically possible, delegation itself is the error.

Examples: Termination conversation · final design decision · hire yes/no · clinical risk disclosure · external crisis communication · ethical judgment in compliance · final pricing authority · case acceptance in law

Class B

60–80% Automation

Hybrid. Co-pilot zone. AI makes the proposal; humans validate, correct, sign off. In Anthropic's terminology: "Task Iteration", "Validation", "Learning". This is where the majority of productive AI usage sits today — and exactly where organizations are won or lost in 2026.

Examples: Code-review suggestions · research synthesis · brief and pitch drafts · customer reply drafts · pricing recommendations · marketing strategy drafts · UI mock refinement · SQL validation · technical documentation

Class C

90–100% Automation

Full automation. Humans involved only in edge cases or as auditors. In Anthropic's terminology: "Directive". Growing (27% → 39% in 8 months), but rarely as "pure" as the hype suggests. Real C-tasks need clear inputs, clear outputs, clear failure containment — and are more expensive to build than the pitch promises.

Examples: Routine classification · data extraction from standard forms · Tier-1 FAQs · markdown/format conversion · standard translations · report generation from structured sources · simple code snippets

What the classes are not

Three common misconceptions, before we get into the data:

ABC is not a ranking. A is not "better" than C, and C is not "the future". A healthy organization has tasks in all three classes — and knows which are which.
ABC is not static. Tasks shift between classes, often faster than org charts. What was B in 2024 may be C in 2026 — or flip back to an A question because a regulator took notice.
ABC applies to tasks, not jobs. A marketing manager has A-, B-, and C-tasks simultaneously. Trying to classify entire roles misses the point.
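
The tasks-not-jobs point can be made concrete with a small sketch. The Python below is illustrative only: the `TaskClass` enum, the `Task` record, and the task list for the marketing manager are assumptions built from the examples above, not a schema from Anthropic or NIST.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class TaskClass(Enum):
    """The three classes as defined in this article."""
    A = "human decides, AI stays out of the loop"
    B = "AI proposes, human validates and signs off"
    C = "automated, human audits or handles edge cases"


@dataclass(frozen=True)
class Task:
    name: str
    task_class: TaskClass


def class_mix(tasks: list[Task]) -> dict[str, int]:
    """Count how many tasks of a single role fall into each class."""
    counts = Counter(task.task_class.name for task in tasks)
    return {label: counts.get(label, 0) for label in ("A", "B", "C")}


# Hypothetical task list for one marketing manager role.
marketing_manager = [
    Task("final pricing authority", TaskClass.A),
    Task("marketing strategy draft", TaskClass.B),
    Task("customer reply draft", TaskClass.B),
    Task("report generation from structured sources", TaskClass.C),
]

print(class_mix(marketing_manager))
```

The role itself never receives a class. Only its tasks do, which is why a single job title can show up in all three columns.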

02 Data & Method

What the empirical record shows —
and what the numbers don't say.

Four numbers that underpin the ABC distribution. Each comes from 2025 or 2026, each is cross-checked against at least one independent source, and for each we also say what it does not prove.

Augmentation

57%

of all analyzed Claude conversations are iterative: human and AI thinking together. In Anthropic's taxonomy: Task Iteration, Learning, Validation. This is Class B, measured in the field. (Handa et al., 4M conversations, 2024–2025)

Automation

43%

of conversations are delegation ("Directive" plus "Feedback Loop"). That's Class C: growing, but not yet the dominant pattern. Significantly higher (~77%) in API-based business contexts. (Handa et al., 2025; AEI Report V3)

No ROI

95%

of organizations see no measurable return from AI initiatives. Not because the models are weak. Because A, B, and C aren't separated, and the surrounding workflow isn't redesigned. (MIT Media Lab, 2025)

Autonomy

<2.5%

of tasks are fully completed by autonomous agents today. The rest require humans for correction, recovery, or escalation. Class C is rarer than every deck claims. (Industry reports 2025/2026)

How Anthropic measures — and what that means

The most important source for this study is the Anthropic Economic Index, an ongoing report series since 2024. The method in one sentence: Anthropic uses an internal system called Clio that summarizes conversations in a privacy-preserving way and classifies them against O*NET tasks, the official US Department of Labor occupational database. This turns 4M anonymized chats into a map where every point corresponds to a real occupational activity.

Four methodological points are important to understand before using these numbers:

1. The 57/43 split is a mode-of-use classification, not an outcome.

"Augmentation" doesn't mean "successful". It means only that a human is in the loop. Anthropic itself cautions: an output can be augmentative and still be garbage. The ABC class describes the form of collaboration, not its quality.

2. The data comes from Claude.ai, not from "the economy".

Claude users skew technical, young, English-speaking, and US-based. Anthropic attempts to correct for this through weighting, but sectors like construction, healthcare, and logistics are underrepresented — not because tasks don't exist there, but because the tools haven't reached them. The ABC classes exist there regardless; they're just empirically thinner.

3. Directive automation is rising fast — but we don't know whether it's capability or confidence.

Anthropic writes verbatim in the V3 report:

"Whether the growth in directive usage is attributable to improving model capabilities or learning-by-doing could signal very different labor market implications." (Anthropic Economic Index Report V3, 2026)

If it's capability: more tasks are genuinely handled at Class C level — job erosion risk rises. If it's confidence: people are learning to delegate better — and the class is still B, just dressed up as C. The answer is still open today, but the difference determines what you need to do differently in 2026.

Anyone working with the ABC model should hold both readings simultaneously. Classify by outcome, not by marketing.
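
Before interpreting the shift either way, it helps to be exact about its size. The sketch below only restates the 27% → 39% figure over eight months in three forms. The function name `shift_summary` is hypothetical, and the per-month value assumes a linear change, which is a simplification for illustration and not something the report states.

```python
def shift_summary(start_share: float, end_share: float, months: int) -> dict[str, float]:
    """Express a change in share as points, relative growth, and points per month."""
    points = end_share - start_share
    return {
        "percentage_points": round(points, 1),
        "relative_growth_pct": round(points / start_share * 100, 1),
        # Assumes a linear change across the window (illustrative only).
        "points_per_month": round(points / months, 2),
    }


# Directive share as cited in this article: 27% to 39% in eight months.
summary = shift_summary(27.0, 39.0, 8)
print(summary)
```

A rise of 12 percentage points is a relative increase of about 44%. Neither number says whether the cause is capability or confidence; it only fixes what has to be explained.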

4. Geography matters more than expected.

In AEI V3, Anthropic shows a notable inversion: in early, low-adoption markets, directive automation dominates; in mature markets, augmentation dominates. The reading: people new to AI let it do everything ("just make it happen"). People who've lived with it longer use it more collaboratively. That's a reason for hope — and a learning path. Augmentation is not the starting point; it's the mature stage.

What the numbers don't say

Three caveats we make explicit in every ABC discussion before giving recommendations:

They say nothing about quality. A 95% automation rate in Tier-1 support can still mean a 30% escalation rate — and generate a net increase in work.
They say nothing about risk. A Class C classification for "medical pre-triage" is technically conceivable, but regulation makes it Class A. Classification pressure comes not just from data, but from law and ethics.
They say nothing about acceptance. 80% of US workers use unapproved AI at work (Cisco 2026). What runs as C is often shadow B — no audit, no governance.
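
The quality caveat can be turned into a back-of-the-envelope check. The sketch below is a simplified model under stated assumptions: human effort is measured per ticket against a fully manual baseline of 1.0, and `escalation_cost_factor` is a hypothetical parameter for how much more effort an escalated ticket takes than one a human handled from the start. Neither the model nor the parameter comes from the cited sources.

```python
def human_workload(automation_rate: float, escalation_rate: float,
                   escalation_cost_factor: float) -> float:
    """Human effort per ticket, relative to a fully manual baseline of 1.0."""
    # Tickets that never enter the automated path are handled manually.
    manual_share = 1.0 - automation_rate
    # Tickets that enter the automated path and then bounce back to a human.
    escalated_share = automation_rate * escalation_rate
    return manual_share + escalated_share * escalation_cost_factor


def break_even_cost_factor(escalation_rate: float) -> float:
    """Cost factor above which automation increases total human work."""
    # Solving manual_share + escalated_share * f = 1.0 for f gives 1 / escalation_rate.
    return 1.0 / escalation_rate


# Figures from the caveat above: 95% automation rate, 30% escalation rate.
same_effort = human_workload(0.95, 0.30, escalation_cost_factor=1.0)
costly_escalations = human_workload(0.95, 0.30, escalation_cost_factor=4.0)
threshold = break_even_cost_factor(0.30)
print(same_effort, costly_escalations, threshold)
```

In this model, a net increase in work appears once an escalated ticket costs more than roughly 3.3 times a normal one. Whether that threshold is realistic depends on how much context the human has to rebuild after a failed automated attempt, which is exactly what an escalation rate alone does not show.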

03 Sector Mix

Same technology.
Radically different distributions.

From the Anthropic O*NET mapping (Handa et al. 2025, 4M conversations), typical ABC distributions can be derived for selected sectors. The figures are order-of-magnitude anchors, not exact quotas — but they show: the same AI touches a software role very differently from a therapy role.

Sector / Role	Current dominant class	Evidence	What it means
Software Development	B → C	~14% of all Claude conversations are code/debugging. Highest penetration of any O*NET occupational group. Feedback loop dominates.	Realistic Class C in narrowly scoped sub-tasks (boilerplate, test generation). Architecture decisions stay Class A.
Technical Writing & Content	B	Directive drafts dominate in writing tasks; iteration and refinement follow. Second-largest cluster after software.	Full automation works for standard formats (release notes, FAQ). Voice and brand stay Class B.
Marketing Management	B	~50% of O*NET tasks show Claude usage — but only in research and strategy drafts. Trade show coordination, product specs, etc. remain human.	Discovery teams (CX × Data × Product) win; classic mid-level execution roles lose.
Legal & Compliance	A → B	Research, clauses, and standard drafts are Class B. Mandate, final risk assessment, and strategy stay Class A. Regulatory pressure (AI Act) forces auditability.	A Class C promise here is usually marketing. Real Class C stays confined to standard boilerplate.
Customer Experience / Support	C-pressure	Tier 1 with standard issues: realistic Class C. Tier 2 / complaints / escalation: Class B. Empathy cases: Class A.	Build everything as C and you produce escalation rates. Protect Class A consciously and you keep NPS up.
Education / Tutoring	B	Foreign language teachers have the highest task coverage (~75%), but teaching and assessment responsibility remains human. Augmentation pattern dominates.	Co-pilot in preparation, materials, and practice. Assessment and relationship stay Class A.
Therapy / Care	A	Physical therapists ~25% task coverage — mostly research and patient education. Hands-on treatment virtually 0%.	Class A in the relationship and treatment. Class B in documentation and education. Class C only in administrative back-office steps.
Construction / Anesthesia / Physical work	A	Minimal Claude usage empirically. Not because AI couldn't — because tasks are physical or heavily regulated.	Class B only in documentation and planning. Everything operational stays Class A. Class C promises here are purely hypothetical.

Three observations from the table

First: software is the exception, not the rule. When decks claim "AI is transforming all knowledge work", that's often implicitly based on the software experience. But even within the Anthropic data, no other occupational group comes close to that penetration. An ABC strategy for a hospital, a law firm, or a steel manufacturer doesn't follow the software playbook.

Second: Class A shrinks more slowly than everyone expects. In every heavily regulated or physically present profession, the A-share stays high. That's not a technical limit — it's an institutional limit. Better models won't make it go away.

Third: Class B systematically shifts toward C — but not linearly. Nearly completely in software, partially in content, only in Tier 1 in CX. Anyone who doesn't think through this class by class will build the wrong stack.

Your organization's ABC distribution is your actual value-creation map — not your org chart, not your tool landscape.
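
One way to make such a map tangible is to compute class shares from a task inventory. The following is a minimal sketch, assuming each task has already been classified and that weekly hours are a reasonable weight. The inventory entries, the hour values, and the function name `abc_distribution` are hypothetical and not taken from the Anthropic data.

```python
from collections import defaultdict


def abc_distribution(inventory: list[tuple[str, str, float]]) -> dict[str, float]:
    """Return the share of weekly hours per class, in percent.

    Each inventory entry is (task name, class label, weekly hours).
    """
    hours: dict[str, float] = defaultdict(float)
    for _name, task_class, weekly_hours in inventory:
        if task_class not in ("A", "B", "C"):
            raise ValueError(f"unknown class: {task_class}")
        hours[task_class] += weekly_hours
    total = sum(hours.values())
    if total == 0:
        raise ValueError("inventory contains no hours")
    return {label: round(100 * hours[label] / total, 1) for label in ("A", "B", "C")}


# Hypothetical inventory for a small legal team.
legal_team = [
    ("case acceptance", "A", 4.0),
    ("research synthesis", "B", 10.0),
    ("standard clause drafts", "B", 2.0),
    ("format conversion", "C", 4.0),
]

print(abc_distribution(legal_team))
```

Weighting by hours is one choice among several. Weighting by risk or revenue would produce a different map from the same inventory.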

04 Four uncomfortable truths

What the data says
that nobody says on stage.

These four statements are directly derivable from the evidence. They're uncomfortable because they don't fit the dominant story — but they are substantiated.

Truth 01 · ROI

If you measure the ROI of your AI projects, you're statistically in the 5% minority. Everyone else claims productivity gains without proving them. When 95% see no measurable return, "we're working on it" isn't a strategy — it's an omission.

Truth 02 · Class B

Most "AI strategies" skip Class B tasks entirely. They go straight for Class C full automation, because that makes for a simpler board narrative. This is precisely why 95% see no return — they're automating tasks without ever understanding the underlying workflow, instead of redesigning the workflow so AI actually helps within it.

Truth 03 · Class A erosion

An organization with no Class A tasks has no accountability left. Without accountability, there's no brand — only process. Augmentation studies show a measurable deskilling effect: judgment skills atrophy when they aren't actively exercised. Class A tasks aren't "what AI can't do yet" — they're a deliberate design decision.

Truth 04 · The platform lie

Most "AI platform" investments apply Class A logic with Class C promises. They sell safety (A) while promising full automation (C) — but nobody orchestrates Class B, which is where value is actually created. An honest platform strategy needs three loops, not one.

05 Implications

What this means for you.
Org. Hires. Tech stack.

The ABC distribution is a design decision. It shows up in three levers you can actively shape — and which will determine in 2026 whether you're a productive AI organization or one that just loudly claims to be.

01 · Org Structure

Three classes, three logics.

A: Senior generalists with deep judgment — explicitly separated from agent loops. With their own decision logs.
B: Cross-functional discovery teams. This is where most new roles emerge — Agent Stewards, Quality Auditors, Real-Time Policy Owners. Cisco observes the highest skill-shift pressure here.
C: Lean operator and audit teams. More observability than headcount. Escalation paths explicit.
Without separation: Class A skills erode quietly. Miss this, and you lose the emergency brake.

02 · Hires

Asymmetric shift.

More expensive: Senior generalists with high Class A capability. The market produces fewer of them; you need more. WEF: negotiation and empathy skills are rising in relative price.
New: AI Orchestration Specialists, Agent Stewards, Governance/Policy Owners, cross-functional discovery profiles.
More interchangeable: Mid-level execution roles with B/C overlap. Shortest skill half-life here.
Hire for the class, not for the title. Ask for judgment cases, not tool lists.

03 · Tech Stack

Three loops, not one mono-stack.

A: Decision support, logging, audit trails — AI stays out of the loop but documents the decision.
B: Co-pilots with clear trust boundaries, a verification step, rollback. E.g.: Cursor pattern, Copilot Studio with approval gate, Sana workflow.
C: Full automation with edge-case escalation and an audit pipeline. E.g.: UiPath / Agentforce with observability layer, MCP/A2A standards.
The most common mistake of 2024–2026: one uniform stack across all three classes. Expensive, sluggish, without impact.
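
The three loops can be sketched as three separate code paths. This is an illustration under assumptions, not a reference architecture: `LoopResult`, the three `run_class_*` functions, and the callables passed into them are hypothetical names, and "rollback" is reduced here to discarding a rejected proposal.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class LoopResult:
    output: Optional[str]
    decided_by: str  # "human" or "ai"
    audit_log: List[str] = field(default_factory=list)


def run_class_a(task: str, human_decide: Callable[[str], str]) -> LoopResult:
    """Class A: the human decides; the system only records the decision."""
    decision = human_decide(task)
    return LoopResult(decision, "human", [f"A: human decided on '{task}'"])


def run_class_b(task: str, ai_propose: Callable[[str], str],
                human_verify: Callable[[str], bool]) -> LoopResult:
    """Class B: AI proposes, a human verifies; a rejected proposal is discarded."""
    proposal = ai_propose(task)
    log = [f"B: AI proposal created for '{task}'"]
    if human_verify(proposal):
        log.append("B: human approved the proposal")
        return LoopResult(proposal, "human", log)
    log.append("B: human rejected the proposal, nothing was released")
    return LoopResult(None, "human", log)


def run_class_c(task: str, ai_execute: Callable[[str], str],
                is_edge_case: Callable[[str], bool],
                escalate: Callable[[str], str]) -> LoopResult:
    """Class C: automated by default, with an explicit escalation path."""
    if is_edge_case(task):
        return LoopResult(escalate(task), "human", [f"C: escalated '{task}'"])
    return LoopResult(ai_execute(task), "ai", [f"C: automated '{task}'"])


# Hypothetical usage with stand-in callables.
rejected = run_class_b("customer reply draft", lambda t: f"Draft for {t}", lambda p: False)
routine = run_class_c("markdown conversion", lambda t: f"Converted {t}",
                      lambda t: "unusual" in t, lambda t: f"Operator handled {t}")
odd = run_class_c("unusual format", lambda t: f"Converted {t}",
                  lambda t: "unusual" in t, lambda t: f"Operator handled {t}")
print(rejected.audit_log, routine.decided_by, odd.decided_by)
```

Each path has a different signature on purpose: Class A takes no AI callable at all, Class B cannot return output without a human check, and Class C cannot be called without an escalation handler.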

What we see in CRAiD engagements

When we map the ABC distribution with organizations, three patterns show up reliably:

Pattern 1 – The invisible Class A erosion

Tasks that were clearly Class A — case acceptance, hiring decisions, pricing authority — get absorbed into B or C tools without the consequence ever being named. Nobody makes that call actively; it happens over years, through tool purchases and workflow updates. Only a deliberate classification makes visible that accountability was delegated without ever being consciously delegated.

Pattern 2 – Class B as a dumping ground

"Hybrid" becomes a catch-all for everything that isn't clearly A or C. The problem: without trust boundaries, without a verification step, without rollback, that's not Class B — it's chaos with AI involvement. Real Class B requires design, not default.

Pattern 3 – The Class C promise with no edge-case plan

Tier-1 support, standard classification, routine reporting gets declared "fully automated". At the first real edge case — a regulatory question, a complaint, an unusual format — there's no escalation path. The escalation rate eats the efficiency gain.

The fix is the same across all three patterns: the classes have to be named before they can be designed.

06 Self-Assessment

How ABC-ready
is your organization?

Five questions. If you can't answer three or more clearly, your ABC distribution hasn't been deliberately designed yet — it's happening to you.

Can you name the ABC class for your five most important workflows? If not: you're running three classes on one setup. That explains why AI feels "inconsistent" — sometimes wow, often disappointing.
Which Class A tasks have you consciously protected in the last 12 months? If none: your accountability skills are eroding inside agent loops, unnoticed. It becomes visible when harm occurs — which is too late.
Where is an agentic Class C loop already running today — with a complete audit trail, edge-case escalation, and a measured escalation rate? If nowhere: you have no productive Class C. What you have is Class B with Class C marketing.
Who is explicitly responsible for your Class B — for workflow design, trust boundaries, and verification steps? If no one: your Class B is default, not design. That's exactly where the 95%-no-ROI stories come from.
If a regulator calls tomorrow: can you explain, for one of your AI-assisted decisions, who owns it, which data it was based on, and where the audit trail is? If not: you don't have an ABC strategy — you have a risk profile. The gap gets more expensive in 2026 with the AI Act and sector regulators.
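
The scoring rule for the five questions can be written down in a few lines. This is a minimal sketch, assuming each question is answered with a plain yes or no for "we can answer this clearly". The short labels in `QUESTIONS` and the function name `abc_readiness` are paraphrases introduced here, and the wording of the second result is an assumption, since the article only defines the three-or-more threshold.

```python
QUESTIONS = (
    "ABC class named for the five most important workflows",
    "Class A tasks consciously protected in the last 12 months",
    "Class C loop with audit trail, escalation, and measured escalation rate",
    "Explicit owner for Class B workflow design and verification",
    "Owner, data basis, and audit trail explainable to a regulator",
)


def abc_readiness(clear_answers: list[bool]) -> str:
    """Apply the threshold: three or more unclear answers means no deliberate design."""
    if len(clear_answers) != len(QUESTIONS):
        raise ValueError(f"expected {len(QUESTIONS)} answers")
    unclear = clear_answers.count(False)
    if unclear >= 3:
        return "ABC distribution not deliberately designed yet"
    return f"{unclear} open question(s), below the three-question threshold"


# Hypothetical organization: clear on workflows and Class B ownership only.
print(abc_readiness([True, False, False, True, False]))
```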

07 CRAiD POV

We don't sell tools.
We help you design your ABC mix deliberately — before your organization does it by accident.

CRAiD · Design Consultancy for the Agentic Era

We already work the way your organization will work tomorrow: in a team of humans and agents, with clearly defined ABC classes, measured transitions, and a language that doesn't oscillate between hype and hearsay. Our standard sequence with clients:

ABC Mapping (2 weeks) — classification of the 20–30 most important tasks per function, together with the respective owners. Output: ABC map with rationale, risks, and shift hypotheses.
Class-specific workflow designs (4–6 weeks) — one loop per class: A with decision logs, B with trust boundaries and verification, C with audit pipeline.
Piloting & measurement (8–12 weeks) — one productive B-loop and one productive C-loop, with measurable outcomes instead of vibe-reporting.
Rollout & enablement — discovery teams staffed, steward roles defined, skill path for senior Class A practitioners established.

If you want to set this up in your organization — without 18 months of pilot chaos and without falling into the 95%-no-ROI statistic — talk to us. Write to hello@craid.de or respond to this article at craid.de.


Sources & Method

What we're building on.

Every core figure in this article is supported by at least two independent sources. Here are the most important, briefly annotated.

Primary sources

Anthropic Economic Index Reports V1–V3 (2024–2026) — Anthropic, ongoing. Augmentation/Automation distribution (57/43), temporal shift of directive automation (27% → 39%), geographic adoption. anthropic.com/economic-index
Handa et al., "Which Economic Tasks Are Performed with AI?" — Anthropic, 2025. arXiv:2503.04761. Occupational-level empirics via O*NET mapping, 4M conversations. Source of the sector distributions in section 03.
Tamkin et al., "Clio: Privacy-Preserving Insights into Real-World AI Use" — Anthropic, 2024. Methodological foundation of the AEI analysis.

Academic frameworks

Autor, D. (2003, 2013, 2015) — Foundational papers on task models and human-machine complementarity; theoretical framework for ABC.
Acemoglu & Restrepo (2018) — Model for automation, displacement, and newly created tasks.
UCL / SSRN "Human-AI Task Tensor" — Academic template for multidimensional task taxonomies. SSRN 5134721
SSRN "Automation or Augmentation?" — Model for deskilling effects. SSRN 4910282
NIST AI 200-1: Taxonomy for Human-AI Teaming — Institutional standard for classifying human-AI collaboration. nist.gov

Industry and adoption data

MIT Media Lab (2025) — "95% of organizations see no measurable ROI". Secondary-cited in WEF Future of Jobs.
MIT Sloan — "How AI is Reshaping Workflows and Redefining Jobs". Workflow redesign as a prerequisite for value creation. mitsloan.mit.edu
WEF Future of Jobs 2025 / Organizational Transformation in the Age of AI — Org structures, discovery teams, skill shifts.
Goldman Sachs: AI Labor Market Impact — ~25% of US work hours exposed, 300M jobs.
Stonebranch Global State of IT Automation — 21% "at enterprise scale", 79% below.
Cisco Workplace 2026 Index — Shadow AI use, 80%+ unapproved AI; skill shifts.
Eightfold, Sana Labs, Salesforce, Ruh.ai, UiPath — Industry data on hiring, tech stack adoption, agent platforms 2025/2026.
Berkeley California Management Review (July 2025) — "AI Automation and Augmentation Roadmap". Executive bridge.

Method of this article

Source heuristic: A = primary empirical (Anthropic, MIT) — B = academic (NIST, SSRN, Autor) — C = industry (WEF, Goldman, Stonebranch, Cisco) — D = opinion (LinkedIn posts, vendor whitepapers). Only A and B sources provide core figures; C adds context; D is used for subject matter, not as authority. Every claim in the stats and sector sections is supported by at least two independent sources.
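
The source heuristic can be read as a simple rule. The sketch below is one possible interpretation, assuming that a core figure needs at least two distinct sources and that at least one of them is tier A or B. Treating distinct source names as independent is a simplification, and the function name `may_state_core_figure` is hypothetical.

```python
CORE_TIERS = {"A", "B"}  # primary empirical and academic sources


def may_state_core_figure(sources: list[tuple[str, str]]) -> bool:
    """Check a claim's sources, given as (source name, tier) pairs."""
    distinct_names = {name for name, _tier in sources}
    has_core_source = any(tier in CORE_TIERS for _name, tier in sources)
    return len(distinct_names) >= 2 and has_core_source


# Tier assignments follow the heuristic above; the pairings are illustrative.
print(may_state_core_figure([("Anthropic", "A"), ("NIST", "B")]))
print(may_state_core_figure([("Cisco", "C"), ("Goldman", "C")]))
print(may_state_core_figure([("Anthropic", "A")]))
```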

Full annotated source library available on request. This research is part of the CRAiD series "Reports from the agentic frontier". Last updated: May 2026.