Five ascending brutalist concrete pillars illuminated by warm amber light in a vast dark cathedral-like space - representing Stripe's five levels of agentic commerce capability
Back to Observatory

The Observatory · Issue 039 · February 2026

Stripe's Five Levels of Agentic Commerce

What the Payments Giant Gets Right, What It Misses, and What Trust Architecture Demands

By Tony Wood·32 min read


In February 2026, Stripe published its annual letter. Within a document primarily focussed with processing volumes, stablecoin infrastructure, and entrepreneurial acceleration was a framework that deserves serious attention: the Five Levels of Agentic Commerce. It is, in many ways, one of the most telling things Stripe has highlighted. Not because it is complete, but because it reveals exactly where the payments industry's understanding of agentic commerce ends and where the design discipline must begin.

Stripe processes $1.9 trillion in annual volume - roughly 1.6 per cent of global GDP. When a company of that scale publishes a maturity model for agentic commerce, the industry listens. And it should. But listening is not the same as accepting without interrogation. The Five Levels framework is valuable precisely because its omissions illuminate the gap between what agentic AI systems can do and what agentic experience design must address.

This essay is not a critique of Stripe. It is a reading of Stripe - through the lens of AXD. The Five Levels describe a trajectory of agent capability. What they do not describe is the trajectory of human experience. That is the gap this essay addresses.


What Stripe Proposed

Stripe's model presents five ascending levels of agent capability in commerce, each representing a step toward greater autonomy. The framework uses a single, relatable scenario - back-to-school shopping for a child - to illustrate the progression:

Level 1: Eliminating Web Forms

"You research and decide what to buy... It would be handy if you could simply send the URL to your agent and have it fill out your payment and shipping details."

The agent is a typist. It clicks "buy" on your behalf. No decisions are made.

Level 2: Descriptive Search

"I need back-to-school supplies for a third grader in Chicago, including clothes (nothing too itchy or tight!), pencils, notebooks, and a lunch box."

The agent reasons across attributes. Keyword search gives way to situational description.

Level 3: Persistence

"Find me options for back-to-school clothes for Bobby."

The agent remembers. Preferences, history, and context persist across sessions.

Level 4: Delegation

"Get the back-to-school shopping done. Keep it under $400."

The agent decides. It searches, evaluates, and purchases autonomously. Stripe notes: "This is what most people mean today when they talk about agentic commerce."

Level 5: Anticipation

"There is no prompt."

The agent anticipates. It knows the school calendar, the child's preferences, the typical budget. You receive a notification: here is what has been purchased.

Stripe assesses that the industry currently hovers "on the edge of levels 1 and 2." They compare the present moment to the mid-1990s, when HTTP, HTML, URLs, and DNS were being established - a period where "no one knew exactly which protocols or players would win out." The analogy is deliberate: Stripe is positioning its Agentic Commerce Protocol (ACP), its Shared Payment Tokens, and its Agentic Commerce Suite as the infrastructure layer for this transition.


What the Levels Get Right

Before examining what the model omits, it is important to acknowledge what it achieves - because what Stripe gets right is genuinely significant.

First, the levels establish a shared vocabulary. Before this framework, "agentic commerce" meant different things to different people. For some, it meant chatbots that could answer product questions. For others, it meant fully autonomous purchasing agents. Stripe's five levels create a common reference point. When someone says "Level 4 delegation," there is now a shared understanding of what that means. This is not a trivial contribution. Disciplines advance through shared language, and Stripe has provided one.

Second, the framework is honest about where we are. Stripe does not claim we have reached Level 4 or Level 5. It states plainly that the industry hovers at the edge of Levels 1 and 2. This sobriety is rare in a landscape saturated with premature claims about autonomous agents. The honesty lends the framework credibility.

Third, the single-scenario approach is pedagogically effective. By threading the same back-to-school shopping example through all five levels, Stripe makes the abstraction concrete. Each level is immediately understandable because the context is consistent. This is good communication design.

Fourth, Stripe correctly identifies interoperability as the prerequisite. The letter states that "our ascent through the five levels depends on our ability to work together." This is precisely right. Agentic commerce cannot advance through proprietary systems. The Universal Commerce Protocol, the Agentic Commerce Protocol, and open standards are not optional infrastructure - they are the foundation.


The Transaction Bias

The Five Levels framework has a structural bias that shapes everything it describes and everything it omits. It is a transaction-centric model. Each level is defined by what the payment system does - what the agent can execute, what the commerce infrastructure supports, what the checkout flow looks like. This is entirely natural for Stripe. Stripe is a payments company. Its perspective is the perspective of the transaction.

But agentic experience design is not concerned with transactions. It is concerned with relationships. The transaction is the visible event. The relationship is the invisible architecture that determines whether the transaction should happen at all, whether the human trusts the agent to execute it, and what happens when it goes wrong.

Consider the difference. Stripe's Level 4 says: "The system handles the search, the evaluation process, and the purchases on your behalf." From a transaction perspective, this is a complete description. The agent searched. The agent evaluated. The agent purchased. The transaction is done.

From an AXD perspective, the description has barely begun. How did the human decide to delegate? What constraints did they specify - and what constraints did they fail to specify? What does the agent do when it encounters ambiguity? How does the human monitor progress? What happens when the agent makes a choice the human would not have made? How is trust calibrated after the first delegation succeeds - or fails? These are not edge cases. They are the primary design surface of agentic commerce.

The transaction bias is not a flaw in Stripe's thinking. It is a limitation of Stripe's vantage point. Payments companies see commerce from the perspective of the checkout. Agentic experience design sees commerce from the perspective of the trust relationship that precedes, governs, and outlasts the checkout.


The Missing Layer: Trust

The word "trust" appears once in Stripe's description of Level 4: "You trust it will weigh trade-offs as you would." This is the only mention of trust across all five levels. It is presented as a precondition - something that exists before delegation happens - rather than as a design challenge that must be actively engineered.

In Agentic Experience Design, trust is the primary material. It is not a precondition. It is the thing being designed. Trust is built through repeated interactions, calibrated through transparency, damaged through failures, and recovered through designed repair mechanisms. Trust has architecture - it is not a feeling that simply appears when the technology is good enough.

What would a trust-aware version of the Five Levels look like? At Level 1, trust is trivial - the agent is a typist, and the human has already made every decision. At Level 2, trust begins to matter - the agent is interpreting a situation, and the human must trust that interpretation. At Level 3, trust becomes historical - the agent's persistent memory creates an accumulated relationship that can be right or wrong. At Level 4, trust is the entire experience - the human is absent from the decision, and the quality of the outcome depends entirely on the quality of the trust architecture. At Level 5, trust is existential - the agent acts without being asked, and the human must trust not only the agent's judgement but its timing, its priorities, and its understanding of what matters.

Stripe's model treats trust as a constant. AXD treats it as a variable - one that changes with every interaction, every success, every failure, and every context. The Trust Calibration framework exists precisely because trust is not binary. It is a spectrum, and it must be designed as such.


The Missing Layer: Failure

The Five Levels describe a world in which agents succeed. Level 2 finds the right products. Level 3 remembers correctly. Level 4 purchases wisely. Level 5 anticipates accurately. There is no level - and no sub-level - that addresses what happens when the agent fails.

This is not an oversight unique to Stripe. It is endemic to capability-focused models. When you define levels by what the system can do, you naturally omit what happens when it cannot. But Failure Architecture is not a secondary concern in agentic commerce. It is a primary one.

Consider Level 4 again. "Get the back-to-school shopping done. Keep it under $400." What happens when the agent purchases clothes that do not fit? What happens when it selects a notebook brand the child hates? What happens when it optimises for price and sacrifices durability? What happens when the delivery arrives after school has started? Each of these failures requires a designed response - not a generic error message, but a failure architecture that acknowledges what went wrong, explains why, offers remediation, and adjusts future behaviour.

Level 5 is even more consequential. The agent has purchased everything without being asked. The human receives a notification. But what if the notification arrives and the family has already bought the supplies? What if the budget was needed for something else this month? What if the child has changed schools? Anticipatory commerce without failure architecture is not a convenience - it is a liability. The Failure Architecture framework addresses this directly: every autonomous action must have a designed failure path, a recovery mechanism, and a trust repair protocol.


The Missing Layer: Absence

One of the five founding principles of AXD states: "Absence is the primary use state." The most consequential agentic experiences happen when no one is watching. Stripe's Level 5 - Anticipation - is the purest expression of this principle. There is no prompt. The human is entirely absent from the decision process. The agent acts, and the human learns about it afterward.

But Stripe does not address the design of that absence. Absent-State Design is concerned with what the experience feels like when the human is not present - not in a metaphorical sense, but in a concrete, designable sense. What information does the human receive when they re-engage? How is the agent's reasoning made legible after the fact? What is the emotional experience of discovering that an autonomous system has spent your money?

The notification in Level 5 - "here's the back-to-school list of everything that's been purchased" - is not a design solution. It is a design problem. The design of that notification, its timing, its level of detail, its tone, its options for response, its connection to the agent's reasoning - these are the design challenges of absent-state commerce. Stripe's model assumes the notification is sufficient. AXD asks: sufficient for what? For trust? For understanding? For control? For emotional comfort? Each of these requires a different design response.


Capability Is Not Permission

The Five Levels conflate two things that AXD treats as fundamentally distinct: what an agent can do and what an agent should be permitted to do. The levels describe ascending capabilities - the agent can fill forms, then search, then remember, then decide, then anticipate. But capability is not permission. An agent that can operate at Level 4 does not mean the human should delegate at Level 4.

The Delegation Design framework addresses this directly. Delegation is not a binary switch - it is a spectrum of authority, constrained by context, governed by rules, and subject to revision. A human might delegate grocery shopping at Level 4 but insist on Level 2 for electronics. They might trust Level 5 anticipation for household consumables but demand Level 1 confirmation for anything over fifty pounds. The level of delegation is not a property of the system. It is a property of the relationship between the human and the agent, in a specific context, at a specific moment in time.

Stripe's model implies a linear progression - that the industry will move from Level 1 to Level 5 over time, and that higher levels are better. AXD rejects this framing. Higher levels are not better. They are different. The appropriate level of agent autonomy depends on the stakes, the context, the human's trust state, and the quality of the constraint specification. Designing for Level 5 when the human's trust is at Level 2 is not progress. It is a consent violation.


The Linearity Problem

The Five Levels are presented as a ladder - a sequential progression from 1 to 5. This linearity is the model's most significant structural limitation. Trust is not linear. Human agent interaction is not linear. The relationship between a human and an autonomous agent does not progress steadily upward. It oscillates. It regresses. It leaps forward in one domain while remaining static in another.

The Autonomy Gradient framework captures this reality. Autonomy is not a single dimension. It varies by domain, by stakes, by time of day, by emotional state, by recent experience. A human who happily delegates meal planning at Level 4 on Tuesday might insist on Level 1 confirmation on Wednesday - because the in-laws are visiting, because the budget is tight, because the last delegation produced a disappointing result. The gradient is dynamic, contextual, and personal.

A more accurate model would not be a ladder but a matrix - with capability on one axis and trust on the other, modulated by context, stakes, and history. The same human might simultaneously operate at Level 5 for one category, Level 3 for another, and Level 1 for a third. The design challenge is not to push everyone toward Level 5. It is to match the level of autonomy to the level of trust, in real time, for each specific context.


What Stripe Almost Said

The most revealing section of Stripe's 2025 letter is not the Five Levels. It is the section that follows: "A Republic of Permissions." Here, Stripe references Joel Mokyr's Nobel Prize-winning work on the importance of culture, governance, and "nonmarket aggregators" - regulators, committees, courts - in determining whether new technologies succeed or fail. Stripe writes: "While many of our strictures are sensibly motivated, it's more important than ever to ensure that they carefully balance the benefits achieved with the possibilities foreclosed."

This is, almost word for word, the argument of AXD's Ethical Constraints framework. The design of permissions - what agents may do, what they may not, and who decides - is not a regulatory afterthought. It is a design discipline. Stripe's "Republic of Permissions" is the governance layer that the Five Levels model lacks.

Similarly, Stripe's introduction of Shared Payment Tokens - a payment primitive that lets agents initiate payments without exposing credentials - is a trust architecture innovation, even if Stripe does not frame it that way. The token is a trust boundary. It constrains what the agent can do with the human's financial identity. It is, in AXD terms, a delegation constraint - a designed limit on the agent's authority.

And Stripe's concept of machine payments - agents paying for API calls and services using stablecoin micropayments - is the Machine Customer made real. Stripe acknowledges this: "Autonomous agents, themselves, are emerging as a new customer type for internet businesses to sell to." This is precisely the thesis of the AXD Vocabulary's entry on the Machine Customer, published months before Stripe's letter.

Stripe is building the infrastructure that AXD describes. The gap is not in what Stripe is doing. It is in how Stripe frames what it is doing. The Five Levels describe capability. The infrastructure Stripe is building - tokens, protocols, machine payments - is trust architecture. The framing has not yet caught up with the engineering.


An AXD Reading of the Five Levels

If we overlay the twelve AXD frameworks onto Stripe's Five Levels, a richer picture emerges - one that preserves Stripe's capability progression while adding the human experience dimensions that the model omits.

Stripe LevelCapabilityAXD Trust StatePrimary AXD Frameworks
L1: FormsExecution onlyMechanical trustIntent Architecture, Operational Envelope
L2: SearchInterpretationCognitive trustSignal Clarity, Agent Observability
L3: MemoryAccumulationRelational trustTrust Calibration, Consent Horizon
L4: DelegationAutonomyDelegated trustDelegation Design, Failure Architecture, Autonomy Gradient
L5: AnticipationProactionExistential trustAbsent-State Design, Ethical Constraints, Recovery Design

This reading does not replace Stripe's model. It completes it. The capability dimension tells you what the system can do. The trust dimension tells you what the human is prepared to accept. The AXD frameworks tell you how to design the relationship between the two. Without the trust dimension, the Five Levels are a technology roadmap. With it, they become a design framework.


Stripe's Five Levels: Implications for Practitioners

Use the levels, but do not be limited by them. Stripe's Five Levels are a useful shorthand for capability maturity. When communicating with stakeholders, investors, or technical teams, the levels provide a shared vocabulary. But the design work happens in the spaces between the levels - in the trust architecture, the failure paths, the consent mechanisms, and the absent-state experiences that the levels do not address.

Design for trust regression, not just trust progression. Stripe's model implies a forward march from Level 1 to Level 5. Real human-agent relationships move in both directions. A single bad experience at Level 4 can regress trust to Level 1. Design for this. Build recovery mechanisms that allow the human to reduce the agent's autonomy gracefully, without abandoning the relationship entirely.

Separate capability from permission in your architecture. Your system may be capable of Level 4. Your user may only trust Level 2. Design for the gap. The Autonomy Gradient should be a first-class architectural concept - not a settings page, but a dynamic, contextual system that matches agent authority to human trust in real time.

Invest in Level 4 failure architecture before investing in Level 5 capability. The industry is eager to reach Level 5 - anticipatory commerce, no prompts, autonomous purchasing. But Level 5 without robust Level 4 failure architecture is dangerous. If you cannot design graceful failure for delegated commerce, you are not ready for anticipatory commerce. The Failure Architecture framework should be implemented at Level 4 before Level 5 is attempted.

Recognise Stripe's infrastructure as trust architecture. Shared Payment Tokens, the Agentic Commerce Protocol, machine payments - these are trust architecture primitives, even if Stripe does not label them as such. When implementing these tools, frame them through the lens of Trust Architecture and Delegation Design. The token is not just a payment mechanism. It is a trust boundary. The protocol is not just an interoperability standard. It is a consent framework.

Design the notification at Level 5 as if it were the entire product. In anticipatory commerce, the notification - "here is what has been purchased" - is the only touchpoint the human has with the experience. It is not a status update. It is the product. Design it with the care, the detail, the emotional intelligence, and the observability that the moment demands. The notification must answer: What happened? Why? Was it right? What can I change? What will happen next?

Read Stripe's letter as an invitation, not a conclusion. The Five Levels are the beginning of a conversation, not the end of one. Stripe has described the capability trajectory. AXD must describe the experience trajectory. The two are complementary, and neither is complete without the other. The organisations that succeed in agentic commerce will be those that build Stripe's infrastructure and design AXD's trust architecture - not one or the other, but both, together, as a unified discipline of agentic experience design.


Frequently Asked Questions