
Measurement Standard · Version 1.0 · April 2026

The AXD Metrics Standard

Seven KPIs that measure what matters in agentic experience design - from discovery visibility and agent-assisted conversion to trust erosion and absent-state outcome quality.

Why this standard exists

You cannot improve what you cannot measure

The AXD Metrics Standard defines seven Key Performance Indicators that span the full lifecycle of agentic experience - from merchant-side discovery and conversion through to agent-side delegation, trust, and absent-state quality. Each KPI includes a precise formula, benchmark tiers, diagnostic signals, and mapping to the AXD Practice frameworks.

The first three KPIs (AIR, AACR, CSAS) measure the merchant side of agentic commerce: whether your business is visible to agents, whether agent traffic converts, and whether you can attribute transactions to specific AI surfaces. The remaining four KPIs (DCR, TEI, IFR, ASOS) measure the agent side: whether delegated tasks complete, whether trust is maintained, whether interrupts are calibrated, and whether absent-state outcomes meet human intent.

Overview

The seven KPIs at a glance

#    KPI                                        Direction
01   Assistant Inclusion Rate (AIR)             Higher ↑
02   Agent-Assisted Conversion Rate (AACR)      Higher ↑
03   Cross-Surface Attribution Score (CSAS)     Higher ↑
04   Delegation Completion Rate (DCR)           Higher ↑
05   Trust Erosion Index (TEI)                  Lower ↓
06   Interrupt Frequency Ratio (IFR)            Calibrated ⟷
07   Absent-State Outcome Score (ASOS)          Higher ↑
01

Discovery phase · Merchant-side

Assistant Inclusion Rate

AIR

The percentage of monitored AI assistant queries in your product category where your brand, product, or service is included in the agent's recommendation set. AIR measures whether your business is visible to the agentic layer - whether, when an agent is asked about your category, you appear in the answer.

Formula

AIR % = (Agent queries including your brand/product as a recommendation ÷ Total relevant agent queries monitored) × 100

Minimum 100 queries per measurement period across at least three AI surfaces (ChatGPT, Perplexity, Gemini, Copilot, etc.).

How to measure

Select 100+ representative queries across your product categories. Run them against a minimum of three AI assistant surfaces at regular intervals (weekly or fortnightly). Record whether your brand appears in the recommendation set for each query.

AIR is not a single number - it varies significantly across AI surfaces. Report per-surface and aggregate. Track longitudinally to identify trends.
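
To make the calculation concrete, here is a minimal Python sketch of the per-surface and aggregate computation. It assumes each monitoring run is logged as a (query, surface, included) tuple; the data shape and names are illustrative, not prescribed by this standard.

```python
from collections import defaultdict

def air_by_surface(results):
    """Compute AIR per AI surface and in aggregate.

    `results` is an iterable of (query, surface, included) tuples, where
    `included` is True if the brand appeared in the recommendation set.
    """
    counts = defaultdict(lambda: [0, 0])  # surface -> [included, total]
    for _query, surface, included in results:
        counts[surface][0] += int(included)
        counts[surface][1] += 1
    per_surface = {s: 100 * inc / total for s, (inc, total) in counts.items()}
    included_total = sum(inc for inc, _ in counts.values())
    queries_total = sum(total for _, total in counts.values())
    aggregate = 100 * included_total / queries_total if queries_total else 0.0
    return per_surface, aggregate

# Toy panel: six monitored queries across two surfaces
results = [
    ("best running shoes", "chatgpt", True),
    ("best running shoes", "perplexity", False),
    ("trail shoes under $150", "chatgpt", True),
    ("trail shoes under $150", "perplexity", False),
    ("waterproof hiking boots", "chatgpt", False),
    ("waterproof hiking boots", "perplexity", True),
]
per_surface, aggregate = air_by_surface(results)
print(per_surface)                         # chatgpt ~66.7, perplexity ~33.3
print(f"Aggregate AIR: {aggregate:.1f}%")  # Aggregate AIR: 50.0%
```

Note how the aggregate (50%) conceals a large per-surface delta in the toy data - exactly why the standard requires per-surface reporting alongside the aggregate.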

Benchmark tiers

Poor

<5%

Invisible to the agentic layer. Agents do not include your brand in recommendations. Structured data, entity authority, and content freshness all require immediate attention.

Developing

5–25%

Intermittent visibility. Appearing in some queries but not consistently. Likely present in one AI surface but absent from others. Schema and content gaps are the probable cause.

Proficient

25–60%

Consistent presence across multiple AI surfaces. Structured data is comprehensive and regularly updated. Entity authority is established in your primary categories.

Exemplary

>60%

Dominant agentic visibility. Your brand is a default recommendation in your category. Structured data is comprehensive, fresh, and semantically rich. Entity graphs are well-connected.

Raises AIR

Complete JSON-LD product markup, product data coverage >90%, established entity authority in knowledge graphs, regular content publication with structured data, multi-surface optimisation strategy.

Watch for

AIR varies significantly across AI surfaces. A brand may score 40% on ChatGPT but 5% on Perplexity. Always measure per-surface and investigate the delta - it reveals which surfaces your structured data strategy is reaching.

Reduces AIR

Inconsistent entity naming across platforms, missing or incomplete Schema.org markup, absence from major product directories and knowledge bases, stale content with outdated product information.

02

Transaction phase · Merchant-side

Agent-Assisted Conversion Rate

AACR

The percentage of agent-referred sessions that result in a completed transaction. AACR measures whether your commerce infrastructure can convert agent-driven traffic into revenue. It is the agentic equivalent of e-commerce conversion rate, but applied specifically to sessions where an AI agent has referred, recommended, or directly facilitated the purchase.

Formula

AACR % = (Transactions completed via agent referral or agent-assisted checkout ÷ Total agent-referred sessions) × 100

Segment by AI surface (ChatGPT, Perplexity, Copilot, etc.) to identify surface-specific conversion gaps.

How to measure

Identify agent-referred sessions through UTM parameters, referrer headers, or API-based attribution. Track the full session from agent referral through to transaction completion.

AACR requires attribution infrastructure (see CSAS). Without reliable surface identification, AACR cannot be accurately segmented.
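
A minimal sketch of the segmented calculation, assuming agent-referred sessions have already been identified and tagged with an originating surface (see CSAS); the dict keys are illustrative.

```python
from collections import defaultdict

def aacr_by_surface(sessions):
    """Compute AACR per AI surface from agent-referred session records."""
    tally = defaultdict(lambda: [0, 0])  # surface -> [conversions, sessions]
    for s in sessions:
        tally[s["surface"]][0] += int(s["converted"])
        tally[s["surface"]][1] += 1
    return {surface: 100 * conv / total for surface, (conv, total) in tally.items()}

sessions = [
    {"surface": "chatgpt", "converted": True},
    {"surface": "chatgpt", "converted": False},
    {"surface": "perplexity", "converted": False},
    {"surface": "perplexity", "converted": False},
]
print(aacr_by_surface(sessions))  # {'chatgpt': 50.0, 'perplexity': 0.0}
```

A surface-level gap like the one above points to a surface-specific checkout blocker rather than a general conversion problem.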

Benchmark tiers

Poor

<0.5%

Agent traffic is arriving but not converting. Checkout infrastructure is likely incompatible with agent-mediated sessions. Human authentication steps, CAPTCHA barriers, or session-based pricing are probable blockers.

Developing

0.5–2%

Some agent-assisted conversions occurring, likely through traditional web checkout rather than protocol-based transactions. Conversion path exists but is not optimised for agent mediation.

Proficient

2–5%

Agent-assisted checkout is functional. Protocol integration (ACP, UCP) is enabling direct agent transactions. Conversion rates are approaching traditional e-commerce benchmarks for the category.

Exemplary

>5%

Agent-native checkout infrastructure is fully operational. Agents can complete transactions without browser-based checkout. Real-time inventory, dynamic pricing, and payment tokenisation are all agent-accessible.

Raises AACR

ACP/UCP protocol integration, real-time inventory API availability, agent-compatible payment tokenisation, browserless checkout capability, structured product data with pricing and availability.

Watch for

AACR is the diagnostic bridge between AIR and revenue. High AIR with low AACR means agents are recommending you but your checkout infrastructure cannot convert agent-mediated sessions. This is the most common pattern in early agentic commerce.

Reduces AACR

Human-only authentication steps in checkout, CAPTCHA or bot-detection blocking agent sessions, inconsistent pricing between API and storefront, stale inventory data, absence from agent payment sandboxes.

03

Attribution phase · Merchant-side

Cross-Surface Attribution Score

CSAS

The percentage of agent-assisted transactions for which you can reliably identify which AI surface (ChatGPT, Perplexity, Gemini, Copilot, etc.) originated or influenced the purchase. CSAS measures your attribution infrastructure's ability to track the agentic customer journey across multiple AI surfaces.

Formula

CSAS % = (Agent-assisted transactions with verified surface attribution ÷ Total agent-assisted transactions) × 100

Unattributed agent transactions default to 'direct' or 'dark traffic' in most analytics platforms, making them invisible to marketing attribution.

How to measure

Implement server-side event collection with surface-specific UTM parameters. Use referrer header analysis, API-based attribution, and first-party cookie strategies to identify the originating AI surface.

CSAS requires investment in attribution infrastructure before agent traffic volumes grow. Retrofitting attribution is significantly harder than building it from the start.
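
The sketch below shows one plausible attribution decision in Python: prefer an explicit utm_source, fall back to the referrer host, and treat everything else as dark traffic. The host-to-surface mapping is illustrative - real referrer values vary by surface and change over time.

```python
from urllib.parse import urlparse, parse_qs

# Illustrative referrer-host to surface mapping - maintain and verify your own.
REFERRER_HOSTS = {
    "chatgpt.com": "chatgpt",
    "www.perplexity.ai": "perplexity",
    "gemini.google.com": "gemini",
    "copilot.microsoft.com": "copilot",
}

def attribute_surface(landing_url, referrer):
    """Return the originating AI surface for a session, or None if dark."""
    utm = parse_qs(urlparse(landing_url).query).get("utm_source", [None])[0]
    if utm in REFERRER_HOSTS.values():
        return utm
    host = urlparse(referrer).hostname if referrer else None
    return REFERRER_HOSTS.get(host)

def csas(transactions):
    """CSAS % = attributed agent transactions / all agent transactions x 100."""
    attributed = sum(1 for t in transactions
                     if attribute_surface(t["landing_url"], t["referrer"]))
    return 100 * attributed / len(transactions) if transactions else 0.0

txns = [
    {"landing_url": "https://shop.example/p/1?utm_source=perplexity", "referrer": ""},
    {"landing_url": "https://shop.example/p/2", "referrer": "https://chatgpt.com/"},
    {"landing_url": "https://shop.example/p/3", "referrer": ""},  # dark traffic
]
print(f"CSAS: {csas(txns):.0f}%")  # CSAS: 67%
```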

Benchmark tiers

Poor

<20%

Most agent-assisted transactions are unattributed. Marketing spend cannot be allocated to agentic channels. Agent ROI is unmeasurable. Attribution infrastructure needs immediate investment.

Developing

20–50%

Partial attribution in place. Some AI surfaces are identifiable (typically those with clear referrer headers) but others remain dark. Server-side collection is incomplete.

Proficient

50–80%

Majority of agent-assisted transactions are attributed to specific AI surfaces. Server-side event collection is operational. Marketing can allocate spend to agentic channels with reasonable confidence.

Exemplary

>80%

Comprehensive cross-surface attribution. Agent journey mapping is operational across all major AI surfaces. Marketing attribution models include agentic channels as first-class touchpoints.

Raises CSAS

Server-side event collection, surface-specific UTM parameter strategies, first-party cookie attribution, API-based referral tracking, dedicated agentic channel taxonomy in analytics platform.

Watch for

CSAS below 20% is normal in early implementation. The goal at this stage is to have measurement infrastructure in place before agent traffic scales. Prioritise coverage breadth over attribution precision.

Reduces CSAS

Client-side-only analytics (blocked by agent sessions), missing referrer header capture, generic UTM parameters that don't distinguish AI surfaces, reliance on last-click attribution models.

04

Delegation phase · Agent-side

Delegation Completion Rate

DCR

The percentage of delegated tasks that complete to their stated outcome without human abandonment, override, or unrecoverable failure. DCR measures the end-to-end reliability of the delegation lifecycle - from intent specification through execution to outcome delivery.

Formula

DCR % = (Delegated tasks completing to stated outcome without abandonment or override ÷ Total delegated tasks initiated) × 100

Distinguish between user-initiated abandonment (trust failure), system-initiated failure (capability failure), and human override (calibration failure). Each has different design implications.

How to measure

Log all task lifecycle events: delegation initiation, constraint specification, execution milestones, completion confirmation, and any abandonment or override events. Calculate DCR from the ratio of clean completions to total initiations.

Begin with a single agent task type to establish baseline measurement methodology before expanding to the full task portfolio.
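
A minimal sketch of the calculation, keeping the three failure modes from the formula note separate so each can drive its own design response; the event names are illustrative.

```python
from collections import Counter

# Terminal lifecycle events - names are illustrative, not prescribed.
COMPLETED = "completed"
ABANDONED = "user_abandoned"      # trust failure
FAILED = "system_failed"          # capability failure
OVERRIDDEN = "human_overridden"   # calibration failure

def dcr(task_outcomes):
    """Return DCR % plus a breakdown of terminal outcomes.

    `task_outcomes` holds one terminal event per initiated task.
    """
    counts = Counter(task_outcomes)
    total = sum(counts.values())
    rate = 100 * counts[COMPLETED] / total if total else 0.0
    return rate, counts

outcomes = [COMPLETED, COMPLETED, ABANDONED, COMPLETED, OVERRIDDEN, FAILED]
rate, breakdown = dcr(outcomes)
print(f"DCR: {rate:.0f}%")   # DCR: 50%
print(dict(breakdown))       # each failure mode has different design implications
```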

Benchmark tiers

Poor

<40%

Most delegated tasks fail to complete. Users are abandoning or overriding agent actions frequently. Intent specification, constraint encoding, or execution reliability are fundamentally broken.

Developing

40–65%

Completion rates vary by task complexity. Simple, well-bounded tasks complete reliably; complex multi-step delegations generate failures. Constraint encoding and exception handling need attention.

Proficient

65–85%

Reliable completion across standard task portfolio. Failures are concentrated in edge cases and novel task types. Progressive delegation is working - users are expanding scope based on demonstrated reliability.

Exemplary

>85%

High-fidelity delegation lifecycle. Failures are rare, well-handled, and informing ongoing improvement. Users trust the agent with increasingly complex and consequential tasks.

Raises DCR

Plan Preview before execution, ambiguity negotiation at delegation time, explicit constraint encoding, progressive delegation (simple tasks first), well-designed exception handling and recovery paths.

Watch for

DCR can be artificially inflated by scope narrowing - if the agent only accepts tasks it knows it can complete, DCR rises but user value falls. Monitor task acceptance rate alongside DCR.

Reduces DCR

Underspecified intent at delegation, missing constraint encoding, absent exception handling, silent failure modes where the agent fails without notification, and no recovery path for partial completions.

05

Relationship phase · Agent-side

Trust Erosion Index

TEI

The percentage of active agent users who reduce their agent's autonomy level, revoke previously granted permissions, or disengage from agent interaction within a 30-day measurement window. TEI is a lagging indicator - it measures the consequence of trust failures that have already occurred.

Formula

TEI % = (Users who reduced agent autonomy, revoked permissions, or disengaged within 30 days ÷ Total active agent users in the measurement cohort) × 100

Lower is better. Measure at 30-, 60-, and 90-day intervals to distinguish between acute trust events and gradual erosion patterns.

How to measure

Track autonomy level changes, permission revocations, and engagement frequency for each user over rolling 30-day windows. A user who reduces autonomy from 'full' to 'supervised' or revokes a previously granted permission counts as a trust erosion event.

TEI requires a defined autonomy model with measurable levels. Without explicit autonomy tiers, trust erosion cannot be quantified - only inferred from engagement decline.
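
A minimal sketch of the windowed calculation, assuming an explicit autonomy model with ordered tiers and a per-user event log; the tier names and event fields are illustrative.

```python
from datetime import date, timedelta

AUTONOMY_LEVELS = {"supervised": 0, "semi_autonomous": 1, "full": 2}  # illustrative tiers

def is_erosion_event(event):
    """True if the event reduces autonomy, revokes a permission, or disengages."""
    if event["type"] == "autonomy_change":
        return AUTONOMY_LEVELS[event["to"]] < AUTONOMY_LEVELS[event["from"]]
    return event["type"] in ("permission_revoked", "disengaged")

def tei(users, window_end, window_days=30):
    """TEI % = users with >= 1 erosion event in the window / active users x 100."""
    window_start = window_end - timedelta(days=window_days)
    eroded = sum(
        1 for u in users
        if any(is_erosion_event(e) for e in u["events"]
               if window_start <= e["date"] <= window_end)
    )
    return 100 * eroded / len(users) if users else 0.0

users = [
    {"events": [{"type": "autonomy_change", "from": "full", "to": "supervised",
                 "date": date(2026, 4, 10)}]},
    {"events": [{"type": "permission_revoked", "date": date(2026, 3, 1)}]},  # outside window
    {"events": []},
]
print(f"TEI: {tei(users, date(2026, 4, 30)):.0f}%")  # TEI: 33%
```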

Benchmark tiers

Poor

>40%

Severe trust erosion. Nearly half of users are reducing agent autonomy or disengaging. Likely caused by a systemic failure: silent errors, actions outside mandate, or poor Explainability. Immediate investigation required.

Developing

20–40%

Significant trust erosion concentrated in specific user segments or task types. Pattern analysis needed to identify whether the cause is onboarding failure, capability mismatch, or communication breakdown.

Proficient

5–20%

Moderate trust erosion within expected range. Some users are recalibrating autonomy levels based on experience - this is healthy. Investigate cases where erosion leads to full disengagement.

Exemplary

<5%

Minimal trust erosion. Users are maintaining or increasing agent autonomy over time. Trust calibration is working. Relationship temporality is positive.

Reduces TEI

Accurate capability representation during onboarding, Plan Preview before consequential actions, proactive communication when operating near constraint boundaries, well-designed Failure Architecture with honest error reporting.

Watch for

TEI of 0% is not necessarily ideal - it may indicate users are not engaged enough to form trust expectations. Some trust recalibration is healthy. Focus on preventing erosion that leads to permanent disengagement.

Increases TEI

Silent failures (agent fails without notification), actions outside stated mandate, poor Explainability (user cannot understand why agent acted), overpromising during onboarding, and missing recovery paths after errors.

06

Active operation · Agent-side

Interrupt Frequency Ratio

IFR

The number of agent-initiated human interrupts per 100 autonomous actions completed. IFR measures whether the agent is calibrating its interrupt behaviour appropriately - asking for human input when genuinely needed, and operating autonomously when confidence is justified.

Formula

IFR = (Agent-initiated human interrupts ÷ Total autonomous actions completed) × 100

Segment by action consequence level. The target is not zero - the target is appropriate calibration. High-consequence actions should have higher IFR than routine operations.

How to measure

Log every agent-initiated interrupt (confirmation request, clarification query, escalation) and every autonomous action completed without interrupt. Calculate the ratio per 100 actions.

Segment IFR by action consequence level (low, medium, high). A well-calibrated agent should show higher IFR for high-consequence actions and lower IFR for routine operations.
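
A minimal sketch of the segmented calculation. It treats each logged action as either an agent-initiated interrupt or an autonomous completion, and reports interrupts per 100 autonomous completions per consequence tier; the field names are assumptions.

```python
from collections import defaultdict

def ifr_by_consequence(actions):
    """Interrupts per 100 autonomous completions, per consequence tier."""
    tally = defaultdict(lambda: [0, 0])  # tier -> [interrupts, autonomous]
    for a in actions:
        idx = 0 if a["interrupted"] else 1
        tally[a["consequence"]][idx] += 1
    return {tier: (100 * interrupts / autonomous if autonomous else float("inf"))
            for tier, (interrupts, autonomous) in tally.items()}

actions = (
    [{"consequence": "high", "interrupted": True}] * 2
    + [{"consequence": "high", "interrupted": False}] * 8
    + [{"consequence": "low", "interrupted": True}]
    + [{"consequence": "low", "interrupted": False}] * 99
)
print(ifr_by_consequence(actions))  # high: 25.0, low: ~1.0 - a calibrated profile
```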

Benchmark tiers

Over-interrupting

>25

Agent is interrupting too frequently. Users experience 'confirmation fatigue' and begin ignoring or auto-approving interrupts - defeating their purpose. Autonomy confidence thresholds need recalibration.

Developing

10–25

Interrupt frequency is high but not yet causing fatigue. Agent is likely using a conservative interrupt strategy appropriate for early deployment. Monitor for user bypass behaviour.

Proficient

3–10

Interrupt frequency appropriate to action consequence. Human attention directed to genuinely uncertain or high-consequence decisions.

Exemplary

<3

Agent operating with high autonomy confidence. Interrupts are rare, well-timed, and almost never rejected by human reviewers. Relationship history is informing calibration.

Optimises IFR

Consequence-weighted interrupt thresholds, accumulated relationship history informing confidence, well-designed interrupt UX that respects human attention, progressive autonomy expansion based on demonstrated reliability.

Watch for

IFR must be interpreted alongside TEI. Low IFR with high TEI means the agent is not interrupting enough - it is making autonomous decisions that erode trust. High IFR with low TEI means the agent is over-cautious but maintaining trust.

Degrades IFR

Flat interrupt thresholds that don't account for consequence level, missing relationship history, no learning from previous interrupt outcomes, interrupt UX that is disruptive or poorly timed.

07

Execution phase · Agent-side

Absent-State Outcome Score

ASOS

The percentage of autonomous actions completed while the human principal is absent that achieve their intended outcome without requiring human correction after the fact. ASOS is the master metric of the AXD discipline: it operationalises the Third Founding Principle - Absence is the Primary Use State - as a measurable, improvable performance standard. It asks, simply: how well does the agent do when no one is watching?

Formula

ASOS % = (Absent-state actions achieving intended outcome without post-execution correction ÷ Total absent-state autonomous actions completed) × 100

'Correction' includes: user reversal of agent action, user complaint about agent outcome, human override of a completed action, or outcome diverging from stated mandate.

How to measure

Identify absent-state periods through session inactivity signals, explicit 'I'm away' delegation modes, or scheduled autonomous operation windows. Track every action taken in these periods and assess outcome quality at the return state.

ASOS requires a defined outcome specification for each delegation - the intended result must be recorded at delegation time so it can be evaluated at execution time. This is why Intent Architecture is the first AXD framework: without a stated outcome, ASOS cannot be measured.
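
A minimal sketch of the return-state evaluation, assuming each absent-state action carries its delegation-time outcome specification and a list of any correction events observed afterwards; all names are illustrative.

```python
# Correction event types from the formula note - names are illustrative.
CORRECTION_EVENTS = {"reversed", "complaint", "overridden", "mandate_divergence"}

def asos(absent_actions):
    """ASOS % = absent-state actions with no post-execution correction / all x 100."""
    if not absent_actions:
        return 0.0
    clean = sum(1 for a in absent_actions
                if not (set(a["corrections"]) & CORRECTION_EVENTS))
    return 100 * clean / len(absent_actions)

actions = [
    {"stated_outcome": "reorder filters by Friday", "corrections": []},
    {"stated_outcome": "book cheapest flight", "corrections": ["reversed"]},
    {"stated_outcome": "archive resolved tickets", "corrections": []},
]
print(f"ASOS: {asos(actions):.0f}%")  # ASOS: 67%
```

The stated_outcome field is what makes the evaluation possible at all: it is the delegation-time record against which the return state is judged.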

Benchmark tiers

Poor

<60%

Most absent-state actions require human correction. Agent is not ready for autonomous operation. Absent-state design is not functioning. Immediate operational review required.

Developing

60–80%

Absent-state quality varies by task type. Simple, bounded tasks perform well; complex multi-step tasks generate corrections. Constraint encoding and failure recovery need attention.

Proficient

80–92%

Absent-state operation reliable across standard task portfolio. User confidence in autonomous operation is justified. Residual corrections are informing ongoing calibration.

Exemplary

>92%

Agent operating with high fidelity to human intent across absent-state periods. Accumulated memory and context are improving ASOS over time. Relationships have genuine temporality.

Raises ASOS

Precise outcome specification at delegation time, complete constraint encoding, well-designed return-state narratives, Failure Architecture that prevents silent errors, accumulated agent memory improving decision quality over time.

Watch for

ASOS and DCR together tell the full story. High DCR with low ASOS means tasks are completing but producing wrong outcomes - a goal specification failure. High ASOS with low DCR means quality is high but scope is too narrow.

Reduces ASOS

Underspecified outcome criteria at delegation, missing constraint encoding, absent memory and context continuity across sessions, silent failure modes, and Absent-State Audit methodology not applied in testing.

Deployment roadmap

Implementation sequence

Not all seven KPIs should be implemented simultaneously. The sequence below reflects both logical dependency and organisational readiness. Teams in early agentic deployment should begin with the merchant-side KPIs (01–03) before instrumenting the agent relationship KPIs (04–07).

1

Establish discovery baseline

Before anything else, measure whether you are visible. Run your first AIR test panel using a minimum of 100 representative queries across three AI surfaces. This establishes the discovery baseline against which all subsequent work is measured.

AIR
2

Stand up attribution infrastructure

Implement server-side event collection and surface-specific UTM parameters before agent traffic volumes grow. Attribution infrastructure is far harder to retrofit than to build from the start. CSAS below 20% at this stage is normal; the goal is to have measurement in place before traffic scales.

CSAS
3

Diagnose the checkout gap

Once AIR and CSAS are instrumented, measure AACR to identify whether agent traffic is converting. In most organisations at this stage, AACR will be significantly below traditional e-commerce conversion rates. The delta reveals the checkout infrastructure gap and defines the protocol integration roadmap.

AACR · CSAS
4

Instrument delegation quality

Once active agentic systems are in production, instrument DCR by logging all task lifecycle events. Begin with a single agent task type to establish baseline measurement methodology before expanding to the full task portfolio.

DCR
5

Monitor trust and autonomy dynamics

Implement TEI and IFR measurement as a pair. TEI without IFR cannot distinguish between trust erosion caused by excessive interruption and trust erosion caused by excessive autonomy. Both KPIs are required for accurate diagnosis.

TEI · IFR
6

Assess absent-state quality

ASOS is the capstone measurement. It requires intent specification infrastructure (step 4), functional absent-state operation, and a return-state assessment methodology. Implement ASOS measurement once the preceding five KPIs are established.

ASOS