Structured Data Best Practices
Comprehensive guide to implementing structured data formats that agents can easily parse and understand. Microdata, RDFa, semantic HTML, data table optimisation, and content relationship mapping for the agentic web.
01
Semantic HTML for Agent Parsing
Semantic HTML is the foundation of agent-readable content. Proper use of HTML5 elements (article, section, nav, aside, main, header, footer) provides structural meaning that agents use to understand content hierarchy and purpose. One cited benchmark reports 45% faster content parsing for semantic HTML (Web.dev Performance Study 2024).
Use a single h1 per page that contains your primary topic, followed by a strict heading hierarchy (h2, h3, h4) - agents construct topic outlines from heading structure, and skipped levels break their content model.
Wrap distinct content units in article elements with itemscope and itemtype attributes - agents treat article boundaries as content unit boundaries, enabling precise extraction of individual pieces rather than page-level scraping.
Use time elements with datetime attributes for all dates and timestamps - agents parse ISO 8601 datetime values from the datetime attribute, not from the human-readable text content, so the attribute is the authoritative source.
Implement figure and figcaption elements for all images, charts, and diagrams - agents that cannot process images rely on figcaption text to understand visual content, and the figure element signals that the caption describes the image.
Use dl (description list) elements for key-value pairs, glossary entries, and metadata displays - agents parse description lists as structured key-value data, which is more reliable than extracting pairs from paragraph text.
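The practices above can be illustrated from the consumer's side. The following is a minimal sketch, assuming an agent that reads markup with Python's standard html.parser module; the sample page fragment, the SemanticExtractor class, and the skipped_levels helper are hypothetical names introduced only for this example, not part of any real agent framework.

```python
from html.parser import HTMLParser

# Hypothetical page fragment, used only for illustration.
SAMPLE_HTML = """
<main>
  <article>
    <h1>Structured Data Best Practices</h1>
    <p>Published <time datetime="2024-03-15">15 March 2024</time></p>
    <h2>Semantic HTML</h2>
    <h4>Description lists</h4>
    <dl>
      <dt>Format</dt><dd>Microdata</dd>
      <dt>Status</dt><dd>Supported</dd>
    </dl>
  </article>
</main>
"""

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class SemanticExtractor(HTMLParser):
    """Collects headings, <time datetime> values, and <dl> key-value pairs."""

    def __init__(self):
        super().__init__()
        self.headings = []      # (level, text) in document order
        self.datetimes = []     # raw values of datetime attributes
        self.pairs = {}         # <dt> text -> <dd> text
        self._capture = None    # tag whose text is currently being collected
        self._buffer = []
        self._last_term = None

    def handle_starttag(self, tag, attrs):
        if tag == "time":
            # The attribute, not the visible text, is treated as authoritative.
            value = dict(attrs).get("datetime")
            if value:
                self.datetimes.append(value)
        if tag in HEADING_TAGS or tag in ("dt", "dd"):
            self._capture = tag
            self._buffer = []

    def handle_data(self, data):
        if self._capture:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._capture:
            return
        text = "".join(self._buffer).strip()
        if tag in HEADING_TAGS:
            self.headings.append((int(tag[1]), text))
        elif tag == "dt":
            self._last_term = text
        elif tag == "dd" and self._last_term is not None:
            self.pairs[self._last_term] = text
        self._capture = None


def skipped_levels(headings):
    """Return (previous_level, level, text) for headings that skip a level."""
    problems = []
    previous = 0
    for level, text in headings:
        if level > previous + 1:
            problems.append((previous, level, text))
        previous = level
    return problems


extractor = SemanticExtractor()
extractor.feed(SAMPLE_HTML)
print(extractor.headings)
print(extractor.datetimes)
print(extractor.pairs)
print(skipped_levels(extractor.headings))
```

In this sample the jump from h2 to h4 is flagged as a skipped level, the date comes from the datetime attribute rather than the visible text, and the description list becomes a plain dictionary.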
02
Data Table Optimisation
Tables are one of the richest structured data sources on the web, but most tables are poorly marked up. Agent-optimised tables use proper thead/tbody/tfoot structure, scope attributes, and caption elements to make tabular data machine-readable.
Always include a caption element that describes what the table contains and its data source - agents use captions to determine whether a table is relevant to their query before parsing the full table structure.
Use scope attributes on th elements (scope='col' or scope='row') to explicitly declare header relationships - without scope, agents must infer which headers apply to which cells, and complex tables defeat inference.
Add data-sort-value or data-value attributes to cells that display formatted values - agents parsing a cell that reads '£1.2M' need the raw numeric value (1200000) to perform calculations or comparisons.
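Implement responsive table patterns that preserve semantic structure - patterns that replace table markup with div-based cards remove the thead/tbody and header relationships that agents depend on for data extraction, and CSS that overrides table display (for example display: block on rows) can strip table semantics from the accessibility tree even though the markup is unchanged.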
Provide a machine-readable data source alongside visual tables - link to a CSV, JSON, or API endpoint from the table caption so that agents requiring bulk data access can bypass HTML parsing entirely.
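To show why caption, scope, and raw-value attributes matter, here is a minimal sketch of how a parser might turn a well-marked-up table into records. It assumes Python's standard html.parser module and a simple table with one header row and one row-header column; the sample table, its figures, and the TableExtractor class are hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser

# Hypothetical table; the figures are invented for illustration.
SAMPLE_TABLE = """
<table>
  <caption>Annual revenue by region (source: hypothetical finance export)</caption>
  <thead>
    <tr><th scope="col">Region</th><th scope="col">Revenue</th></tr>
  </thead>
  <tbody>
    <tr><th scope="row">North</th><td data-value="1200000">£1.2M</td></tr>
    <tr><th scope="row">South</th><td data-value="850000">£850K</td></tr>
  </tbody>
</table>
"""


class TableExtractor(HTMLParser):
    """Reads the caption, scoped headers, and data-value cells of one table."""

    def __init__(self):
        super().__init__()
        self.caption = ""
        self.columns = []        # text of each <th scope="col">
        self.rows = []           # one dict per body row
        self._tag = None         # cell tag currently being read
        self._scope = None
        self._raw = None         # data-value of the current cell, if any
        self._buffer = []
        self._row_header = None
        self._cells = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tr":
            self._row_header, self._cells = None, []
        elif tag in ("caption", "th", "td"):
            self._tag = tag
            self._scope = attrs.get("scope")
            self._raw = attrs.get("data-value")
            self._buffer = []

    def handle_data(self, data):
        if self._tag:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "tr":
            # Only rows with a scope="row" header are treated as data rows.
            if self._row_header is not None:
                record = {self.columns[0]: self._row_header}
                record.update(zip(self.columns[1:], self._cells))
                self.rows.append(record)
            return
        if tag != self._tag:
            return
        text = "".join(self._buffer).strip()
        if tag == "caption":
            self.caption = text
        elif tag == "th" and self._scope == "col":
            self.columns.append(text)
        elif tag == "th" and self._scope == "row":
            self._row_header = text
        elif tag == "td":
            # Prefer the raw numeric value over the formatted display text.
            self._cells.append(float(self._raw) if self._raw is not None else text)
        self._tag = None


table = TableExtractor()
table.feed(SAMPLE_TABLE)
print(table.caption)
for row in table.rows:
    print(row)
```

Because the headers declare their scope and each cell carries its raw value, the parser needs no inference: the formatted string '£1.2M' is never parsed, and the Revenue column arrives as a number ready for comparison.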
03
Content Relationship Mapping
Isolated content is invisible content. Agents build knowledge graphs from the relationships between your pages, and explicit relationship markup dramatically improves how agents understand your content's place in a broader knowledge structure.
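Implement rel='related' links between topically connected pages - agents use related links to expand their understanding of a topic beyond the current page, and explicit relationships are more reliable than inferred ones. Note that 'related' is registered in the IANA link relations registry but is not among the link types defined by the HTML standard, so treat it as an advisory hint and back it with Schema.org relationship properties.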
Use Schema.org's isPartOf, hasPart, and isBasedOn properties to declare content hierarchies and derivation chains - agents navigating your content use these relationships to find parent topics, sub-topics, and source material.
Add internal links with descriptive anchor text that names the target content - agents extract link context from anchor text to decide whether to follow a link, and 'click here' provides zero navigational signal.
Implement a topic taxonomy using SKOS (Simple Knowledge Organization System) or Schema.org's DefinedTerm - agents that understand your taxonomy can map your content to their own knowledge structures more accurately.
Publish a site-level content graph (as JSON-LD or RDF) that declares all pages and their relationships - this provides agents with a complete map of your content without requiring them to crawl and infer the structure.
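A site-level content graph of the kind described above could be generated from a page inventory. The sketch below is one possible shape, assuming Python's standard json module; the example.com URLs, the PAGES list, and the build_content_graph function are hypothetical. The isPartOf, hasPart, and isBasedOn keys are the Schema.org properties named above.

```python
import json

# Hypothetical page inventory; URLs and names are placeholders.
PAGES = [
    {"url": "https://example.com/guides/", "name": "Guides",
     "parent": None, "based_on": None},
    {"url": "https://example.com/guides/structured-data/",
     "name": "Structured Data Best Practices",
     "parent": "https://example.com/guides/",
     "based_on": "https://example.com/essays/signal-clarity/"},
    {"url": "https://example.com/guides/data-tables/",
     "name": "Data Table Optimisation",
     "parent": "https://example.com/guides/", "based_on": None},
]


def build_content_graph(pages):
    """Build a JSON-LD document that declares each page and its relationships."""
    # Derive hasPart from the parent links so both directions stay consistent.
    children = {}
    for page in pages:
        if page["parent"]:
            children.setdefault(page["parent"], []).append(page["url"])

    nodes = []
    for page in pages:
        node = {"@type": "WebPage", "@id": page["url"], "name": page["name"]}
        if page["parent"]:
            node["isPartOf"] = {"@id": page["parent"]}
        if page["url"] in children:
            node["hasPart"] = [{"@id": url} for url in children[page["url"]]]
        if page["based_on"]:
            node["isBasedOn"] = {"@id": page["based_on"]}
        nodes.append(node)
    return {"@context": "https://schema.org", "@graph": nodes}


graph = build_content_graph(PAGES)
print(json.dumps(graph, indent=2))
```

Deriving hasPart from the same parent field that produces isPartOf is a deliberate choice in this sketch: the two directions of the hierarchy cannot drift apart, which is the kind of inconsistency that is hard to spot when the relationships are maintained by hand.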
04
Microdata and RDFa Implementation
While JSON-LD is the preferred format for new implementations, microdata and RDFa embed structured data directly in HTML elements. This approach is valuable when your content management system makes JSON-LD injection difficult or when you need structured data tightly coupled to visible content.
Use microdata (itemscope, itemtype, itemprop) when your CMS templates generate HTML directly and JSON-LD injection requires custom development - microdata attributes can be added to existing HTML elements without additional script blocks.
Implement RDFa (typeof, property, resource) for content that benefits from inline annotation - RDFa is particularly effective for marking up author names, dates, and entity references within paragraph text.
Avoid mixing microdata and RDFa on the same page - agents that encounter both formats may produce conflicting structured data extractions, and the interaction between the two formats is not well-defined.
Test microdata and RDFa output with the Schema Markup Validator (validator.schema.org, the successor to Google's retired Structured Data Testing Tool), Google's Rich Results Test, and the W3C RDFa Distiller - these tools show you what structured data a parser extracts from your markup, revealing gaps and errors.
Consider migrating from microdata/RDFa to JSON-LD as a long-term strategy - JSON-LD is easier to maintain, easier to test, and better supported by modern agent frameworks, but the migration should be incremental to avoid disruption.
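An incremental migration can start by reading existing microdata and emitting the equivalent JSON-LD. The following is a deliberately simplified sketch, assuming Python's standard html.parser and json modules: it handles a single top-level item with flat properties and ignores nested itemscope elements, itemref, and repeated properties, all of which the full microdata algorithm covers. The sample markup, the FlatMicrodataParser class, and the to_jsonld helper are hypothetical.

```python
import json
from html.parser import HTMLParser

# Hypothetical article markup using microdata attributes.
SAMPLE = """
<article itemscope itemtype="https://schema.org/Article">
  <h2 itemprop="headline">Semantic HTML for Agent Parsing</h2>
  <p>By <span itemprop="author">A. Writer</span>,
     <time itemprop="datePublished" datetime="2024-03-15">15 March 2024</time></p>
  <a itemprop="url" href="https://example.com/guides/semantic-html/">Permalink</a>
</article>
"""

# Elements whose property value comes from an attribute rather than text
# (a simplified subset of the microdata value rules).
VALUE_ATTRS = {"a": "href", "link": "href", "img": "src",
               "meta": "content", "time": "datetime"}


class FlatMicrodataParser(HTMLParser):
    """Reads one top-level microdata item with flat (non-nested) properties."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._prop = None     # itemprop whose text is being collected
        self._tag = None
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs and self.item_type is None:
            self.item_type = attrs.get("itemtype")
        prop = attrs.get("itemprop")
        if not prop:
            return
        attr_name = VALUE_ATTRS.get(tag)
        if attr_name and attrs.get(attr_name):
            self.properties[prop] = attrs[attr_name]
        else:
            self._prop, self._tag, self._buffer = prop, tag, []

    def handle_data(self, data):
        if self._prop:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._prop and tag == self._tag:
            self.properties[self._prop] = "".join(self._buffer).strip()
            self._prop = None


def to_jsonld(item_type, properties):
    """Split the itemtype URL into @context and @type, then copy properties."""
    context, _, type_name = item_type.rpartition("/")
    return {"@context": context, "@type": type_name, **properties}


parser = FlatMicrodataParser()
parser.feed(SAMPLE)
doc = to_jsonld(parser.item_type, parser.properties)
print(json.dumps(doc, indent=2))
```

During a migration the generated JSON-LD can be compared against the output of a validator for the original page; once the two agree for a template, the microdata attributes on that template can be retired.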
Related Reading
Go Deeper
Explore the essays and frameworks that underpin this guide.
Observatory Essays
Signal Clarity
How structured data creates the signal clarity that agents use for discovery and trust.
Agentic Entity Resolution
Entity resolution through structured data - how agents identify and link entities.
The Intelligence Layer
Structured data as the intelligence substrate that agents reason over.