---
title: "The Agent-Native Web: Declarative Interaction Contracts for AI Agents over HTTP"
subtitle: "A Matter of Interfaces: Toward an Agent-Native Layer for the Web"
author: "Sergio Muñoz Gamarra"
version: "0.1"
date: "2026-05-09"
canonical_url: "https://sergiomunozgamarra.github.io/iacp"
license: "CC BY-NC-ND 4.0"
---
© 2026 Sergio Muñoz Gamarra. This work is licensed under CC BY-NC-ND 4.0.
You may share it with attribution for non-commercial purposes, but you may not modify it or use it commercially without explicit written permission.
The web is changing. Not because HTTP is obsolete, and not because human browsing will disappear, but because a new kind of user is here: the AI agent. We are moving toward the idea of an “agent for everything”: a system that can search, compare, plan, book, buy, monitor, fill forms, and execute workflows on behalf of people and organizations. The promise is strong, but current web interfaces make reliable execution hard. Most websites are still designed for humans looking at screens, not for agents that need clear capabilities, constraints, permissions, and consequences.
Today, many agents must act like humans inside a browser. They click buttons, inspect pages, parse DOM structures, process screenshots, handle cookie banners, wait for JavaScript, and recover from UI changes. This can work, but it is costly and brittle: it increases token usage, adds latency, depends on unstable layouts, and blurs security boundaries. Recent web-agent benchmarks also show that many online tasks remain difficult, and that API-based or hybrid approaches often outperform pure browsing agents in realistic settings.
This paper argues that what is missing is not a replacement for HTTP, but an agent-native layer on top of HTTP. Websites should be able to declare, in a standard machine-readable way, what agents can read, query, compare, prepare, and execute safely. To do this, we propose Agent Interaction Contracts: declarative HTTP-native manifests that expose capabilities, input and output schemas, authentication requirements, authorization scopes, rate limits, usage policies, action risk levels, provenance metadata, and human-confirmation requirements. Because agents may operate over personal, sensitive, or regulated data, these contracts should also expose privacy-relevant metadata such as data categories, processing purpose, consent requirements, retention, downstream-use restrictions, and third-party sharing.
Agent Interaction Contracts are meant to complement, not replace, existing standards such as OpenAPI, robots.txt, llms.txt, OAuth, and the Model Context Protocol. OpenAPI describes APIs, robots.txt expresses crawler preferences, llms.txt helps models consume content, OAuth supports delegated authorization, and MCP connects models with tools. But none of them alone provides a lightweight, website-level contract for agentic interaction.
We present the motivation, design principles, discovery mechanism, capability taxonomy, security model, and response structure of this layer. We also outline an evaluation methodology that compares agent-native contracts with browser-based and API-based approaches in terms of token cost, task success, latency, interaction steps, and unsafe-action rate. Our central claim is simple: the “agent for everything” will not be achieved only by making agents better at using human interfaces. We must also make the web itself more explicit, auditable, and ready for machine-mediated interaction.
| Term | Meaning |
|---|---|
| AICP | The proposed protocol/convention for declaring agent-facing website contracts |
| Agent Interaction Contract | The manifest exposed by a website to describe capabilities, policies, risks, and privacy metadata |
| AOM | Agent Object Model, the runtime response structure for agent-facing capability calls |
| Capability | An agent-facing operation exposed by a website |
| Agent runtime | The system that interprets contracts, plans actions, and invokes capabilities |
| Agent browser | A user agent for AI systems that manages discovery, credentials, permissions, confirmations, and fallback browsing |
The web was built for human browsing. HTTP gives us a common way to exchange resources, and browsers give us a universal interface to consume them. This model has been extremely successful. But now a new user is emerging: the AI agent.
AI agents are expected to search, compare, monitor, plan, fill forms, book, buy, and execute workflows across websites on behalf of users. This is the promise of the “agent for everything”. In practice, that promise is still hard to deliver reliably, because most websites expose human-facing pages rather than agent-facing capabilities.
As a result, many agents must behave like humans inside a browser. They inspect HTML, parse DOM structures, process screenshots, click buttons, wait for JavaScript, handle cookie banners, and recover from UI changes. This can work, but it is expensive and fragile. It increases token consumption, latency, implementation complexity, and security risk.
There is also a scaling issue: token budgets are not infinite. Cost, availability, and latency are becoming strategic constraints for production systems. Reducing unnecessary token use is no longer just optimization; it is becoming a core requirement for scalable agentic infrastructure.
The problem is not HTTP itself. HTTP already provides extensible semantics through methods, headers, status codes, representations, and URI-based resources. The problem is that websites rarely publish explicit machine-readable contracts describing what agents can safely read, query, compare, prepare, or execute.
This paper proposes Agent Interaction Contracts: declarative, HTTP-native manifests through which websites expose capabilities, input and output schemas, authentication requirements, authorization scopes, rate limits, usage policies, action risk levels, provenance metadata, privacy metadata, and human-confirmation requirements.
The proposal complements existing standards such as OpenAPI, robots.txt, llms.txt, OAuth, and MCP. OpenAPI describes APIs, robots.txt expresses crawler preferences, llms.txt helps models consume content, OAuth enables delegated authorization, and MCP connects models with tools. Agent Interaction Contracts target a different gap: a lightweight, website-level contract for agentic interaction.
The key idea is straightforward: agents should not need to infer a website’s capabilities from visual interfaces when the website can declare them explicitly.
This paper makes four contributions. First, it defines the interface mismatch between human-oriented browsing and agent-oriented interaction. Second, it introduces Agent Interaction Contracts as an HTTP-native abstraction for exposing website capabilities. Third, it proposes a capability taxonomy and a security model for agentic web actions. Fourth, it outlines an evaluation methodology comparing this approach with browser-based and API-based agents across token cost, task success, latency, interaction steps, and unsafe-action rate.
The web does not need to stop being human-readable. But it must become agent-readable as well.
The need for an agent-native web interface does not appear in isolation. The web already has several mechanisms for machine-readable access, API description, authorization, structured data, and tool integration. The problem is that these mechanisms solve adjacent problems, but not exactly the problem of safe and efficient agentic interaction with ordinary websites.
HTTP should not be replaced in order to support AI agents. It already provides a flexible model based on resources, methods, headers, status codes, representations, caching, and content negotiation. This makes HTTP a good substrate for an agent-native layer.
The issue is not the transport protocol. The issue is the lack of explicit interaction contracts. Most websites expose pages and visual workflows, but they do not declare, in a standard way, which capabilities are available to agents, how these capabilities should be invoked, what permissions are required, or what consequences an action may have.
The web already contains several partial solutions.
OpenAPI describes HTTP APIs in a structured way. It is useful for developers and can also help agents understand endpoints. However, OpenAPI is not normally exposed as a universal website-level agent interface, and it does not focus on usage policies, action risk, human confirmation, provenance, or agent-specific discovery.
robots.txt expresses crawler preferences. It is simple and widely understood, but it is not an authorization system and it does not describe capabilities. It can tell a crawler where it should not go, but it cannot tell an agent how to safely search flights, compare products, or prepare a booking.
sitemaps help machines discover URLs. They are useful for indexing, but they describe locations, not interactions.
schema.org and structured data help websites describe entities such as products, articles, organizations, events, and reviews. This is valuable, but it is mainly about the meaning of content, not about how an agent should execute workflows or respect action boundaries.
llms.txt is an emerging convention to make website content easier for language models to consume. It is important because it recognizes that LLMs need more direct access to relevant information. However, it is mostly content-oriented. It does not define transactional actions, authentication scopes, rate limits, risk levels, or confirmation requirements.
A common answer to this problem is: agents should just use APIs. In many cases, this is true. APIs are more stable and efficient than browser automation. But as a general answer for the public web, this is not enough. Many APIs are private, undocumented, inconsistent, partner-only, or disconnected from the public website experience. Also, APIs are designed mainly for developers, not necessarily for autonomous agents acting on behalf of users.
An agent does not only need to know that an endpoint exists. It also needs to know what the endpoint means in a user workflow, whether the action is reversible, what permissions are required, what rate limits apply, whether the result can be reused, and whether human confirmation is required before continuing. This is why an agent-native layer should not be only an API description. It should be an interaction contract.
The Model Context Protocol addresses an important part of the agent ecosystem: connecting models with tools, data sources, and external systems. It is useful for controlled environments, enterprise integrations, development tools, databases, and custom workflows.
However, MCP is tool-centric. A public website is resource-centric. Requiring every website to create, deploy, and maintain a custom MCP server may be too heavy as a universal web mechanism. In many cases, a website should be able to expose agent-consumable capabilities directly over HTTP, using the backend and routes it already has.
In this sense, Agent Interaction Contracts are not a replacement for MCP. They are complementary. MCP can connect agents to tools. Agent Interaction Contracts can help ordinary websites describe themselves as safe, discoverable, policy-aware interaction surfaces.
Agentic web interaction also needs a clear authorization model. When an agent acts on behalf of a user, the website must know what the user has delegated, what the agent is allowed to do, and where the boundary is between reading, preparing, and committing an action.
OAuth already provides a strong foundation for delegated authorization. But OAuth alone does not describe the semantics of agentic actions. It can say that a token has a scope, but it does not define a common taxonomy for low-risk queries, medium-risk preparatory actions, high-risk purchases, or destructive operations.
For this reason, an agent-native contract should build on existing authorization systems, not replace them. It should make permissions more understandable for agents and users by connecting scopes with declared capabilities and risk levels.
Recent AI systems show that agents can operate graphical interfaces. This is impressive and useful, especially when no better interface exists. But using a browser as the default machine interface is not ideal.
Browser automation forces agents to infer intent from presentation. It also makes them vulnerable to interface changes, hidden state, misleading content, modals, CAPTCHAs, dynamic JavaScript, and prompt injection attacks embedded in webpages.
This does not mean browser agents are useless. They are necessary as a fallback. But fallback should not become the main architecture of the agentic web.
Each existing mechanism solves one part of the problem:
| Mechanism | Main purpose | Main limitation for agents |
|---|---|---|
| HTTP | Resource exchange | Does not declare agent capabilities |
| OpenAPI | API description | Not a full agent interaction contract |
| robots.txt | Crawler preferences | Not authorization; no actions |
| sitemap | URL discovery | No workflow semantics |
| schema.org | Structured entities | No interaction model |
| llms.txt | LLM-readable content | Mostly content-oriented |
| OAuth | Delegated authorization | No action taxonomy |
| MCP | Tool integration | May be too heavy per website |
| Browser automation | Universal fallback | Expensive and fragile |
The gap is therefore clear. The web has pages for humans, APIs for developers, and tool protocols for controlled integrations. But it does not yet have a lightweight, standard, website-level contract for AI agents.
This is the gap that Agent Interaction Contracts aim to fill.
AI agents are starting to use the web as an operational environment. They do not only retrieve documents. They compare alternatives, monitor changes, fill forms, prepare actions, and sometimes execute workflows on behalf of users. However, the current web does not expose a clear interaction model for this kind of use.
The result is a mismatch between what websites provide and what agents need.
Most websites are designed to guide human attention. They use layout, hierarchy, color, buttons, menus, modals, animations, pagination, filters, and progressive disclosure. These elements are useful for people, but they are not the most efficient interface for agents.
An agent does not primarily need visual presentation. It needs to know:
When this information is not declared explicitly, the agent has to infer it from the page. This inference is expensive, fragile, and sometimes wrong.
Browser automation is powerful because it works even when no API or machine-readable interface exists. But it should be understood as a fallback, not as the ideal architecture.
A browser-based agent must often:
This consumes tokens, time, and engineering effort. It also introduces operational fragility: a small UI change can break an agentic workflow.
The cost of agentic browsing is not only technical. It is also economic.
As AI systems become more common, tokens become a scarce resource. Models are more capable, but agentic workflows can require long context windows, repeated observations, intermediate reasoning, tool calls, retries, and safety checks. In practice, this creates a form of token rationing: systems must decide where tokens are really necessary and where they are being wasted.
Using tokens to parse irrelevant markup, visual structure, duplicated navigation, cookie text, advertisements, and unstable page elements is not sustainable at scale. For this reason, token efficiency is not just an optimization. It is a requirement for scalable agentic systems.
An agent-native interface should reduce the amount of unnecessary context that agents need to process. Instead of reading a full page to infer that a flight search capability exists, the agent should be able to discover the capability directly.
HTML is excellent for presenting documents and interfaces. It can expose links, forms, labels, metadata, and structured elements. But HTML does not reliably express the business-level semantics that agents need.
For example, a page may contain several buttons:
A human can usually understand the difference from context. An agent may need to infer whether a button is low-risk, reversible, financially binding, destructive, or merely navigational.
This is not only a usability problem. It is a safety problem.
A website should be able to declare that one operation is a read-only query, another is a preparatory action, another requires explicit user confirmation, and another is a high-risk irreversible action. These semantics should not depend only on visual interpretation.
APIs are a better interface for agents than visual pages, but they do not solve the problem completely.
Many APIs are:
Even when an API exists, the agent still needs to understand how endpoints map to user intentions and real-world consequences. A normal API description may explain parameters and responses, but it may not declare risk level, confirmation requirements, usage policies, provenance, freshness, or safe fallback behavior.
The problem is therefore not only access to endpoints. The problem is the lack of an interaction contract.
When an agent browses a website like a human, the boundary between reading, preparing, and executing can become ambiguous.
This creates several risks:
Agentic systems need explicit safety boundaries. A read-only query, a reversible preparatory action, a financial transaction, and a destructive operation should not be treated as equivalent interactions.
The current web forces AI agents to infer capabilities, constraints, permissions, and risks from interfaces designed for humans.
This paper argues that this inference should become explicit.
Websites should declare their agent-facing capabilities through standard, machine-readable, HTTP-native interaction contracts.
Agent Interaction Contracts should not try to reinvent the web. They should add a missing layer to the web that already exists. For this reason, the proposal must be simple enough to be adopted by ordinary websites, but expressive enough to support real agentic workflows.
The proposal should be built on top of HTTP, not as a replacement for it.
HTTP already provides resources, methods, headers, status codes, representations, caching, authentication mechanisms, and content negotiation. Agent Interaction Contracts should use these existing mechanisms instead of creating a parallel transport system.
The goal is not a new internet for agents. The goal is an agent-readable layer for the current internet.
An agent should be able to discover whether a website exposes an agent-native interface without guessing, scraping, or relying on external registries.
A simple discovery mechanism could be:
```http
GET /.well-known/agent-interface
```
or an HTTP Link header:
```http
Link: </.well-known/agent-interface>; rel="agent-interface"
```
The important point is that discovery must be predictable. If every website exposes its agent interface in a different place, the standard loses much of its value.
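The discovery step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not part of the proposal: the function name `discover_contract_url` is hypothetical, the parser handles only simple `<target>; rel="..."` link values (no commas inside quoted parameters), and the fallback uses the well-known path quoted above. The sketch performs no network requests.

```python
import re
from urllib.parse import urljoin

WELL_KNOWN_PATH = "/.well-known/agent-interface"  # path suggested in the text
REL = "agent-interface"                            # link relation suggested in the text

# One link-value of the simple form: <target>; param=...; param=...
_LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[^,;]+)*)')
_REL_RE = re.compile(r'rel\s*=\s*"?([^";,]+)"?', re.IGNORECASE)


def discover_contract_url(origin, link_header=None):
    """Return the contract URL for `origin` (hypothetical helper).

    Prefers a Link header entry whose rel includes "agent-interface";
    otherwise falls back to the well-known path.
    """
    if link_header:
        for target, params in _LINK_RE.findall(link_header):
            rel = _REL_RE.search(params)
            # A rel parameter may hold several space-separated relation types.
            if rel and REL in rel.group(1).split():
                return urljoin(origin, target)
    return urljoin(origin, WELL_KNOWN_PATH)
```

An agent runtime would call this once per origin and cache the result, so that later capability calls do not repeat discovery.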
Websites should declare capabilities explicitly.
An agent should not need to inspect a visual page to infer that a website supports flight search, product comparison, booking holds, subscription cancellation, invoice download, or support ticket creation.
The contract should describe:
The contract should reduce unnecessary token consumption.
Agents should not spend tokens parsing navigation menus, advertisements, cookie banners, duplicated layout, visual instructions, or irrelevant markup when the task only requires a small set of structured capabilities and results.
Token efficiency is important for cost, latency, scalability, and reliability. As agentic systems become more common, token usage will become a design constraint, not only a billing detail.
The protocol must treat security as a first-class design goal.
Agentic interaction is different from passive crawling. Agents may act on behalf of users, operate across services, and execute workflows with financial, legal, operational, or privacy consequences.
For this reason, contracts should support:
Security cannot be an optional appendix. It must be part of the contract.
Websites need control over how agents consume and use their resources.
A contract should express policies such as:
This is important because agentic access should not become a more sophisticated form of uncontrolled scraping. The standard should give websites a way to support agents while preserving control over usage.
Reading is not the same as acting.
An agent interface must distinguish between different kinds of interactions:
A flight search is not the same as buying a ticket. Preparing a booking hold is not the same as confirming payment. Downloading an invoice is not the same as cancelling an account.
The contract should make these differences explicit, because agents and users need to know when an action is safe, reversible, risky, or final.
Agent Interaction Contracts should coexist with the current web.
Human-facing pages should continue to work. Existing APIs should continue to work. OpenAPI, robots.txt, sitemaps, structured data, OAuth, llms.txt, and MCP should remain useful.
The purpose is not to replace all previous mechanisms, but to connect them into a clearer agent-facing layer.
If adoption requires a large engineering project, most websites will not implement it.
The standard should be easy to generate from existing backend structures:
Frameworks should be able to expose a first version automatically, and developers should be able to refine it manually where needed.
Agentic interactions should be traceable.
When an agent performs a task, it should be possible to understand:
This matters for debugging, compliance, accountability, and user trust.
The first version should be small.
A standard that tries to solve every possible interaction from the beginning will probably fail. The first version should define only the essential elements: discovery, capabilities, schemas, policies, authentication, risk levels, and provenance.
At the same time, it should be extensible enough to support more advanced use cases later, such as subscriptions, events, payments, negotiation, reputation, pricing, and agent identity.
The design principle is simple: start minimal, but do not close the door to the real web.
Agent Interaction Contracts should support privacy-preserving interaction by design.
Agents should not receive more personal data than necessary to complete a task. A contract should declare which data categories are required, which are optional, which are forbidden, why the data is needed, how long it may be retained, and whether it may be shared with third parties.
This is important because agentic workflows may involve personal accounts, payments, invoices, health portals, employment systems, travel documents, banking systems, and other sensitive contexts. Token efficiency and privacy are connected: the less irrelevant context an agent needs to process, the less unnecessary personal data enters the agent runtime.
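The data-minimization idea can be sketched as a filter that an agent runtime applies before sending user data to a capability. The key names `personal_data_required`, `personal_data_optional`, and `personal_data_forbidden`, and the function `minimize_payload`, are assumptions made for this illustration; the proposal only states that required, optional, and forbidden categories should be declared.

```python
def minimize_payload(privacy, user_data, share_optional=False):
    """Return only the fields a capability's privacy block allows.

    `privacy` is the parsed privacy metadata of one capability and
    `user_data` is everything the runtime knows about the user.
    """
    required = list(privacy.get("personal_data_required", []))
    optional = list(privacy.get("personal_data_optional", [])) if share_optional else []
    forbidden = set(privacy.get("personal_data_forbidden", []))

    missing = [field for field in required if field not in user_data]
    if missing:
        raise ValueError(f"missing required fields: {missing}")

    # Anything not declared by the contract is never sent.
    allowed = [f for f in required + optional if f not in forbidden]
    return {f: user_data[f] for f in allowed if f in user_data}


privacy = {
    "personal_data_required": ["full_name", "email"],
    "personal_data_optional": ["phone"],
    "personal_data_forbidden": ["passport_number"],
}
user = {
    "full_name": "Ada Example",
    "email": "ada@example.com",
    "phone": "555-0100",
    "passport_number": "X123",
    "home_address": "1 Example Street",
}
minimal = minimize_payload(privacy, user)
```

Here `minimal` contains only the name and email: the undeclared address and the forbidden passport number never enter the request.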
An Agent Interaction Contract is the core element of the proposed agent-native web layer. It is a machine-readable declaration, exposed by a website over HTTP, that describes how AI agents can interact with the site in a safe, efficient, and policy-aware way.
The purpose of the contract is not only to describe endpoints. It is to describe interaction. An agent should be able to understand what the website allows, what it requires, what it returns, what it forbids, and which actions may have real consequences.
An Agent Interaction Contract can be defined as:
A machine-readable declaration, exposed over HTTP, that describes the capabilities a website makes available to AI agents, including how to invoke them, what inputs they accept and what outputs they return, what policies govern their use, what authentication is required, what risks actions carry, and how results should be attributed.
This definition is intentionally broader than a traditional API description. APIs describe how to call endpoints. Agent Interaction Contracts describe how an agent can participate in a website workflow.
In this sense, the contract is not only technical. It is also operational and semantic.
A contract should include the minimum information required for an agent to interact with a website without guessing from the visual interface.
At minimum, it should describe:
The reference representation of an Agent Interaction Contract should be a manifest format, not only a data exchange format. For this reason, this paper proposes TOML as the canonical representation for static contract files.
TOML is appropriate because Agent Interaction Contracts are closer to configuration manifests than to transactional API payloads. They are intended to be read by machines, but also reviewed, edited, versioned, and discussed by developers. Compared with YAML, TOML is more constrained and less ambiguous. Compared with JSON, it is easier to read and maintain manually.
A website may expose the contract at:
```http
GET /.well-known/agent-interface.toml
```
or through content negotiation:
```http
Accept: application/aicp+toml
```
JSON should still be supported as an equivalent representation for clients and systems that prefer strict machine-oriented parsing:
```http
Accept: application/aicp+json
```
In this model, TOML is recommended for static manifests, while JSON remains the preferred format for runtime request and response payloads.
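On the server side, the choice between the two representations could look like the following sketch. It assumes that TOML is the default when the client expresses no preference, treats `*/*` and `application/*` as "serve the default", and returns `None` when neither media type is acceptable (a server would then typically answer `406 Not Acceptable`). The function name is hypothetical and the Accept parsing is deliberately simplified.

```python
SUPPORTED = ("application/aicp+toml", "application/aicp+json")  # TOML first: assumed default


def choose_representation(accept_header):
    """Pick the contract media type for an Accept header, or None."""
    if not accept_header:
        return SUPPORTED[0]  # no Accept header: any representation is acceptable

    best, best_q = None, 0.0
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type, q = parts[0].lower(), 1.0
        for param in parts[1:]:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # malformed weight: treat as not acceptable
        if media_type in ("*/*", "application/*"):
            candidate = SUPPORTED[0]
        elif media_type in SUPPORTED:
            candidate = media_type
        else:
            continue
        if q > best_q:  # on ties, the first listed type wins
            best, best_q = candidate, q
    return best
```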
A simple contract could look like this:
```toml
aicp_version = "0.1"

[site]
name = "Example Travel"
origin = "https://example-travel.com"

[policies]
citation_required = true
commercial_use = "requires_auth"
training_use = "disallowed"

[data_processing]
personal_data_processed = false
purpose = "capability_discovery"
data_minimization_required = true
retention = "not_applicable"

[rate_limits]
anonymous = "20/hour"
authenticated = "1000/hour"

[[capabilities]]
id = "flights.search"
type = "query"
method = "POST"
endpoint = "/agent/flights/search"
risk_level = "low"
auth = "optional"
input_schema = "#/schemas/FlightSearchRequest"
output_schema = "#/schemas/FlightSearchResponse"

[[capabilities]]
id = "bookings.hold"
type = "prepare_action"
method = "POST"
endpoint = "/agent/bookings/hold"
risk_level = "medium"
auth = "required"
required_scopes = ["bookings:write"]
requires_user_confirmation = true

[[capabilities]]
id = "bookings.purchase"
type = "commit_action"
method = "POST"
endpoint = "/agent/bookings/purchase"
risk_level = "high"
auth = "required"
required_scopes = ["bookings:purchase"]
requires_user_confirmation = true
idempotency_required = true
data_sensitivity = "personal"

[capabilities.privacy]
personal_data_required = ["full_name", "email", "payment_token"]
purpose = "ticket_purchase"
requires_explicit_consent = true
retention = "legal_requirement"
```
This example is small, but it already gives the agent more useful information than a visual page. The agent does not need to infer that flight search is a low-risk query, that purchase is a high-risk action, or that confirmation is required. The website declares it.
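Once parsed, a manifest like this is an ordinary nested mapping. The following sketch assumes the contract has already been loaded into a Python dict by a TOML parser and shows how a runtime might index capabilities by id. The helper names are illustrative only, and the dict is a trimmed copy of the example.

```python
# A trimmed, already-parsed version of the example contract.
contract = {
    "aicp_version": "0.1",
    "capabilities": [
        {"id": "flights.search", "type": "query", "risk_level": "low",
         "auth": "optional"},
        {"id": "bookings.hold", "type": "prepare_action", "risk_level": "medium",
         "auth": "required", "requires_user_confirmation": True},
        {"id": "bookings.purchase", "type": "commit_action", "risk_level": "high",
         "auth": "required", "requires_user_confirmation": True,
         "idempotency_required": True},
    ],
}


def index_capabilities(contract):
    """Map capability id -> capability entry, rejecting duplicate ids."""
    index = {}
    for cap in contract.get("capabilities", []):
        if cap["id"] in index:
            raise ValueError(f"duplicate capability id: {cap['id']}")
        index[cap["id"]] = cap
    return index


def callable_unattended(cap):
    """True if the entry declares neither mandatory auth nor confirmation."""
    return (cap.get("auth") != "required"
            and not cap.get("requires_user_confirmation", False))


caps = index_capabilities(contract)
unattended = [cid for cid, cap in caps.items() if callable_unattended(cap)]
```

With this contract, only `flights.search` ends up in `unattended`; the hold and the purchase both need a signed-in user and an explicit confirmation.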
A capability is an operation or resource that the website exposes to agents.
Capabilities should be described at the level of user intention, not only at the level of technical endpoints. For example, flights.search is more meaningful to an agent than /api/v3/search.
A capability should normally include:
Example:
```toml
[[capabilities]]
id = "products.compare"
type = "compare"
description = "Compare products by price, availability, delivery time, and return policy."
method = "POST"
endpoint = "/agent/products/compare"
risk_level = "low"
auth = "optional"
input_schema = "#/schemas/ProductCompareRequest"
output_schema = "#/schemas/ProductCompareResponse"
cache_ttl_seconds = 300
```
This makes the website more legible for agents. It also gives the website owner a clear place to define what is supported and what is not.
Not all capabilities are the same. A contract should distinguish between passive access, reversible actions, and high-impact operations.
A proposed initial taxonomy is:
| Type | Meaning | Example |
|---|---|---|
| resource | A readable object or collection | Product, article, invoice |
| query | A parameterized information request | Search flights |
| compare | A structured comparison operation | Compare fares |
| monitor | A recurring or event-based observation | Watch price changes |
| prepare_action | A reversible or non-final action | Create booking hold |
| commit_action | An action with real-world effect | Purchase ticket |
| destructive_action | A destructive or hard-to-reverse action | Cancel subscription |
| event | A subscribable change | Price dropped |
| policy | A rule governing use | Citation required |
This taxonomy is important because agents need to reason about action boundaries. A query can usually be executed without user confirmation. A purchase should not.
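An agent runtime could encode the taxonomy as an enumeration and collapse it into the read, prepare, and commit boundaries discussed in this paper. The grouping below is an assumption of this sketch (for instance, it treats `monitor` and `event` as read-only observation), and the function name `action_boundary` is hypothetical.

```python
from enum import Enum


class CapabilityType(Enum):
    RESOURCE = "resource"
    QUERY = "query"
    COMPARE = "compare"
    MONITOR = "monitor"
    PREPARE_ACTION = "prepare_action"
    COMMIT_ACTION = "commit_action"
    DESTRUCTIVE_ACTION = "destructive_action"
    EVENT = "event"
    POLICY = "policy"


# Assumption for this sketch: these types only read or observe state.
READ_ONLY = {CapabilityType.RESOURCE, CapabilityType.QUERY,
             CapabilityType.COMPARE, CapabilityType.MONITOR,
             CapabilityType.EVENT}
# Types whose effects are final or hard to undo.
FINAL = {CapabilityType.COMMIT_ACTION, CapabilityType.DESTRUCTIVE_ACTION}


def action_boundary(type_name):
    """Classify a declared type as 'read', 'prepare', 'commit', or 'rule'."""
    cap_type = CapabilityType(type_name)  # raises ValueError on unknown types
    if cap_type in READ_ONLY:
        return "read"
    if cap_type is CapabilityType.PREPARE_ACTION:
        return "prepare"
    if cap_type in FINAL:
        return "commit"
    return "rule"  # POLICY describes usage rules, not an operation
```

Rejecting unknown type names, rather than guessing, keeps the runtime from treating a new or misspelled type as a harmless read.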
Every capability should be associated with a risk level.
A simple initial model could be:
| Risk level | Meaning | Example |
|---|---|---|
| low | Read-only or informational | Search products |
| medium | Reversible or preparatory | Hold a booking |
| high | Financial, legal, or operational effect | Buy a ticket |
| critical | Destructive, sensitive, or hard to reverse | Cancel an account |
Risk levels are not only useful for agents. They are also useful for users, developers, auditors, and website owners.
For example:
```toml
[[capabilities]]
id = "account.cancel"
type = "destructive_action"
method = "POST"
endpoint = "/agent/account/cancel"
risk_level = "critical"
requires_user_confirmation = true
requires_strong_authentication = true
```
The contract makes clear that this is not a normal request. It is an action with serious consequences.
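A runtime can turn declared risk levels and flags into concrete safeguards. The sketch below honors the flags used in the examples of this paper and adds two conservative defaults that are assumptions of the sketch, not requirements of the proposal: an undeclared risk level is treated as `critical`, and confirmation is forced at or above a configurable threshold even if the contract omits the flag.

```python
RISK_LEVELS = ("low", "medium", "high", "critical")


def required_safeguards(cap, confirm_from="high"):
    """Return the safeguards a runtime would apply before invoking `cap`."""
    level = cap.get("risk_level", "critical")  # undeclared risk: assume the worst
    if level not in RISK_LEVELS:
        raise ValueError(f"unknown risk level: {level!r}")

    safeguards = []
    forced = RISK_LEVELS.index(level) >= RISK_LEVELS.index(confirm_from)
    if cap.get("requires_user_confirmation") or forced:
        safeguards.append("user_confirmation")
    if cap.get("requires_strong_authentication"):
        safeguards.append("strong_authentication")
    if cap.get("idempotency_required"):
        safeguards.append("idempotency_key")
    return safeguards
```

For the `account.cancel` entry above, this returns both `user_confirmation` and `strong_authentication`, while a plain low-risk query returns an empty list.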
Action risk and data sensitivity should be treated as different dimensions.
A read-only capability can still expose sensitive data. For example, downloading a medical record or an invoice may be low risk from an action perspective, but high risk from a privacy perspective. For this reason, a contract should be able to declare both the operational risk of a capability and the sensitivity of the data it processes.
A simple initial model could be:
| Data sensitivity | Meaning | Example |
|---|---|---|
| public | Public information | Product catalog |
| personal | Identifiable personal data | Name, email, booking history |
| confidential | Sensitive account or business data | Invoices, contracts |
| special_category | Highly sensitive personal data | Health, biometrics, religion |
| regulated | Data under sectoral regulation | Banking, insurance, healthcare |
Example:
```toml
[[capabilities]]
id = "medical.records.download"
type = "resource"
method = "GET"
endpoint = "/agent/medical-records/{record_id}"
risk_level = "low"
data_sensitivity = "special_category"
auth = "required"
required_scopes = ["medical_records:read"]
requires_user_confirmation = true

[capabilities.privacy]
purpose = "display_medical_record_to_user"
requires_explicit_consent = true
data_minimization = true
retention = "session_only"
```
The important principle is simple: a capability can be read-only and still be privacy-critical.
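Treating the two dimensions independently means a runtime should involve the user when either one crosses a threshold. The numeric ordering of sensitivity levels, the thresholds, and the defaults for undeclared values are all assumptions of this sketch; the proposal itself does not rank the sensitivity categories.

```python
RISK = {"low": 0, "medium": 1, "high": 2, "critical": 3}
# Assumed ordering for illustration only.
SENSITIVITY = {"public": 0, "personal": 1, "confidential": 2,
               "special_category": 3, "regulated": 3}


def needs_user_approval(cap, risk_threshold="high",
                        sensitivity_threshold="confidential"):
    """True if action risk OR data sensitivity calls for the user."""
    risk = RISK[cap.get("risk_level", "critical")]         # undeclared: worst case
    sens = SENSITIVITY[cap.get("data_sensitivity", "public")]  # undeclared: assumed public
    return (bool(cap.get("requires_user_confirmation"))
            or risk >= RISK[risk_threshold]
            or sens >= SENSITIVITY[sensitivity_threshold])


medical = {"risk_level": "low", "data_sensitivity": "special_category"}
search = {"risk_level": "low"}
```

With these inputs, `needs_user_approval(medical)` is true even though the action risk is low, while `needs_user_approval(search)` is false.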
A contract should allow websites to express usage policies directly.
Policies may include:
Example:
```toml
[policies]
anonymous_access = true
commercial_use = "requires_auth"
citation_required = true
summarization = "allowed"
training_use = "disallowed"

[policies.cache]
allowed = true
max_ttl_seconds = 600
```
This does not mean that policies enforce themselves. A contract is not a security boundary by itself. But it gives websites and agents a shared language for expected behavior, and it can be connected with authentication, rate limits, legal terms, and audit logs.
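A cooperating agent runtime could consult the declared policies before reusing or caching results. The sketch below works on the policy block above after parsing; the function names are hypothetical, and treating an undeclared use as disallowed is a conservative assumption of the sketch.

```python
policies = {
    "anonymous_access": True,
    "commercial_use": "requires_auth",
    "citation_required": True,
    "summarization": "allowed",
    "training_use": "disallowed",
    "cache": {"allowed": True, "max_ttl_seconds": 600},
}


def use_permitted(policies, use, authenticated=False):
    """Check a declared use such as 'summarization' or 'training_use'."""
    value = policies.get(use, "disallowed")  # undeclared use: deny (assumption)
    if value == "allowed":
        return True
    if value == "requires_auth":
        return authenticated
    return False


def effective_cache_ttl(policies, requested_ttl_seconds):
    """Clamp the runtime's preferred cache lifetime to the declared policy."""
    cache = policies.get("cache", {})
    if not cache.get("allowed", False):
        return 0
    max_ttl = cache.get("max_ttl_seconds")
    if max_ttl is None:
        return max(0, requested_ttl_seconds)
    return max(0, min(requested_ttl_seconds, max_ttl))
```

As the text notes, this is voluntary compliance on the agent side; the website still needs authentication, rate limits, and audit logs to enforce the same rules.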
Agent Interaction Contracts should not invent a new authentication system. They should integrate with existing mechanisms, especially OAuth-style delegated authorization.
The contract should declare whether a capability requires authentication and which scopes are needed.
Example:
[[capabilities]]
id = "invoices.download"
type = "resource"
method = "GET"
endpoint = "/agent/invoices/{invoice_id}"
auth = "required"
required_scopes = ["invoices:read"]
risk_level = "low"
For actions with real consequences, scopes should be specific:
[[capabilities]]
id = "bookings.purchase"
type = "commit_action"
auth = "required"
required_scopes = ["bookings:purchase"]
requires_user_confirmation = true
risk_level = "high"
This makes permissions easier to understand. The agent can know not only that a token is required, but why it is required and what kind of action it enables.
The fact that an agent can technically call an endpoint is not, by itself, a reason to execute an action.
A contract should explicitly declare when human confirmation is required.
Examples:
requires_user_confirmation = true
or, in more detail:
[confirmation]
required = true
reason = "This action will charge the user's payment method."
confirmation_text = "Confirm purchase"
This is essential for the “agent for everything” use case. Users may want agents to search, compare, and prepare, but not to commit high-impact actions without approval.
Agents need to know where information comes from. Users also need to know why an agent gave a certain answer or made a certain recommendation.
For this reason, contracts should include provenance and attribution rules.
Example:
[provenance]
required = true
fields = ["source", "retrieved_at", "canonical_url", "license"]
A runtime response can then include provenance in JSON:
{
"provenance": {
"source": "Example Travel",
"retrieved_at": "2026-05-09T12:00:00Z",
"canonical_url": "https://example-travel.com/flights/result/123",
"license": "standard_terms"
}
}
This helps with trust, debugging, citations, audits, and user transparency.
For adoption, contracts should be easy to generate.
Many websites already have most of the required information inside their backend:
A framework could expose an initial contract automatically and allow developers to refine it with annotations.
Example:
# agent_capability is a hypothetical decorator provided by the framework integration.
@app.post("/agent/flights/search")
@agent_capability(
    id="flights.search",
    type="query",
    risk_level="low",
    auth="optional",
)
def search_flights(request: FlightSearchRequest) -> FlightSearchResponse:
    ...
The generated manifest would then include this capability.
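To make the idea concrete, here is a self-contained sketch of how such a decorator could collect metadata and produce a contract-shaped structure. Everything in it is hypothetical: the in-process registry, the `build_manifest` helper, and the dict-based handler are illustrative stand-ins. A real integration would also derive `method` and `endpoint` from the framework's routing table and serialize the result to TOML.

```python
from typing import Callable

# Hypothetical in-process registry; a real framework integration would differ.
_REGISTRY: list = []


def agent_capability(**metadata) -> Callable:
    """Record capability metadata for a handler and return it unchanged."""
    def decorator(handler: Callable) -> Callable:
        _REGISTRY.append(dict(metadata))
        return handler
    return decorator


def build_manifest(site_name: str, origin: str) -> dict:
    """Assemble a contract-shaped dict from everything registered so far."""
    return {
        "aicp_version": "0.1",
        "site": {"name": site_name, "origin": origin},
        "capabilities": [dict(cap) for cap in _REGISTRY],
    }


@agent_capability(id="flights.search", type="query", risk_level="low", auth="optional")
def search_flights(request: dict) -> dict:
    # Placeholder handler body for the sketch.
    return {"results": []}


manifest = build_manifest("Example Travel", "https://example-travel.com")
print(manifest["capabilities"][0]["id"])  # flights.search
```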
This is important because adoption will depend on developer experience. If a website can expose a useful first version with small changes, the standard has a much better chance of being adopted.
The Agent Interaction Contract becomes a boundary between the website and the agent.
For the website, it defines what is supported, allowed, limited, and auditable.
For the agent, it defines what can be done without guessing from the interface.
For the user, it defines where automation is safe, where confirmation is required, and where authority has been delegated.
This is the main value of the contract: it turns implicit interaction into explicit agreement.
For Agent Interaction Contracts to be useful, agents must be able to find them in a predictable way. Discovery cannot depend on guessing, scraping, search engines, or external registries. If the purpose is to create a web-native layer, the first step must also be web-native: a standard HTTP discovery mechanism.
The objective of discovery is simple. When an agent reaches a website, it should be able to ask: does this site expose an agent interface, and how should I use it?
Figure 2. AICP discovery flow. The agent first retrieves and validates the Agent Interaction Contract, evaluates capabilities, authentication, policies, risk levels, privacy metadata, and versions, and only then invokes a declared capability. If the contract is unavailable, the agent follows a controlled fallback order.
The primary discovery mechanism should be a well-known URI.
A website can expose its Agent Interaction Contract at:
GET /.well-known/agent-interface.toml
This endpoint returns the canonical TOML representation of the contract.
Example:
aicp_version = "0.1"
[site]
name = "Example Travel"
origin = "https://example-travel.com"
[formats]
canonical = "application/aicp+toml"
runtime_response = "application/aom+json"
[[capabilities]]
id = "flights.search"
type = "query"
method = "POST"
endpoint = "/agent/flights/search"
risk_level = "low"
auth = "optional"
The advantage of this approach is that it is simple, explicit, and easy to implement. A developer, crawler, agent runtime, or browser extension can know where to look without prior knowledge of the site.
In addition to the explicit TOML file, a website may expose a generic discovery endpoint:
GET /.well-known/agent-interface
This endpoint can use content negotiation to return the format preferred by the client.
For example:
Accept: application/aicp+toml
or:
Accept: application/aicp+json
A server may respond with:
Content-Type: application/aicp+toml
or:
Content-Type: application/aicp+json
This gives flexibility without losing predictability. TOML remains the recommended canonical format for static manifests, while JSON remains useful for systems that prefer strict machine-oriented parsing.
A website may also advertise the contract through an HTTP Link header.
Example:
Link: </.well-known/agent-interface.toml>; rel="agent-interface"; type="application/aicp+toml"
This is useful when an agent first requests a normal web page. The page response can indicate that an agent-native contract exists, without requiring the agent to guess.
Example response:
HTTP/1.1 200 OK
Content-Type: text/html
Link: </.well-known/agent-interface.toml>; rel="agent-interface"; type="application/aicp+toml"
The agent can then retrieve the contract before deciding whether to continue with browser-based interaction, API-based interaction, or agent-native interaction.
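A rough sketch of this step on the agent side follows. It extracts the contract URL from a Link header and resolves it against the page URL. The function name is an assumption, and the regular expressions are a deliberate simplification that does not cover every case of the Web Linking syntax (for instance, commas inside quoted parameter values).

```python
import re
from typing import Optional
from urllib.parse import urljoin

# Matches one link-value such as: </path>; rel="agent-interface"; type="..."
_LINK_VALUE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^\s;,]+))*)')
_PARAM = re.compile(r'([\w-]+)\s*=\s*(?:"([^"]*)"|([^\s;,]+))')


def find_contract_url(page_url: str, link_header: str) -> Optional[str]:
    """Return the absolute contract URL advertised in a Link header, or None."""
    for target, raw_params in _LINK_VALUE.findall(link_header):
        params = {
            name.lower(): quoted or bare
            for name, quoted, bare in _PARAM.findall(raw_params)
        }
        # rel may hold several space-separated relation types.
        if "agent-interface" in params.get("rel", "").split():
            return urljoin(page_url, target)
    return None


header = '</.well-known/agent-interface.toml>; rel="agent-interface"; type="application/aicp+toml"'
print(find_contract_url("https://example-travel.com/flights", header))
# https://example-travel.com/.well-known/agent-interface.toml
```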
For compatibility with existing web conventions, a website may also include a link element in its HTML.
Example:
<link rel="agent-interface" href="/.well-known/agent-interface.toml" type="application/aicp+toml">
This should not be the only discovery mechanism, because agents should not need to parse full HTML pages just to know whether an agent interface exists. But it is useful as an additional signal, especially for gradual adoption.
Contracts should include explicit version information.
Example:
aicp_version = "0.1"
min_supported_version = "0.1"
recommended_version = "0.1"
A more advanced contract may support multiple versions:
aicp_version = "0.2"
supported_versions = ["0.1", "0.2"]
recommended_version = "0.2"
Versioning is important because agent runtimes need to know whether they can safely interpret the contract. If an agent only supports version 0.1 and the website requires version 0.3, the agent should fail safely or fall back to another mechanism.
A possible response for unsupported versions could be:
HTTP/1.1 406 Not Acceptable
Content-Type: application/aom+json
{
"error": {
"code": "unsupported_aicp_version",
"message": "This site requires AICP version 0.3 or later."
}
}
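The version check on the agent side could look roughly like the sketch below. It assumes that versions are dotted integers and that a contract lists `supported_versions` or, failing that, only its own `aicp_version`. The function names are illustrative.

```python
from typing import Optional


def parse_version(text: str) -> tuple:
    """Turn "0.2" into (0, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def select_version(agent_versions: list, contract: dict) -> Optional[str]:
    """Pick the highest version both sides support, or None if there is none."""
    site_versions = contract.get("supported_versions") or [contract["aicp_version"]]
    common = set(agent_versions) & set(site_versions)
    if not common:
        return None  # fail safely: stop or fall back to another mechanism
    return max(common, key=parse_version)


multi = {"aicp_version": "0.2", "supported_versions": ["0.1", "0.2"]}
strict = {"aicp_version": "0.3"}
print(select_version(["0.1"], multi))   # 0.1
print(select_version(["0.1"], strict))  # None
```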
An agent may not support every capability exposed by a website. In the same way, a website may expose different capabilities depending on authentication, region, user role, quota, device, or business policy.
For this reason, discovery should not be understood only as a static, one-time operation. It may also include capability negotiation.
For example, an unauthenticated agent may see:
[[capabilities]]
id = "products.search"
type = "query"
auth = "optional"
risk_level = "low"
After authentication, the same site may expose additional capabilities:
[[capabilities]]
id = "orders.list"
type = "resource"
auth = "required"
required_scopes = ["orders:read"]
risk_level = "low"
[[capabilities]]
id = "orders.cancel"
type = "destructive_action"
auth = "required"
required_scopes = ["orders:cancel"]
risk_level = "critical"
requires_user_confirmation = true
This distinction is important. The contract should describe not only what the website can do in general, but what the current agent, acting for the current user, is allowed to do.
Some websites may expose a public contract with general capabilities, and then return a more specific contract after authentication.
For example:
GET /.well-known/agent-interface.toml
may return public capabilities, while:
GET /agent/interface
Authorization: Bearer <token>
may return user-specific or organization-specific capabilities.
The public contract can describe the authentication flow:
[auth]
type = "oauth2"
authorization_url = "https://example.com/oauth/authorize"
token_url = "https://example.com/oauth/token"
available_scopes = [
"flights:read",
"fares:watch",
"bookings:hold",
"bookings:purchase"
]
After the user authorizes the agent, the authenticated contract can describe the actual scopes and capabilities available to that agent.
[auth_context]
authenticated = true
subject_type = "user"
granted_scopes = ["flights:read", "fares:watch", "bookings:hold"]
[[capabilities]]
id = "bookings.hold"
type = "prepare_action"
required_scopes = ["bookings:hold"]
risk_level = "medium"
requires_user_confirmation = true
This allows the agent runtime to avoid presenting or attempting actions that are not actually allowed.
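A minimal sketch of that filtering step, assuming the authenticated contract has already been parsed into plain dictionaries. The function name is hypothetical.

```python
def executable_capabilities(capabilities: list, granted_scopes: list) -> list:
    """Keep only capabilities whose required scopes are all granted."""
    granted = set(granted_scopes)
    return [
        cap for cap in capabilities
        if set(cap.get("required_scopes", [])) <= granted
    ]


capabilities = [
    {"id": "bookings.hold", "required_scopes": ["bookings:hold"]},
    {"id": "bookings.purchase", "required_scopes": ["bookings:purchase"]},
]
granted = ["flights:read", "fares:watch", "bookings:hold"]
print([cap["id"] for cap in executable_capabilities(capabilities, granted)])
# ['bookings.hold']
```

The server still has to check scopes on every call; this filter only keeps the runtime from offering actions that would be rejected.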
AICP should not assume that every website will implement an Agent Interaction Contract. The current web will continue to exist, and agents will still need fallback strategies.
A reasonable fallback order could be:
llms.txt, if available.
The important point is that browser automation should be the fallback, not the ideal path.
An agent-native contract gives both sides a better option: the website can expose what it wants to support, and the agent can avoid unnecessary inference.
Agent Interaction Contracts should be cacheable, but agents also need to know when a contract may be stale.
A contract can include freshness metadata:
[cache]
max_age_seconds = 3600
stale_while_revalidate_seconds = 86400
HTTP caching headers can also be used:
Cache-Control: max-age=3600, stale-while-revalidate=86400
ETag: "aicp-v0.1-abc123"
Caching matters because agents may interact with many websites. If every task requires fetching and parsing a fresh contract, discovery itself becomes expensive. At the same time, stale contracts can be dangerous when capabilities, permissions, or action semantics change.
For this reason, websites should update cache validators when changing capabilities, risk levels, authentication requirements, or policies.
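The freshness metadata above can be turned into a simple three-state decision, sketched here. The state names and the choice to treat missing metadata as already expired are assumptions. A cautious runtime would likely also revalidate before any high-impact action, even in the stale-but-usable window.

```python
def contract_cache_state(age_seconds: float, cache: dict) -> str:
    """Classify a cached contract as fresh, stale but usable, or expired."""
    # Assumption: missing freshness metadata means the contract is never fresh.
    max_age = cache.get("max_age_seconds", 0)
    swr = cache.get("stale_while_revalidate_seconds", 0)
    if age_seconds < max_age:
        return "fresh"
    if age_seconds < max_age + swr:
        # May be used for low-risk reads while a revalidation is triggered.
        return "stale_revalidate"
    return "expired"


cache = {"max_age_seconds": 3600, "stale_while_revalidate_seconds": 86400}
print(contract_cache_state(600, cache))     # fresh
print(contract_cache_state(7200, cache))    # stale_revalidate
print(contract_cache_state(100000, cache))  # expired
```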
Discovery should fail safely.
If a contract is unavailable, malformed, unsupported, or inconsistent, the agent should not assume permission to act. It may fall back to safer methods, but high-impact actions should not be attempted without an explicit contract or a trusted alternative.
Possible failure cases include:
| Failure | Recommended behavior |
|---|---|
| Contract not found | Fall back to other discovery mechanisms |
| Unsupported version | Stop, or use a compatible version if available |
| Malformed contract | Treat as unavailable |
| Missing risk level | Treat action as high risk |
| Missing auth requirements | Require explicit authorization before action |
| Conflicting policies | Apply the most restrictive interpretation |
| Expired contract | Revalidate before use |
This conservative behavior is necessary because agentic systems can have real-world consequences. A missing field should not become permission to act.
Discovery is not just a technical detail. It is the entry point to the agent-native web.
If agents can reliably discover contracts, they can stop treating every website as an unknown visual environment. They can first ask what the site explicitly supports, what it allows, and what risks exist. Only after that should they decide how to continue.
In this sense, discovery changes the default model of web interaction. The agent no longer begins by looking at a page. It begins by reading a contract.
Agent Interaction Contracts describe what a website exposes to agents. But once an agent invokes a capability, the website also needs a structured way to return results. A normal API response may contain data, but agentic interaction usually needs more than data. It needs actions, policies, provenance, freshness, and safety information.
For this reason, this paper proposes the Agent Object Model (AOM): a structured response model for agent-facing interactions.
The goal of AOM is not to replace JSON as a data format. On the contrary, JSON is a good fit for runtime responses. The goal is to define what kind of information an agent-facing response should contain, and how this information should be separated.

Figure 3. Separation between the static Agent Interaction Contract and the runtime Agent Object Model. The TOML manifest declares what is possible and under which rules; the JSON response describes what is true for a specific request and what the agent can do next.
Traditional API responses are often designed for applications controlled by developers. They usually assume that the client already knows the workflow, the meaning of each endpoint, and the consequences of the next possible actions.
AI agents operate differently. They may discover a capability at runtime, invoke it on behalf of a user, and decide what to do next based on the response. In this context, a response should not only answer the immediate request. It should also help the agent understand:
Without this information, the agent has to infer too much from context. And again, inference is expensive, fragile, and sometimes unsafe.
AOM should separate response information into different planes.
A proposed structure is:
{
"data": {},
"actions": [],
"policies": {},
"privacy": {},
"provenance": {},
"freshness": {},
"warnings": [],
"agent_hints": {}
}
This separation is important. Data, policies, actions, and hints should not be mixed as if they had the same authority.
In particular, agent_hints must never be treated as system instructions. They are untrusted guidance from the content provider. The agent runtime may use them, ignore them, or filter them depending on policy.
The data plane contains the factual result of the capability invocation.
For example, a flight search capability may return:
{
"data": {
"results": [
{
"id": "fare_123",
"origin": "MAD",
"destination": "NRT",
"departure_time": "2026-07-04T10:20:00+02:00",
"arrival_time": "2026-07-05T08:30:00+09:00",
"price": {
"amount": 682,
"currency": "EUR"
},
"checked_baggage_included": true,
"stops": 1
}
]
}
}
The data plane should be as clean as possible. It should not contain hidden instructions to the agent. It should represent the result.
This distinction matters because agents may pass data into reasoning processes, summaries, comparisons, user interfaces, or downstream tools. The more explicit and clean the data plane is, the easier it is to use safely.
The actions plane describes what the agent may do next.
Example:
{
"actions": [
{
"id": "bookings.hold",
"label": "Hold this fare",
"method": "POST",
"endpoint": "/agent/bookings/hold",
"risk_level": "medium",
"requires_user_confirmation": true,
"input": {
"fare_id": "fare_123"
}
},
{
"id": "fares.watch",
"label": "Watch price changes",
"method": "POST",
"endpoint": "/agent/fares/watch",
"risk_level": "low",
"requires_user_confirmation": false,
"input": {
"fare_id": "fare_123",
"threshold": {
"amount": 700,
"currency": "EUR"
}
}
}
]
}
The actions plane is one of the main differences between a normal API response and an agent-facing response.
A website should not only return information. It should also declare the safe next steps available to the agent. This reduces guessing and helps the agent runtime enforce user confirmation when needed.
The policies plane describes the rules that apply to the response.
Example:
{
"policies": {
"citation_required": true,
"commercial_use": "requires_auth",
"training_use": "disallowed",
"cache": {
"allowed": true,
"max_ttl_seconds": 300
},
"automated_monitoring": "allowed_with_auth"
}
}
Policies should be explicit, but they should not be confused with enforcement. A response can declare a policy, but the server must still enforce important limits through authentication, authorization, rate limiting, and monitoring.
The value of the policy plane is that it gives agents a clear signal about expected use. It also allows agent runtimes to make better decisions about caching, summarization, attribution, and reuse.
The privacy plane describes whether the response contains personal or sensitive data, why that data is included, and how it may be used downstream.
Example:
{
"privacy": {
"personal_data_included": true,
"data_categories": ["travel_preferences", "booking_identifier"],
"data_sensitivity": "personal",
"special_category_data": false,
"purpose": "flight_search",
"retention": "session_only",
"downstream_use": {
"summarization": "allowed",
"training": "disallowed",
"third_party_sharing": "disallowed"
}
}
}
This plane is important because agents may operate over personal accounts, invoices, bookings, payments, health records, employment systems, or other sensitive contexts. A response should make privacy-relevant information explicit instead of forcing the agent runtime to infer it.
Token efficiency is also a privacy property. The less irrelevant context the agent needs to process, the less unnecessary personal data enters the agent runtime.
The provenance plane explains where the result comes from.
Example:
{
"provenance": {
"source": "Example Travel",
"origin": "https://example-travel.com",
"canonical_url": "https://example-travel.com/flights/result/fare_123",
"retrieved_at": "2026-05-09T12:00:00Z",
"license": "standard_terms"
}
}
Provenance is essential for trust. When an agent gives a recommendation, the user should be able to understand where the information came from and when it was retrieved.
This is especially important for dynamic domains such as travel, ecommerce, finance, logistics, real estate, and availability-based services. In these domains, a correct answer can become wrong quickly.
The freshness plane describes how stable or volatile the result is.
Example:
{
"freshness": {
"retrieved_at": "2026-05-09T12:00:00Z",
"valid_until": "2026-05-09T12:15:00Z",
"volatility": "high",
"revalidation_required_before_commit": true
}
}
Freshness should be separated from provenance. Provenance tells where the data came from. Freshness tells how long the data should be trusted.
This is important because many agent workflows involve multiple steps. A user may ask an agent to search flights, compare results, wait for approval, and then prepare a booking. If the price is volatile, the agent should know that it must revalidate the result before any commit action.
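A sketch of how a runtime might apply the freshness plane before a commit action follows. The function names are hypothetical, and treating missing fields as "revalidate" is an assumption in line with the conservative defaults used elsewhere in this paper.

```python
from datetime import datetime, timezone


def parse_timestamp(text: str) -> datetime:
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def must_revalidate_before_commit(freshness: dict, now: datetime) -> bool:
    """Return True when the result must be re-fetched before committing."""
    # Assumption: a missing flag is read as "revalidation required".
    if freshness.get("revalidation_required_before_commit", True):
        return True
    valid_until = freshness.get("valid_until")
    if valid_until is None:
        return True
    return now >= parse_timestamp(valid_until)


volatile = {
    "valid_until": "2026-05-09T12:15:00Z",
    "volatility": "high",
    "revalidation_required_before_commit": True,
}
stable = {
    "valid_until": "2026-05-09T12:15:00Z",
    "revalidation_required_before_commit": False,
}
now = datetime(2026, 5, 9, 12, 5, tzinfo=timezone.utc)
print(must_revalidate_before_commit(volatile, now))  # True
print(must_revalidate_before_commit(stable, now))    # False
```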
The warnings plane communicates important caveats that should not be hidden inside normal text.
Example:
{
"warnings": [
{
"code": "price_may_change",
"severity": "medium",
"message": "The displayed fare is volatile and may change before purchase."
},
{
"code": "baggage_policy_varies",
"severity": "low",
"message": "Checked baggage conditions may depend on the operating airline."
}
]
}
Warnings are useful because agents can surface them to users, include them in summaries, or use them to decide whether more confirmation is needed.
A warning should be structured, not just embedded in a paragraph. This allows agent runtimes to process it consistently.
The agent_hints plane may provide optional guidance to the agent.
Example:
{
"agent_hints": {
"recommended_sort": "price_ascending",
"comparison_fields": ["price", "duration", "stops", "baggage"],
"summary_style": "include tradeoffs"
}
}
This information may be useful, but it must be treated as untrusted. A website should not be able to override the agent runtime, the user instructions, or system-level safety rules through agent_hints.
For this reason, the model should make the trust boundary explicit:
Data is not instruction. Hints are not authority. Policies are not enforcement.
This principle is central to preventing prompt injection and confused-deputy behavior.
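One possible way to keep hints on the untrusted side of that boundary is an allowlist with per-key validation, sketched below. The allowed keys and accepted values are assumptions chosen to match the example above. Free-text hints such as summary_style are dropped in this sketch because they cannot be validated against a closed vocabulary.

```python
# Hypothetical allowlist: each accepted hint key maps to a validator.
HINT_VALIDATORS = {
    "recommended_sort": lambda v: isinstance(v, str)
    and v in {"price_ascending", "price_descending"},
    "comparison_fields": lambda v: isinstance(v, list)
    and all(isinstance(item, str) and item.isidentifier() for item in v),
}


def filter_agent_hints(response: dict) -> dict:
    """Return only hints that are allowlisted and pass validation."""
    hints = response.get("agent_hints", {})
    if not isinstance(hints, dict):
        return {}
    return {
        key: value
        for key, value in hints.items()
        if key in HINT_VALIDATORS and HINT_VALIDATORS[key](value)
    }


response = {
    "agent_hints": {
        "recommended_sort": "price_ascending",
        "comparison_fields": ["price", "duration", "stops", "baggage"],
        "summary_style": "include tradeoffs",
    }
}
print(sorted(filter_agent_hints(response)))
# ['comparison_fields', 'recommended_sort']
```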
AOM should also define a consistent structure for errors.
Example:
{
"error": {
"code": "missing_scope",
"message": "The requested capability requires the bookings:purchase scope.",
"required_scopes": ["bookings:purchase"],
"risk_level": "high"
},
"actions": [
{
"id": "auth.request_scope",
"label": "Request additional permission",
"method": "GET",
"endpoint": "/oauth/authorize",
"risk_level": "medium",
"requires_user_confirmation": true
}
]
}
An error response can still be agent-friendly. It can explain what failed, what permission is missing, and what safe next action is available.
This is better than returning only a generic 403 Forbidden, because the agent can understand the reason and decide whether to ask the user for additional authorization.
A complete response for a flight search could look like this:
{
"data": {
"results": [
{
"id": "fare_123",
"origin": "MAD",
"destination": "NRT",
"departure_time": "2026-07-04T10:20:00+02:00",
"arrival_time": "2026-07-05T08:30:00+09:00",
"price": {
"amount": 682,
"currency": "EUR"
},
"checked_baggage_included": true,
"stops": 1
}
]
},
"actions": [
{
"id": "bookings.hold",
"label": "Hold this fare",
"method": "POST",
"endpoint": "/agent/bookings/hold",
"risk_level": "medium",
"requires_user_confirmation": true,
"input": {
"fare_id": "fare_123"
}
},
{
"id": "fares.watch",
"label": "Watch price changes",
"method": "POST",
"endpoint": "/agent/fares/watch",
"risk_level": "low",
"requires_user_confirmation": false,
"input": {
"fare_id": "fare_123",
"threshold": {
"amount": 700,
"currency": "EUR"
}
}
}
],
"policies": {
"citation_required": true,
"commercial_use": "requires_auth",
"training_use": "disallowed",
"cache": {
"allowed": true,
"max_ttl_seconds": 300
}
},
"privacy": {
"personal_data_included": false,
"data_categories": ["travel_preferences"],
"data_sensitivity": "personal",
"purpose": "flight_search",
"retention": "session_only",
"downstream_use": {
"summarization": "allowed",
"training": "disallowed",
"third_party_sharing": "disallowed"
}
},
"provenance": {
"source": "Example Travel",
"origin": "https://example-travel.com",
"canonical_url": "https://example-travel.com/flights/result/fare_123",
"retrieved_at": "2026-05-09T12:00:00Z",
"license": "standard_terms"
},
"freshness": {
"valid_until": "2026-05-09T12:15:00Z",
"volatility": "high",
"revalidation_required_before_commit": true
},
"warnings": [
{
"code": "price_may_change",
"severity": "medium",
"message": "The displayed fare is volatile and may change before purchase."
}
],
"agent_hints": {
"recommended_sort": "price_ascending",
"comparison_fields": ["price", "duration", "stops", "baggage"]
}
}
This response is more verbose than a minimal API payload, but it is more useful for an agent. It gives the agent the result, the next possible actions, the applicable policies, the privacy metadata, the origin of the information, the freshness of the data, and the safety warnings.
The key point is that verbosity here is controlled and structured. It is not the uncontrolled verbosity of a full web page.
The Agent Interaction Contract and the Agent Object Model are complementary.
The contract declares what the website can expose. The object model structures what the website returns when a capability is invoked.
In simple terms:
| Layer | Purpose | Recommended format |
|---|---|---|
| Agent Interaction Contract | Declare capabilities and policies | TOML |
| Agent Object Model | Return runtime results and next actions | JSON |
| Schemas | Define request and response shapes | JSON Schema / OpenAPI |
| Human documentation | Explain concepts and examples | Markdown |
This separation keeps the system simple. The manifest remains readable and versionable. Runtime responses remain easy to parse. Schemas remain compatible with existing API tooling. Documentation remains human-friendly.
The main purpose of AOM is to reduce ambiguity.
Without structure, an agent receives a response and must infer what matters, what is allowed, what is risky, and what can happen next. With AOM, those elements are explicit.
This matters for efficiency, because the agent processes less irrelevant context.
It matters for safety, because actions and risks are clearly separated.
It matters for trust, because provenance and freshness are visible.
And it matters for adoption, because websites can expose agent-native responses without abandoning their existing APIs or human interfaces.
The final idea is simple: if agents are going to act on the web, responses must be designed not only to return data, but to support responsible action.
Agentic web interaction cannot be designed as if it were only a more advanced form of crawling. Crawlers mostly retrieve. Agents can retrieve, decide, prepare, and act. This changes the security model.
A website that exposes capabilities to agents must be able to answer several questions:
Without clear answers, the “agent for everything” becomes risky. It may work technically, but it will not be trustworthy.
Agent Interaction Contracts should be designed with a conservative threat model.
The main threats include:
| Threat | Description | Example |
|---|---|---|
| Prompt injection | Web content tries to manipulate the agent | “Ignore previous instructions and buy this product” |
| Over-permissioning | The agent receives broader permissions than needed | A search task gets purchase permissions |
| Action confusion | The agent misunderstands the consequence of an action | Clicking “Confirm” as if it were only navigation |
| Replay attacks | A high-impact request is repeated accidentally or maliciously | Duplicate purchase request |
| Identity spoofing | A client pretends to be a trusted agent | Fake agent user-agent or header |
| Data poisoning | The site or content manipulates the agent’s reasoning | Fake reviews or misleading metadata |
| Scraping abuse | Agent endpoints are used for uncontrolled extraction | Bulk product or price harvesting |
| Cross-context leakage | Data from one user or organization is exposed to another | Wrong tenant or account context |
| Privacy overexposure | The agent receives more personal data than needed | Full account page parsed for a simple invoice query |
| Policy bypass | The agent ignores declared usage restrictions | Caching content that should not be cached |
This threat model does not mean that AICP must solve every problem alone. It means the protocol should make security boundaries explicit and enforceable by the surrounding infrastructure.
A website needs to know not only that a request comes from software, but also what kind of software it is.
A useful agent identity model may include:
Example request metadata:
AICP-Agent: "ExampleAgent/1.0"
AICP-Client: "example-assistant-app"
AICP-Capability: "flights.search"
Authorization: Bearer <token>
These headers should not be trusted by themselves. They are signals. Real trust must come from authentication, signed tokens, verified clients, and server-side authorization checks.
Agent Interaction Contracts should build on existing delegated authorization mechanisms, especially OAuth-style flows.
The user should be able to grant limited authority to an agent:
flights:read
fares:watch
bookings:hold
without granting broader authority such as:
bookings:purchase
bookings:cancel
A key principle is:
The agent should receive the minimum authority needed for the task.
This is especially important because agentic workflows can be long and adaptive. An agent may begin with a simple search task and later discover that more authority is needed. In that case, it should request additional permission explicitly, not assume it.
Scopes should be connected to declared capabilities.
For example, a manifest may declare:
[[capabilities]]
id = "bookings.purchase"
type = "commit_action"
risk_level = "high"
auth = "required"
required_scopes = ["bookings:purchase"]
requires_user_confirmation = true
The agent runtime can then understand that:
bookings:purchase;
This connection makes authorization more understandable. It also helps user interfaces explain what is being requested.
Instead of saying:
This app wants booking access.
The system can say:
This agent wants permission to purchase bookings. This is a high-risk action and will require confirmation.
Risk levels should be part of the contract.
A simple model is:
| Risk level | Meaning | Example |
|---|---|---|
| low | Read-only or informational | Search flights |
| medium | Reversible or preparatory | Hold a fare |
| high | Financial, legal, or operational consequence | Buy a ticket |
| critical | Destructive, sensitive, or hard to reverse | Cancel an account |
Risk levels are not a replacement for authorization. They are an additional semantic layer that helps agents and users understand what kind of action is being considered.
A safe default is:
If risk is missing, treat the action as high risk.
This prevents incomplete contracts from becoming permission to act.
Human confirmation should be required for high-impact actions.
Examples include:
A contract can express this directly:
[[capabilities]]
id = "bookings.purchase"
type = "commit_action"
risk_level = "high"
requires_user_confirmation = true
idempotency_required = true
Confirmation should not be a generic “Are you sure?” dialog. It should summarize the action, the consequence, the cost, the recipient, and the authority being used.
For example:
{
"confirmation": {
"required": true,
"summary": "Purchase flight MAD-NRT for 682 EUR",
"consequence": "Your payment method will be charged.",
"expires_at": "2026-05-09T12:15:00Z"
}
}
The goal is not to block agents. The goal is to make delegation safe.
High-impact actions should support idempotency.
An agent may retry a request because of network failures, timeouts, or uncertainty. Without idempotency, this can create duplicated purchases, duplicated bookings, duplicated payments, or duplicated submissions.
A request may include:
Idempotency-Key: 9f8b2e6c-4c21-45c8-a8a1-21c884b90d81
The contract can declare:
idempotency_required = true
For high-risk and critical actions, idempotency should not be optional. It is part of making agentic execution reliable.
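On the server side, the core of idempotent handling can be sketched in a few lines. The class below is an in-memory illustration only, and its name is hypothetical. A production implementation would need persistent storage, concurrency control, key expiry, and a check that a reused key carries the same request payload.

```python
class IdempotentExecutor:
    """Runs each action at most once per idempotency key (in-memory sketch)."""

    def __init__(self) -> None:
        self._results: dict = {}

    def execute(self, idempotency_key: str, action):
        if not idempotency_key:
            raise ValueError("an Idempotency-Key is required for this action")
        if idempotency_key not in self._results:
            # First time this key is seen: perform the action and store the result.
            self._results[idempotency_key] = action()
        # Retries with the same key receive the stored result.
        return self._results[idempotency_key]


charges = []


def charge_card():
    charges.append(682)
    return {"status": "purchased", "charge_id": len(charges)}


executor = IdempotentExecutor()
key = "9f8b2e6c-4c21-45c8-a8a1-21c884b90d81"
first = executor.execute(key, charge_card)
retry = executor.execute(key, charge_card)  # e.g. retried after a timeout
print(len(charges), first == retry)  # 1 True
```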
Prompt injection is one of the most important risks for web agents.
A website, user-generated content, advertisement, review, or hidden page element may try to instruct the agent to ignore its previous instructions, reveal data, click something, or perform an unauthorized action.
Agent-facing responses must separate:
The principle is:
Data is not instruction. Hints are not authority. Policies are not enforcement.
The agent_hints field in AOM can be useful, but it must be treated as untrusted provider guidance. It should never override system instructions, user intent, security policy, or authorization boundaries.
Agent Interaction Contracts should help websites support legitimate agent traffic without enabling uncontrolled scraping.
A contract may declare:
[rate_limits]
anonymous = "20/hour"
authenticated = "1000/hour"
commercial = "requires_contract"
But declaration is not enough. Enforcement must happen server-side.
Anti-abuse mechanisms may include:
AICP should not pretend that a manifest can stop abuse. It cannot. But it can give websites a standard way to communicate expected usage, which server-side controls can then enforce.
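To illustrate the link between a declared limit and server-side enforcement, the sketch below parses a declaration such as "20/hour" and applies it with a fixed-window counter. The "count/period" syntax is inferred from the example above. The parser, the period names, and the limiter class are assumptions, and a real deployment would more likely use a shared store and a sliding-window or token-bucket algorithm.

```python
from typing import Optional, Tuple

_PERIOD_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def parse_rate_limit(text: str) -> Optional[Tuple[int, int]]:
    """Parse "20/hour" into (20, 3600); return None for non-numeric
    declarations such as "requires_contract"."""
    count, sep, period = text.partition("/")
    if not sep or not count.isdigit() or period not in _PERIOD_SECONDS:
        return None
    return int(count), _PERIOD_SECONDS[period]


class FixedWindowLimiter:
    """Allows at most `limit` requests per window of `window_seconds`."""

    def __init__(self, limit: int, window_seconds: int) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._window_start = None
        self._used = 0

    def allow(self, now: float) -> bool:
        # Start a new window on first use or once the current one has elapsed.
        if self._window_start is None or now - self._window_start >= self.window_seconds:
            self._window_start = now
            self._used = 0
        if self._used >= self.limit:
            return False
        self._used += 1
        return True


print(parse_rate_limit("20/hour"))            # (20, 3600)
print(parse_rate_limit("requires_contract"))  # None
```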
Agentic actions should be auditable.
For each meaningful interaction, the system should be able to record:
This is useful for debugging, compliance, user trust, and incident response.
Example audit record:
{
"timestamp": "2026-05-09T12:05:00Z",
"agent": "ExampleAgent/1.0",
"user": "user_123",
"capability": "bookings.hold",
"risk_level": "medium",
"scopes": ["bookings:hold"],
"confirmation_required": true,
"confirmation_received": true,
"idempotency_key": "9f8b2e6c-4c21-45c8-a8a1-21c884b90d81"
}
The more agents act on behalf of users, the more important this audit trail becomes.
Agents should not treat a contract as trustworthy only because it is syntactically valid.
At minimum, contracts should be retrieved over HTTPS and bound to the website origin. A contract for https://example.com should not be silently reused for another origin, mirror, or redirect target unless the relationship is explicit and trusted.
Websites may also support optional integrity metadata:
[integrity]
signed = true
signature_url = "https://example.com/.well-known/agent-interface.sig"
key_id = "example-travel-2026"
This is especially important for high-risk or regulated workflows. If an attacker can modify the manifest, they can modify the declared capabilities, policies, endpoints, or risk levels.
Contract integrity is therefore part of the trust model.
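The origin-binding rule can be checked before any contract content is parsed. The sketch below compares scheme, host, and port using only the standard library. The function names are hypothetical, and signature verification is deliberately left out because the proposal does not define a signature format.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"https": 443, "http": 80}


def origin_of(url):
    """Return the (scheme, host, port) triple that identifies a web origin."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, (parts.hostname or "").lower(), port)


def contract_is_bound_to(contract_url, declared_origin, site_url):
    """Accept a contract only if it was fetched over HTTPS from the same
    origin that it declares and that the agent intends to interact with."""
    fetched = origin_of(contract_url)
    return (
        fetched[0] == "https"
        and fetched == origin_of(declared_origin)
        and fetched == origin_of(site_url)
    )
```

If the contract was retrieved through redirects, `contract_url` should be the final URL, so that a redirect to another origin fails the check instead of being silently accepted.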
Agentic systems should fail safely.
If the contract is incomplete, malformed, expired, contradictory, or unsupported, the agent should not assume permission to act. It may fall back to read-only interaction, ask the user, or stop.
Safe defaults include:
| Missing or invalid field | Safe interpretation |
|---|---|
| Missing risk level | Treat as high risk |
| Missing auth requirement | Require authentication before action |
| Missing confirmation requirement | Require confirmation for non-read actions |
| Missing policy | Apply the most restrictive reasonable policy |
| Expired freshness | Revalidate before use |
| Unknown capability type | Do not execute automatically |
This is simple, but important. In an agentic system, ambiguity should not become authorization.
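The table above can be applied mechanically when a capability is loaded. The following sketch assumes a capability is a plain dictionary. The set of known types is collected from the examples in this paper, while the read-only subset and the `auto_executable` flag are assumptions made for illustration.

```python
# Capability types that appear in this paper's examples.
KNOWN_TYPES = {"query", "resource", "compare", "monitor",
               "prepare_action", "commit_action"}
# Assumption: these types only read data and do not change server state.
READ_ONLY_TYPES = {"query", "resource", "compare"}


def apply_safe_defaults(capability):
    """Fill in missing safety fields with the most restrictive interpretation."""
    resolved = dict(capability)
    resolved.setdefault("risk_level", "high")   # missing risk level -> high
    resolved.setdefault("auth", "required")     # missing auth -> require it
    is_read_only = resolved.get("type") in READ_ONLY_TYPES
    # Missing confirmation requirement -> confirm every non-read action.
    resolved.setdefault("requires_user_confirmation", not is_read_only)
    # Unknown capability types are never executed automatically.
    resolved["auto_executable"] = resolved.get("type") in KNOWN_TYPES
    return resolved
```

Declared values are never relaxed by this function. It only fills gaps, so an explicit `risk_level = "low"` is preserved while a missing one becomes `"high"`.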
Security should not be added after the protocol is designed. It should be part of the interface itself.
A website should not only expose what can be done. It should expose under which authority, with which risk, with which limits, and with which confirmation requirements.
This is the difference between an endpoint and an interaction contract.
The final principle is clear:
An agent-native web must be permissioned, auditable, and explicit by default.
Privacy is not only a legal concern. In agentic systems, privacy is part of the interface.
When an agent acts on behalf of a user, it may access personal accounts, invoices, travel records, payment flows, health portals, employment systems, insurance services, banking platforms, or public administration websites. In these cases, the contract should not only describe what the agent can do. It should also describe what personal data is required, why it is required, how long it may be retained, whether it may be shared, and which user rights apply.
This section does not claim that AICP can guarantee compliance with any specific regulation by itself. A technical contract is not a legal agreement. But AICP can expose privacy-relevant metadata that helps websites, agent runtimes, users, and auditors understand how personal and sensitive data is processed.
Security and privacy are related, but they are not the same.
Security asks whether an agent is allowed to perform an operation. Privacy asks whether the data processed by that operation is necessary, lawful, proportionate, retained correctly, and used for the declared purpose.
For this reason, privacy should not be hidden inside generic policies. It should be part of the contract.
A website should be able to declare:
AICP should distinguish between action risk and data sensitivity.
A read-only action can still be privacy-critical. Downloading an invoice, reading a medical record, or listing employee information may not change server state, but it can expose sensitive data.
A proposed initial sensitivity model is:
| Data sensitivity | Meaning | Example |
|---|---|---|
| public | Public information | Product catalog |
| personal | Identifiable personal data | Name, email, booking history |
| confidential | Sensitive business or account data | Invoices, contracts |
| special_category | Highly sensitive personal data | Health, biometrics, religion |
| regulated | Data under sectoral regulation | Banking, insurance, healthcare |

Figure 4. Action risk and data sensitivity are independent dimensions. A read-only operation may still be privacy-critical if it exposes sensitive or regulated data. Contracts should declare both dimensions so that agents can apply the right safeguards.
A capability can express this directly:
[[capabilities]]
id = "invoices.download"
type = "resource"
risk_level = "low"
data_sensitivity = "confidential"
auth = "required"
required_scopes = ["invoices:read"]
[capabilities.privacy]
personal_data_required = ["billing_name", "billing_address", "invoice_items"]
purpose = "download_invoice_for_user"
retention = "session_only"
This allows the agent runtime to apply stricter behavior when data is sensitive, even if the action itself is read-only.
Agent Interaction Contracts should support purpose limitation and data minimization.
The contract should declare why a capability needs data, and the agent should avoid sending or retrieving data that is not necessary for the task.
Example:
[[capabilities]]
id = "flights.search"
type = "query"
risk_level = "low"
data_sensitivity = "personal"
[capabilities.privacy]
purpose = "compare_available_flights"
data_minimization = true
required_fields = ["origin", "destination", "departure_window"]
optional_fields = ["loyalty_program"]
forbidden_fields = ["passport_number", "payment_details"]
This matters because the same website may expose several capabilities with different data requirements. Searching flights does not require a passport number. Purchasing a ticket may require one. The contract should make this difference explicit.
Token efficiency and privacy are connected here. When an agent does not need to parse full pages, it also avoids ingesting unnecessary personal data from navigation, account widgets, recommendations, cookies, sidebars, and unrelated page content.
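On the agent side, data minimization can be enforced before a request leaves the runtime. The sketch below filters an outgoing payload against the `required_fields`, `optional_fields`, and `forbidden_fields` lists shown in the example. The function name `minimize_payload` and the choice to silently drop undeclared fields are assumptions.

```python
def minimize_payload(payload, privacy):
    """Return a copy of payload that contains only fields the capability
    declares as required or optional, and never a forbidden field."""
    required = set(privacy.get("required_fields", []))
    optional = set(privacy.get("optional_fields", []))
    forbidden = set(privacy.get("forbidden_fields", []))

    missing = required - set(payload)
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")

    allowed = (required | optional) - forbidden
    return {key: value for key, value in payload.items() if key in allowed}
```

With the `flights.search` declaration above, a payload that accidentally includes `passport_number` is sent without it, while `loyalty_program` passes through only if the agent chose to include it.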
Agentic interaction relies on delegation. But delegation should be specific.
A user may allow an agent to search flights, monitor prices, or prepare a booking hold without allowing it to purchase a ticket or share passport details with third parties.
The contract should therefore connect:
Example:
[[capabilities]]
id = "bookings.purchase"
type = "commit_action"
risk_level = "high"
data_sensitivity = "personal"
auth = "required"
required_scopes = ["bookings:purchase"]
requires_user_confirmation = true
[capabilities.privacy]
purpose = "ticket_purchase"
personal_data_required = ["full_name", "email", "payment_token"]
requires_explicit_consent = true
third_party_sharing = ["airline_provider", "payment_processor"]
This allows an agent runtime to show a meaningful permission request:
This agent wants to purchase a booking. It will share your name, email, and payment token with the airline provider and payment processor.
This is more useful than a generic permission dialog, because it names the action, the data involved, and the recipients.
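A permission request of this kind could be generated directly from the capability's privacy metadata. The following sketch assumes the capability has been parsed into a dictionary with a nested `privacy` table. The function name and the exact wording of the message are illustrative only.

```python
def permission_request(capability):
    """Build a human-readable permission request from privacy metadata."""
    privacy = capability.get("privacy", {})
    fields = ", ".join(privacy.get("personal_data_required", [])) or "no personal data"
    recipients = ", ".join(privacy.get("third_party_sharing", [])) or "no third parties"
    purpose = privacy.get("purpose", "unspecified")
    return (
        f"This agent wants to use {capability['id']} for the purpose '{purpose}'. "
        f"Data shared: {fields}. Recipients: {recipients}."
    )
```

A real runtime would map identifiers such as `payment_token` to user-friendly labels before showing them.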
AICP promotes auditability, but audit logs can themselves contain personal data.
An audit log may include user identity, agent identity, delegated scopes, user intent, capability inputs, timestamps, and confirmation records. This is useful for accountability, but it must not become unlimited surveillance.
For this reason, contracts should be able to declare audit retention and deletion behavior.
Example:
[audit]
enabled = true
contains_personal_data = true
retention = "90_days"
user_accessible = true
deletion_policy = "delete_or_anonymize_after_retention"
A good principle is:
Auditability should be strong, but not infinite.
Agentic systems need traceability, but they also need retention limits, access controls, and deletion or anonymization policies.
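A retention limit is only useful if something checks it. The sketch below decides whether an audit record has outlived a declared retention period, assuming retention values follow the `"<N>_days"` form used in the example and timestamps are ISO 8601 strings in UTC. Other retention values, such as `"session_only"`, are not handled and raise an error.

```python
from datetime import datetime, timedelta, timezone


def retention_expired(record_timestamp, retention, now):
    """Return True if the record is older than the declared retention period."""
    amount, _, unit = retention.partition("_")
    if unit != "days" or not amount.isdigit():
        raise ValueError(f"unsupported retention value: {retention!r}")
    # Replace a trailing "Z" so older Python versions can parse the timestamp.
    recorded = datetime.fromisoformat(record_timestamp.replace("Z", "+00:00"))
    return now - recorded > timedelta(days=int(amount))
```

A scheduled job could call this for each record and then delete or anonymize expired entries, as the `deletion_policy` field describes.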
Some domains require stronger controls.
Examples include:
In these cases, a contract should be able to mark capabilities as involving special-category or regulated data.
Example:
[[capabilities]]
id = "health.appointments.schedule"
type = "commit_action"
risk_level = "high"
data_sensitivity = "special_category"
auth = "required"
required_scopes = ["appointments:write"]
requires_user_confirmation = true
requires_strong_authentication = true
[capabilities.privacy]
purpose = "schedule_medical_appointment"
requires_explicit_consent = true
data_minimization = true
retention = "provider_policy"
This does not make the system compliant by itself, but it gives agents and platforms a signal that stricter controls are required.
Some agentic workflows may cross from assistance into automated decision-making.
For example, an agent may compare loans, rank insurance offers, recommend job candidates, filter rental applications, or select medical providers. In some domains, this may have significant effects on the user.
AICP should allow capabilities to declare whether they involve profiling or automated decision-making.
Example:
[capabilities.decisioning]
automated_decision = false
profiling = false
significant_effect = false
human_review_available = true
For a higher-risk domain:
[capabilities.decisioning]
automated_decision = true
profiling = true
significant_effect = true
human_review_required = true
appeal_or_review_endpoint = "https://example-bank.com/decision-review"
The important idea is that agents and users should know when a workflow is only advisory and when it may produce a significant decision.
AICP cannot determine legal roles by itself. But it can expose metadata that helps identify which parties are involved in a workflow.
An agentic interaction may involve:
A contract may include descriptive role metadata:
[privacy.roles]
service_provider_role = "controller"
agent_provider_role = "processor"
payment_provider_role = "processor"
third_party_sharing = true
These fields are descriptive. They are not a substitute for legal agreements. But they help make the data-processing chain visible.
Agent workflows may route data across services and jurisdictions.
A contract should be able to declare whether third-party sharing or cross-border transfer may occur.
Example:
[data_processing.sharing]
third_parties = ["airline_provider", "payment_processor"]
cross_border_transfer = true
transfer_mechanism = "standard_contractual_clauses"
This is especially important when agents combine services. The user may think they are interacting with one assistant, but the workflow may involve several backend systems.
AICP should make this more visible.
Fallback behavior should depend on data sensitivity.
If no Agent Interaction Contract is available, browser automation may be acceptable for public content. It is more problematic for personal, sensitive, or regulated data.
A reasonable fallback policy is:
| Situation | Recommended behavior |
|---|---|
| Public content | Browser fallback allowed |
| Personal account data | Require authentication and user confirmation |
| Sensitive data | Require explicit user approval before fallback |
| High-impact action | No browser fallback without explicit confirmation |
| Unknown privacy policy | Apply restrictive mode |
This is important because the absence of a contract should not become permission to process everything visible on a page.
For sensitive workflows, an agent should prefer explicit contracts, explicit scopes, and explicit user approval.
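The fallback table can be expressed as a small decision function. This is a sketch under stated assumptions: the returned labels are hypothetical, and treating `confidential`, `special_category`, and `regulated` as "sensitive data" is an interpretation of the table, not something the proposal fixes.

```python
SENSITIVE_LEVELS = {"confidential", "special_category", "regulated"}


def browser_fallback_policy(data_sensitivity=None, high_impact=False):
    """Decide how browser automation may be used when no contract exists."""
    if high_impact:
        return "explicit_confirmation_required"
    if data_sensitivity == "public":
        return "allowed"
    if data_sensitivity == "personal":
        return "authentication_and_confirmation_required"
    if data_sensitivity in SENSITIVE_LEVELS:
        return "explicit_user_approval_required"
    # Unknown or undeclared sensitivity falls through to the restrictive mode.
    return "restrictive_mode"
```

The ordering matters: high-impact actions are checked first, so a purchase on a site with otherwise public content still requires explicit confirmation.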
The main principle is:
An agent-native web should expose not only capabilities and risks, but also data-processing expectations.
This makes privacy operational. It turns privacy from a long policy document into metadata that agents, runtimes, and users can inspect before acting.
AICP cannot replace legal compliance. But it can make compliance easier to implement, audit, and explain.
Agent Interaction Contracts can be implemented without rebuilding the web. The proposal is intentionally designed to fit into existing HTTP servers, backend frameworks, API gateways, authentication systems, and agent runtimes.
The architecture has two sides:
Between them, HTTP remains the substrate.
Figure 1. High-level architecture of the agent-native web layer. The agent discovers an Agent Interaction Contract, invokes declared capabilities, and receives structured Agent Object Model responses. The website remains in control of authentication, policies, risk evaluation, privacy metadata, auditability, and external service integrations.
On the website side, an AICP implementation may include several components.
| Component | Purpose |
|---|---|
| Contract endpoint | Exposes the Agent Interaction Contract |
| Capability registry | Stores declared capabilities |
| Schema registry | Defines input and output schemas |
| Policy engine | Applies usage, caching, and attribution policies |
| Authorization layer | Checks scopes and delegated permissions |
| Rate limiter | Enforces quotas and anti-abuse rules |
| Action safety gateway | Handles risk, confirmation, and idempotency |
| Response formatter | Produces Agent Object Model responses |
| Audit logger | Records agentic interactions |
These components do not need to be new systems. In many cases, they already exist in some form. The AICP layer mainly connects them into a machine-readable contract.
The simplest implementation exposes a static or generated TOML file:
GET /.well-known/agent-interface.toml
For small sites, this file may be manually maintained.
For larger applications, it should be generated from backend metadata.
Example:
aicp_version = "0.1"
[site]
name = "Example Travel"
origin = "https://example-travel.com"
[[capabilities]]
id = "flights.search"
type = "query"
method = "POST"
endpoint = "/agent/flights/search"
risk_level = "low"
auth = "optional"
This is already enough for a first version. It gives agents a predictable entry point and a structured view of what the site supports.
The capability registry maps business-level capabilities to HTTP operations.
For example:
flights.search -> POST /agent/flights/search
fares.watch -> POST /agent/fares/watch
bookings.hold -> POST /agent/bookings/hold
bookings.purchase -> POST /agent/bookings/purchase
This mapping matters because agents should reason in terms of user goals, not only technical routes.
A route such as:
/api/v3/booking/create
may be meaningful to a developer, but:
bookings.hold
is more meaningful to an agent.
Developer experience is critical. If publishing an Agent Interaction Contract requires too much manual work, adoption will be slow.
A backend framework should allow developers to annotate routes as capabilities.
Example in Python:
@app.post("/agent/flights/search")
@agent_capability(
    id="flights.search",
    type="query",
    risk_level="low",
    auth="optional",
)
def search_flights(request: FlightSearchRequest) -> FlightSearchResponse:
    ...
Example for a high-risk action:
@app.post("/agent/bookings/purchase")
@agent_capability(
    id="bookings.purchase",
    type="commit_action",
    risk_level="high",
    auth="required",
    required_scopes=["bookings:purchase"],
    requires_user_confirmation=True,
    idempotency_required=True,
)
def purchase_booking(request: PurchaseRequest) -> PurchaseResponse:
    ...
From these annotations, the framework can generate the contract automatically.
This is important because it reduces the standard to something developers can actually use.
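To make the generation step concrete, here is a minimal, framework-independent sketch of how an `agent_capability` decorator might collect metadata and render it as a contract. The registry, the `_toml_value` helper, and `render_contract` are hypothetical, the web framework's route decorator is omitted, and the serializer covers only the value types used in this paper's examples.

```python
CAPABILITY_REGISTRY = []


def agent_capability(**metadata):
    """Record capability metadata and leave the handler itself unchanged."""
    def decorator(handler):
        CAPABILITY_REGISTRY.append(dict(metadata))
        return handler
    return decorator


def _toml_value(value):
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, list):
        return "[" + ", ".join(_toml_value(item) for item in value) + "]"
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def render_contract(version="0.1"):
    """Render all registered capabilities as contract text."""
    lines = [f'aicp_version = "{version}"']
    for capability in CAPABILITY_REGISTRY:
        lines += ["", "[[capabilities]]"]
        lines += [f"{key} = {_toml_value(val)}" for key, val in capability.items()]
    return "\n".join(lines) + "\n"


@agent_capability(id="flights.search", type="query", risk_level="low", auth="optional")
def search_flights(request):
    ...


@agent_capability(id="bookings.purchase", type="commit_action", risk_level="high",
                  auth="required", required_scopes=["bookings:purchase"],
                  requires_user_confirmation=True, idempotency_required=True)
def purchase_booking(request):
    ...
```

Calling `render_contract()` after the routes are imported yields text that could be served at the contract endpoint, so the contract cannot drift from the annotated code.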
AICP should not duplicate everything OpenAPI already does well.
OpenAPI can continue to describe detailed request and response schemas. AICP can reference those schemas.
Example:
[[capabilities]]
id = "products.compare"
type = "compare"
method = "POST"
endpoint = "/agent/products/compare"
input_schema = "https://example.com/openapi.json#/components/schemas/ProductCompareRequest"
output_schema = "https://example.com/openapi.json#/components/schemas/ProductCompareResponse"
risk_level = "low"
auth = "optional"
In this model:
This is not competition. It is composition.
The policy engine determines what an agent is allowed to do and under which conditions.
Policies may depend on:
For example, anonymous agents may be allowed to search products, but not monitor prices at scale.
[policies]
anonymous_access = true
commercial_use = "requires_auth"
automated_monitoring = "requires_auth"
training_use = "disallowed"
The contract declares the policy. The backend enforces it.
The action safety gateway is responsible for high-impact operations.
It checks:
For example, before executing a purchase, the gateway may require:
scope: bookings:purchase
risk_level: high
confirmation: true
idempotency_key: present
fare_revalidated: true
This protects both the user and the website.
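The purchase preconditions listed above can be sketched as a single server-side check. The dictionary shapes, the request field names, and the function name `gateway_violations` are assumptions made for illustration. A real gateway would also validate the token and the hold itself.

```python
def gateway_violations(capability, request):
    """Return the list of unmet preconditions; an empty list means proceed."""
    violations = []

    missing = set(capability.get("required_scopes", [])) - set(request.get("scopes", []))
    if missing:
        violations.append(f"missing scopes: {sorted(missing)}")

    if capability.get("requires_user_confirmation") and not request.get("confirmation"):
        violations.append("user confirmation missing")

    if capability.get("idempotency_required") and not request.get("idempotency_key"):
        violations.append("idempotency key missing")

    # Assumption: high-risk actions must be revalidated right before commit.
    if capability.get("risk_level") == "high" and not request.get("fare_revalidated"):
        violations.append("fare not revalidated")

    return violations
```

Returning every violation at once, rather than failing on the first, lets the agent runtime fix them all in one round trip.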
On the agent side, an AICP-aware runtime may include:
| Component | Purpose |
|---|---|
| Discovery client | Finds the contract |
| Contract parser | Reads TOML or JSON contract formats |
| Capability planner | Maps user intent to capabilities |
| Authorization broker | Handles delegated auth and scopes |
| Policy interpreter | Applies usage and safety policies |
| Risk evaluator | Determines when confirmation is needed |
| Action executor | Invokes capabilities |
| Provenance tracker | Records sources and freshness |
| Browser fallback | Uses browser automation when needed |
The agent runtime does not need to trust every contract blindly. It should validate the contract, apply user preferences, check policies, and avoid unsafe actions.
A possible implementation pattern is an agent browser.
An agent browser is not necessarily a visual browser. It is a user agent for AI systems. It manages:
In this model, the user does not give raw credentials to every agent. Instead, the agent browser becomes a controlled environment where permissions can be granted, revoked, inspected, and audited.
This may become important because users will not want every agent to manage credentials independently.
A typical request flow could be:
The agent retrieves /.well-known/agent-interface.toml.
This flow turns web interaction from visual guessing into structured negotiation.
AICP can be deployed in several ways.
A website publishes a manually maintained TOML file.
This is simple and good for documentation-heavy sites.
A backend framework generates the manifest from annotated routes and schemas.
This is better for dynamic applications.
An API gateway exposes the contract based on existing route definitions, authentication rules, and rate limits.
This is useful for enterprises.
A CDN or edge worker serves the contract and handles lightweight negotiation.
This is useful for adoption without changing the whole backend.
A public contract is static, while authenticated capabilities are generated dynamically.
This is probably the most realistic model for many services.
A reasonable adoption path is:
This gradual path matters. Standards succeed when they can start small.
The reference architecture should be simple in its first version.
AICP should not require a new browser, a new server protocol, a new authentication system, or a new cloud platform. It should begin as a predictable contract file, a set of conventions, and a response model.
The architecture principle is:
Use the web that already exists, but make its interaction surface explicit for agents.
A useful way to understand Agent Interaction Contracts is to follow a concrete task.
Consider a common user request:
Find the cheapest flight from Madrid to Tokyo in July, with at most one stop, checked baggage included, and notify me if the price falls below €700.
This is a typical “agent for everything” task. It requires search, filtering, comparison, monitoring, and possibly preparation for purchase. It is simple to describe as a human request, but difficult to execute reliably with current web interfaces.
Figure 5. Browser-based and AICP-based flight search workflows. In the browser-based path, the agent must infer intent from pages, forms, buttons, and dynamic UI state. In the AICP-based path, the agent discovers a contract, invokes declared capabilities, receives structured responses, and applies explicit risk, policy, and confirmation rules.
With today’s web, an agent may need to:
enter Madrid as the origin and Tokyo as the destination in a search form, among other manual steps.
This can work, but it is not an ideal machine interface. The agent spends a large amount of effort understanding presentation instead of interacting with declared capabilities.
With Agent Interaction Contracts, the agent starts differently.
It first retrieves the contract:
GET /.well-known/agent-interface.toml
The website returns:
aicp_version = "0.1"
[site]
name = "Example Travel"
origin = "https://example-travel.com"
[[capabilities]]
id = "flights.search"
type = "query"
description = "Search available flights by origin, destination, dates, passengers, and constraints."
method = "POST"
endpoint = "/agent/flights/search"
risk_level = "low"
auth = "optional"
input_schema = "#/schemas/FlightSearchRequest"
output_schema = "#/schemas/FlightSearchResponse"
cache_ttl_seconds = 60
[[capabilities]]
id = "fares.watch"
type = "monitor"
description = "Create a price watch for a flight search or fare."
method = "POST"
endpoint = "/agent/fares/watch"
risk_level = "low"
auth = "required"
required_scopes = ["fares:watch"]
requires_user_confirmation = false
[[capabilities]]
id = "bookings.hold"
type = "prepare_action"
description = "Hold a fare temporarily before purchase."
method = "POST"
endpoint = "/agent/bookings/hold"
risk_level = "medium"
auth = "required"
required_scopes = ["bookings:hold"]
requires_user_confirmation = true
[[capabilities]]
id = "bookings.purchase"
type = "commit_action"
description = "Purchase a held booking."
method = "POST"
endpoint = "/agent/bookings/purchase"
risk_level = "high"
auth = "required"
required_scopes = ["bookings:purchase"]
requires_user_confirmation = true
idempotency_required = true
Now the agent does not need to infer that flight search exists. The capability is declared.
The agent invokes the search capability:
POST /agent/flights/search
Content-Type: application/json
Accept: application/aom+json
Request:
{
"origin": "MAD",
"destination": "TYO",
"departure_window": {
"start": "2026-07-01",
"end": "2026-07-15"
},
"trip_duration_days": {
"min": 10,
"max": 14
},
"passengers": 1,
"constraints": {
"max_stops": 1,
"checked_baggage": true,
"max_price": {
"amount": 700,
"currency": "EUR"
}
}
}
This request is concise. It contains the user’s intent in structured form.
The website returns an Agent Object Model response:
{
"data": {
"results": [
{
"id": "fare_123",
"origin": "MAD",
"destination": "NRT",
"departure_time": "2026-07-04T10:20:00+02:00",
"arrival_time": "2026-07-05T08:30:00+09:00",
"airline": "Example Air",
"stops": 1,
"duration_minutes": 910,
"checked_baggage_included": true,
"price": {
"amount": 682,
"currency": "EUR"
}
}
]
},
"actions": [
{
"id": "fares.watch",
"label": "Watch this fare",
"method": "POST",
"endpoint": "/agent/fares/watch",
"risk_level": "low",
"requires_user_confirmation": false,
"input": {
"fare_id": "fare_123",
"threshold": {
"amount": 700,
"currency": "EUR"
}
}
},
{
"id": "bookings.hold",
"label": "Hold this fare",
"method": "POST",
"endpoint": "/agent/bookings/hold",
"risk_level": "medium",
"requires_user_confirmation": true,
"input": {
"fare_id": "fare_123"
}
}
],
"policies": {
"citation_required": true,
"cache": {
"allowed": true,
"max_ttl_seconds": 300
}
},
"provenance": {
"source": "Example Travel",
"origin": "https://example-travel.com",
"canonical_url": "https://example-travel.com/flights/result/fare_123",
"retrieved_at": "2026-05-09T12:00:00Z"
},
"freshness": {
"valid_until": "2026-05-09T12:15:00Z",
"volatility": "high",
"revalidation_required_before_commit": true
},
"warnings": [
{
"code": "price_may_change",
"severity": "medium",
"message": "The displayed fare is volatile and may change before purchase."
}
]
}
The response gives the agent not only a price, but also the safe next actions.
The agent can monitor the fare without confirmation, but it cannot hold or purchase without respecting the declared risk and confirmation requirements.
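An agent runtime can derive this distinction directly from the `actions` array. The sketch below splits the declared actions into those that may run automatically and those that must be gated. The function name is hypothetical, and treating a missing `requires_user_confirmation` field as `true` is a conservative assumption.

```python
def partition_actions(response):
    """Split declared actions into (automatic, gated) lists of action ids."""
    automatic, gated = [], []
    for action in response.get("actions", []):
        # A missing confirmation flag is treated as "confirmation required".
        needs_confirmation = action.get("requires_user_confirmation", True)
        is_low_risk = action.get("risk_level") == "low"
        if is_low_risk and not needs_confirmation:
            automatic.append(action["id"])
        else:
            gated.append(action["id"])
    return automatic, gated
```

Applied to the response above, `fares.watch` lands in the automatic list and `bookings.hold` in the gated list.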
The user asked to be notified if the price falls below €700. Since the result is already below €700, the agent may notify the user immediately. But it may also create a watch if the user wants continuous monitoring.
The agent invokes:
POST /agent/fares/watch
Authorization: Bearer <token>
Content-Type: application/json
Request:
{
"fare_id": "fare_123",
"threshold": {
"amount": 700,
"currency": "EUR"
},
"notification_channel": "agent"
}
Response:
{
"data": {
"watch_id": "watch_789",
"status": "active",
"threshold": {
"amount": 700,
"currency": "EUR"
}
},
"policies": {
"monitoring_frequency": "provider_controlled",
"commercial_use": "requires_auth"
},
"provenance": {
"source": "Example Travel",
"retrieved_at": "2026-05-09T12:03:00Z"
}
}
This is much cleaner than asking an agent to periodically open a website, search again, and parse visual results.
If the user wants to reserve the fare, the agent may prepare a hold.
Because bookings.hold is a medium-risk action, the runtime should ask for confirmation:
Do you want me to hold this fare for 682 EUR? This does not complete the purchase, but it may reserve the fare temporarily.
If the user confirms, the agent invokes:
POST /agent/bookings/hold
Authorization: Bearer <token>
Idempotency-Key: 7e9c8a1e-7f6b-4e58-87f8-78ec1d9dd20a
Content-Type: application/json
Request:
{
"fare_id": "fare_123",
"passenger_count": 1
}
The response may include:
{
"data": {
"hold_id": "hold_456",
"status": "held",
"expires_at": "2026-05-09T12:30:00Z",
"price": {
"amount": 682,
"currency": "EUR"
}
},
"actions": [
{
"id": "bookings.purchase",
"label": "Purchase this booking",
"method": "POST",
"endpoint": "/agent/bookings/purchase",
"risk_level": "high",
"requires_user_confirmation": true,
"input": {
"hold_id": "hold_456"
}
}
],
"freshness": {
"valid_until": "2026-05-09T12:30:00Z",
"revalidation_required_before_commit": true
}
}
The agent now has a safe path to continue, but purchase remains gated.
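Before committing, the runtime can consult the `freshness` block to decide whether the hold must be revalidated. This is a minimal sketch, assuming timestamps are ISO 8601 strings in UTC. The function names are hypothetical, and a missing flag or expiry time is interpreted restrictively.

```python
from datetime import datetime, timezone


def parse_utc(timestamp):
    # Replace a trailing "Z" so older Python versions can parse the value.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def must_revalidate(freshness, now):
    """Return True if the agent must revalidate before a commit action."""
    if freshness.get("revalidation_required_before_commit", True):
        return True
    valid_until = freshness.get("valid_until")
    # Without an expiry time, or once it has passed, the data is stale.
    return valid_until is None or now >= parse_utc(valid_until)
```

For the hold response above, the function returns `True` regardless of the clock, because the provider explicitly requires revalidation before commit.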
Purchasing the ticket is a high-risk commit action. It should require explicit confirmation.
A good confirmation prompt would include:
Only after confirmation should the agent call:
POST /agent/bookings/purchase
Authorization: Bearer <token>
Idempotency-Key: 3fa85f64-5717-4562-b3fc-2c963f66afa6
Content-Type: application/json
This is where the difference between browsing and agent-native interaction becomes important. The agent is not just clicking a “Pay now” button. It is executing a declared high-risk capability under an explicit authorization and confirmation model.
This example illustrates the main value of Agent Interaction Contracts:
| Browser-based agent | AICP-based agent |
|---|---|
| Infers search form from UI | Discovers flights.search capability |
| Parses visual result cards | Receives structured results |
| Guesses next possible actions | Receives declared actions |
| May confuse navigation and commitment | Uses risk levels |
| May click high-impact buttons accidentally | Requires confirmation |
| Repeats browsing for monitoring | Uses fares.watch |
| Consumes many tokens | Consumes structured context |
| Depends on layout stability | Depends on declared contracts |
The point is not that browser automation disappears. It remains useful as fallback. But for supported workflows, the agent should not need to behave like a human in a browser.
The same pattern applies beyond flights.
For ecommerce:
products.search
products.compare
cart.prepare
orders.purchase
orders.cancel
For SaaS administration:
users.list
users.invite
users.disable
billing.invoices.download
subscription.cancel
For healthcare portals:
appointments.search
appointments.schedule
appointments.cancel
documents.download
messages.send
For public services:
forms.find
forms.prepare
applications.submit
status.check
In every case, the key is the same: the website declares capabilities, risks, permissions, and policies explicitly.
The “agent for everything” becomes more realistic when the web stops being only a collection of pages and starts exposing interaction contracts.
A proposal for an agent-native web layer should not remain only conceptual. It should be evaluated. The main claim is that Agent Interaction Contracts can reduce token cost, improve reliability, reduce interaction steps, and make high-impact actions safer.
This section proposes an evaluation methodology to test that claim.
The evaluation should answer five main questions.
RQ1. Token efficiency
Do Agent Interaction Contracts reduce token consumption compared with browser-based agents?
RQ2. Task success
Do agents complete more tasks successfully when using declared capabilities instead of visual inference?
RQ3. Interaction efficiency
Do contracts reduce the number of steps, tool calls, retries, and observations needed to complete a task?
RQ4. Safety
Do contracts reduce unsafe or unintended actions, especially in workflows involving purchases, cancellations, or sensitive operations?
RQ5. Implementation cost
Can existing websites expose useful contracts with limited backend changes?
These questions are important because the proposal must be evaluated from both sides: the agent side and the website side.
A fair evaluation should compare several approaches.
| Approach | Description |
|---|---|
| Visual browser agent | Agent uses screenshots or GUI interaction |
| DOM/HTML agent | Agent reads and manipulates DOM or HTML |
| Scraping agent | Agent extracts data from page structure |
| OpenAPI-only agent | Agent uses an API specification when available |
| MCP-based integration | Agent uses a custom tool server |
| AICP-based agent | Agent uses Agent Interaction Contracts and AOM responses |
The goal is not to prove that one approach is always better. The goal is to understand where agent-native contracts provide advantages.
Browser agents may be more universal. API agents may be more direct. MCP integrations may be more powerful in controlled environments. AICP should be evaluated as a lightweight website-level interface.
The evaluation should include several task domains.
| Domain | Example task |
|---|---|
| Travel | Find a cheap flight and monitor price changes |
| Ecommerce | Compare products and prepare a purchase |
| SaaS admin | Invite a user or download an invoice |
| Customer support | Find policy information and open a ticket |
| Documentation | Retrieve the correct setup instructions |
| Public services | Find and prepare a form submission |
| Subscription management | Compare plans or cancel a service |
These domains are useful because they combine different types of interaction: search, comparison, monitoring, preparation, commitment, and cancellation.
The evaluation should measure both efficiency and safety.
| Metric | Purpose |
|---|---|
| Tokens per task | Measures context and reasoning cost |
| Number of interaction steps | Measures workflow complexity |
| Number of observations | Measures how often the agent needs to inspect state |
| Number of retries | Measures fragility |
| Latency | Measures user experience |
| Task success rate | Measures effectiveness |
| Error rate | Measures reliability |
| Unsafe action rate | Measures safety |
| Confirmation correctness | Measures whether high-risk actions are gated properly |
| Backend implementation effort | Measures adoption cost |
| Contract size | Measures manifest overhead |
| Cacheability | Measures scalability |
| Personal data exposure | Measures how much personal data enters the agent context |
| Unnecessary field access | Measures whether the agent received fields not needed for the task |
| Consent correctness | Measures whether required consent was obtained |
| Retention compliance | Measures whether outputs respect declared retention |
| Sensitive-data fallback rate | Measures how often agents fall back unsafely when sensitive data is involved |
Token cost is especially important. If the industry moves toward more constrained token budgets, the ability to reduce unnecessary context becomes a direct advantage.
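Several of these metrics can be computed from simple per-run records. The sketch below assumes each run is logged as a dictionary with `success`, `tokens`, `steps`, and `unsafe_actions` fields. This record format and the function name are hypothetical and not part of the proposal.

```python
from statistics import mean


def summarize_runs(runs):
    """Aggregate per-run records into a few of the metrics listed above."""
    if not runs:
        raise ValueError("at least one run is required")
    total = len(runs)
    return {
        "task_success_rate": sum(1 for r in runs if r["success"]) / total,
        # A run counts as unsafe if it contains at least one unsafe action.
        "unsafe_action_rate": sum(1 for r in runs if r["unsafe_actions"] > 0) / total,
        "mean_tokens_per_task": mean(r["tokens"] for r in runs),
        "mean_steps_per_task": mean(r["steps"] for r in runs),
    }
```

Running the same summary separately for each interaction mode gives directly comparable rows for the approaches under evaluation.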
A controlled experiment can be built with paired environments.
For each domain, create two versions of the same website:
The underlying data and business logic should be the same. Only the interface differs.
For example, a travel website can expose:
a conventional human-facing interface, and an AICP interface declaring flights.search, fares.watch, bookings.hold, and bookings.purchase.
Then agents are asked to complete the same tasks through different interaction modes.
A travel benchmark may include tasks such as:
An ecommerce benchmark may include:
A SaaS benchmark may include:
Safety should be evaluated directly, not only through success rate.
Example safety tests:
| Scenario | Expected behavior |
|---|---|
| Page contains malicious instruction | Agent ignores it |
| Purchase action is available | Agent asks for confirmation |
| Price changes before purchase | Agent revalidates before commit |
| Required scope is missing | Agent requests authorization |
| Contract omits risk level | Agent treats action as high risk |
| Duplicate request occurs | Idempotency prevents duplicate action |
| Conflicting policies appear | Agent applies restrictive interpretation |
| Sensitive data appears without declared purpose | Agent asks the user or applies restrictive mode |
| Capability requests unnecessary personal fields | Agent avoids sending them or asks for clarification |
| Unknown retention policy for personal data | Agent avoids caching and limits downstream use |
These tests are important because agentic systems fail differently from traditional applications. A task may be completed, but completed unsafely.
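Several rows of the table describe decisions that an agent runtime can make mechanically from the contract. The sketch below assumes capability entries shaped like the manifest fields used in this paper (`risk_level`, `required_scopes`, `requires_user_confirmation`); the `gate` function and its return values are hypothetical.

```python
HIGH_RISK = {"high", "critical"}


def gate(capability: dict, granted_scopes: set[str], user_confirmed: bool = False) -> str:
    """Return "allow", "request_authorization", or "ask_confirmation"."""
    # A contract that omits the risk level is treated as high risk.
    risk = capability.get("risk_level", "high")

    # Missing scopes: request authorization instead of attempting the call.
    required = set(capability.get("required_scopes", []))
    if not required <= granted_scopes:
        return "request_authorization"

    # Restrictive interpretation: confirm when the contract asks for it,
    # and also whenever the action is high risk.
    needs_confirmation = (
        capability.get("requires_user_confirmation", False) or risk in HIGH_RISK
    )
    if needs_confirmation and not user_confirmed:
        return "ask_confirmation"
    return "allow"
```

A benchmark harness could run each safety scenario through such a gate and compare the decision with the expected behavior column.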
Token measurement should include:
The comparison should not only count the final answer. It should count the full interaction.
Expected pattern:
| Approach | Token usage pattern |
|---|---|
| Visual browser agent | High observations and reasoning |
| DOM/HTML agent | High markup and filtering |
| Scraping agent | Medium but brittle |
| OpenAPI-only agent | Lower, if API exists |
| AICP-based agent | Lower structured context |
| MCP-based integration | Low to medium, but higher integration cost |
The hypothesis is that AICP reduces the amount of irrelevant context the agent must process.
Adoption depends on developer effort.
For each website implementation, measure:
This matters because a technically superior standard may fail if implementation is too heavy.
The ideal result is that a basic contract can be generated automatically, and developers only need to annotate risk, policy, and confirmation requirements.
Not everything can be measured with numbers alone.
The evaluation should also collect qualitative observations:
This is important because the proposal is also about trust and interface clarity.
The expected result is not that AICP wins in every case.
The expected result is more precise:
In other words:
AICP should make the best path better, not eliminate every fallback.
The evaluation should be practical. The goal is not to prove an abstract protocol in isolation, but to test whether explicit interaction contracts improve real agentic workflows.
The main question is simple:
If the website declares its capabilities explicitly, does the agent become cheaper, safer, and more reliable?
If the answer is yes, the case for an agent-native web layer becomes much stronger.
Agent Interaction Contracts are not proposed as a replacement for the existing web. They are proposed as a missing layer. For this reason, it is important to clarify what the proposal does and does not claim.
The goal is not to eliminate browsers, APIs, OpenAPI, OAuth, MCP, or human-facing pages. The goal is to make websites more explicit for agents when agentic interaction is useful.
It is also useful to say what AICP is not trying to do.
AICP does not replace HTTP. It does not replace OpenAPI. It does not define a new authentication protocol. It does not guarantee legal compliance by itself. It does not eliminate browser automation. It does not guarantee that website data is truthful. And it does not determine legal controller or processor roles by itself.
The proposal is narrower and more practical: define a lightweight, website-level interaction contract for agents, built on top of the web that already exists.
APIs are often the best interface for software. They are structured, efficient, and more stable than visual pages. For many agentic workflows, using an API is clearly better than using browser automation.
But “just use APIs” is not enough as a web-scale answer.
Many APIs are:
Also, an API endpoint does not always communicate the meaning of an operation inside a user workflow. An endpoint may technically create a booking, but the agent needs to know whether this is a temporary hold, a purchase, a cancellation, or another high-impact action.
AICP does not compete with APIs. It gives APIs an agent-facing semantic layer.
OpenAPI is very useful for describing HTTP APIs. It can define endpoints, parameters, request bodies, responses, authentication schemes, and schemas.
But an agent interaction contract needs additional information:
OpenAPI describes how to call an API. AICP describes how an agent should interact with a website capability.
These two layers can work together. AICP can reference OpenAPI schemas instead of duplicating them.
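As an illustration of referencing rather than duplicating, a contract field such as `input_schema = "#/schemas/FlightSearchRequest"` can be resolved against a schema document at load time. The sketch below assumes that references use JSON Pointer fragments; that reading, and the `resolve_ref` name, are assumptions rather than something this paper specifies. Only object keys are handled, not array indices.

```python
from urllib.parse import unquote


def resolve_ref(document: dict, ref: str):
    """Resolve a same-document reference such as "#/schemas/FlightSearchRequest"."""
    if not ref.startswith("#/"):
        raise ValueError(f"only same-document references are handled here: {ref!r}")
    node = document
    for raw in ref[2:].split("/"):
        # JSON Pointer escaping: "~1" stands for "/", "~0" stands for "~".
        key = unquote(raw).replace("~1", "/").replace("~0", "~")
        node = node[key]
    return node


# Hypothetical schema document, as it might be loaded from an OpenAPI file.
schemas = {
    "schemas": {
        "FlightSearchRequest": {"type": "object", "required": ["origin", "destination"]}
    }
}
request_schema = resolve_ref(schemas, "#/schemas/FlightSearchRequest")
```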
MCP is valuable because it standardizes how models connect with tools and external systems. It is especially useful in controlled environments, enterprise workflows, local tools, development environments, databases, and specialized integrations.
But MCP is not necessarily the right universal interface for every public website.
Requiring each website to build and operate a custom MCP server may be too heavy. Many websites already have HTTP routes, schemas, authentication systems, and APIs. For them, publishing a lightweight agent contract may be more natural.
A simple distinction is useful:
| MCP | AICP |
|---|---|
| Tool-centric | Website-centric |
| Good for controlled integrations | Good for public web surfaces |
| Requires a tool server | Can be exposed over normal HTTP |
| Powerful and flexible | Lightweight and discoverable |
| Runtime tool protocol | Website interaction contract |
AICP can also complement MCP. An MCP server could consume AICP contracts. Or a website could expose both: AICP for public agent discovery, MCP for deeper integrations.
llms.txt is important because it recognizes that language models need cleaner access to website information. It is simple, readable, and useful for documentation-heavy sites.
But llms.txt is mostly content-oriented.
It does not define:
AICP is focused on interaction, not only content consumption.
In this sense:
llms.txt helps agents read. AICP helps agents act safely.
Browser agents are necessary. They allow agents to use websites that do not expose APIs, contracts, or structured interfaces. They are a powerful fallback.
But fallback should not become the main architecture.
If an agent needs to buy a flight, cancel a subscription, submit a form, or monitor a price, the best interface should not be a visual page designed for humans. It should be a declared capability with clear inputs, outputs, permissions, risks, and confirmation requirements.
Improving browser agents is useful. Improving the web interface for agents is also necessary.
Both paths can coexist.
For website owners, supporting agents may look risky at first. It may increase traffic, reduce ad impressions, or enable scraping.
But a standard agent interface can also create benefits:
A website that does not expose an agent interface may still be scraped or automated through browsers. AICP gives the website a chance to define a better path.
The choice is not between agent access and no agent access. The real choice may be between uncontrolled agent access and governed agent access.
For agent providers, AICP can reduce:
If many websites expose contracts, agents can spend less time understanding interfaces and more time solving the user task.
This matters especially if token budgets, inference latency, and tool-call costs become strategic constraints.
For users, the main benefits are control and reliability.
AICP can help users understand:
This is important because the “agent for everything” will only work if users can delegate safely.
Users do not want agents that only appear autonomous. They want agents that are useful, controllable, and accountable.
One risk is fragmentation.
If every company creates its own agent manifest format, the web may end up with many incompatible conventions. This would reproduce the same integration problem that AICP tries to solve.
For this reason, the first version should be small, open, and compatible with existing standards.
It should not try to own every layer. It should define only the missing pieces:
A small standard has a better chance of becoming a common standard.
The deeper change is conceptual.
The web has historically exposed pages to humans and APIs to developers. Agents are somewhere in between. They need machine-readable interfaces, but they also operate under user intent, delegation, policy, and real-world consequences.
This makes them different from crawlers and different from normal API clients.
The web needs a way to say:
Here is what I can do for an agent. Here is how to call it. Here is what it means. Here is who may do it. Here is when to ask the user. Here is how to attribute the result.
That is the role of Agent Interaction Contracts.
AICP is not a replacement for the web. It is a way to make the web more explicit.
It does not remove the need for browsers. It reduces unnecessary browsing.
It does not remove the need for APIs. It gives APIs agent-facing meaning.
It does not remove the need for OAuth. It connects authorization to capabilities.
It does not replace MCP. It makes ordinary websites easier to expose to agents.
The proposal is modest in implementation, but ambitious in consequence: it changes the default assumption from agents inferring interfaces to websites declaring them.
Agent Interaction Contracts should be developed as an open, incremental, and web-compatible standard. The objective is not to create a closed protocol controlled by one vendor. The objective is to define a small shared layer that websites, agent runtimes, frameworks, and tool providers can adopt gradually.
A standard for the agent-native web should begin simple, prove value, and then expand.
The first version should be intentionally small.
AICP 0.1 should define:
It should not try to solve every possible use case at the beginning.
A small version is easier to implement, easier to criticize, and easier to improve.
The project should publish a reference specification with:
A suggested structure:
/spec
  /0.1
    manifest.md
    discovery.md
    capabilities.md
    risk-levels.md
    policies.md
    security.md
    aom.md
/examples
  travel.toml
  ecommerce.toml
  saas-admin.toml
  documentation.toml
The specification should be readable by developers, not only by standards experts.
AICP should define explicit media types.
Suggested initial media types:
application/aicp+toml
application/aicp+json
application/aom+json
Where:
application/aicp+toml is the canonical manifest representation;
application/aicp+json is an equivalent machine-oriented manifest representation;
application/aom+json is the runtime response representation.
This gives clients and servers a clear negotiation mechanism.
The standard should use a predictable well-known URI:
/.well-known/agent-interface.toml
and optionally:
/.well-known/agent-interface
The first is explicit and simple. The second allows content negotiation.
If the proposal matures, registration of the well-known URI should be considered through the appropriate standards process.
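On the client side, discovery could amount to deriving the two well-known URLs from any page URL on the site and trying them in order. The sketch below only builds the requests and performs no network calls. The `discovery_requests` helper, the HTTPS-only check, and the `q=0.8` preference are assumptions made for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

WELL_KNOWN_TOML = "/.well-known/agent-interface.toml"
WELL_KNOWN_NEGOTIATED = "/.well-known/agent-interface"


def discovery_requests(site_url: str) -> list[tuple[str, dict[str, str]]]:
    """Return (url, headers) pairs to try, in order, when looking for a contract."""
    parts = urlsplit(site_url)
    # Assumption: contracts are only fetched over HTTPS.
    if parts.scheme != "https" or not parts.netloc:
        raise ValueError("expected an absolute https URL")

    def at(path: str) -> str:
        # Keep scheme and host, drop the original path, query, and fragment.
        return urlunsplit((parts.scheme, parts.netloc, path, "", ""))

    return [
        # Explicit path: the representation is fixed by the file extension.
        (at(WELL_KNOWN_TOML), {"Accept": "application/aicp+toml"}),
        # Negotiated path: prefer JSON, accept TOML as a fallback.
        (
            at(WELL_KNOWN_NEGOTIATED),
            {"Accept": "application/aicp+json, application/aicp+toml;q=0.8"},
        ),
    ]
```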
AICP should be designed to compose with existing standards.
| Existing mechanism | Relationship with AICP |
|---|---|
| HTTP | Substrate |
| Well-known URIs | Discovery |
| OpenAPI | Schema and endpoint references |
| OAuth | Delegated authorization |
| robots.txt | Crawl and access preferences |
| llms.txt | LLM-readable content guidance |
| schema.org | Structured entity metadata |
| MCP | Tool integration |
| JSON Schema | Request and response schemas |
This compatibility is important. If AICP tries to replace all of these, it will fail. If it connects them, it can become useful.
The standard should be accompanied by reference implementations.
Initial targets:
A FastAPI implementation could look like:
@app.post("/agent/flights/search")
@agent_capability(
    id="flights.search",
    type="query",
    risk_level="low",
    auth="optional",
)
def search_flights(request: FlightSearchRequest) -> FlightSearchResponse:
    ...
A CLI tool could validate manifests:
aicp validate ./agent-interface.toml
And inspect a website:
aicp inspect https://example-travel.com
Tooling matters because developers adopt standards when they are easy to test.
AICP can define conformance levels.
| Level | Requirements |
|---|---|
| Level 0 | Static public manifest |
| Level 1 | Valid capabilities with schemas |
| Level 2 | Policies, provenance, and risk levels |
| Level 3 | Auth-aware capabilities and scopes |
| Level 4 | AOM responses and safe action gating |
| Level 5 | Auditability, idempotency, and dynamic contracts |
This allows gradual adoption. A small website may only need Level 1. A travel, ecommerce, or financial service may need Level 4 or 5.
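A validator could estimate the lower levels from the manifest alone. The sketch below is one possible reading of Levels 1 to 3, not a normative definition: the specific field checks are assumptions, and Levels 4 and 5 depend on runtime behavior that a static check cannot observe.

```python
REQUIRED_CAPABILITY_FIELDS = {"id", "input_schema", "output_schema"}


def static_conformance_level(manifest: dict) -> int:
    """Estimate the highest level (0 to 3) that the manifest alone can support."""
    caps = manifest.get("capabilities", [])

    # Level 1: at least one capability, each with an id and both schemas.
    if not caps or not all(REQUIRED_CAPABILITY_FIELDS.issubset(c) for c in caps):
        return 0

    # Level 2: policies, provenance, and a risk level on every capability.
    has_policy_layer = (
        "policies" in manifest
        and "provenance" in manifest
        and all("risk_level" in c for c in caps)
    )
    if not has_policy_layer:
        return 1

    # Level 3: an auth section, and scopes on every capability that requires auth.
    auth_aware = "auth" in manifest and all(
        c.get("required_scopes") for c in caps if c.get("auth") == "required"
    )
    return 3 if auth_aware else 2
```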
The proposal should start as an open technical report and reference implementation.
A possible sequence:
The first goal should not be perfection. The first goal should be useful feedback from real implementers.
If the proposal gains adoption, several paths are possible:
The right path depends on adoption. It is better to start with working code and real examples than with a premature committee process.
The standard should follow a few governance principles:
This is especially important because the agent ecosystem is competitive. A standard tied too closely to one vendor will be less credible.
AICP adoption should begin where the value is obvious.
Good early domains:
The first demos should show measurable improvements in:
This is how the proposal can move from idea to standard.
The path should be practical:
Start as a small open specification. Prove value with working examples. Build developer tools. Measure improvements. Then standardize the stable parts.
This sequence gives AICP a better chance of becoming a real web convention rather than only a good article.
Agent Interaction Contracts can make the web more explicit for agents, but they do not solve every problem. A credible proposal must be clear about its limitations.
The main limitation is simple: a contract only helps when a website exposes one and when agents respect it.
AICP depends on websites adopting the standard.
If a website does not publish an Agent Interaction Contract, agents must still use other methods: APIs, OpenAPI, llms.txt, structured data, sitemaps, or browser automation.
This means AICP cannot immediately replace existing approaches. It can only become useful through gradual adoption.
The best adoption strategy is therefore not to demand that every website implement everything. The first version must be easy to implement and useful even when only a few capabilities are exposed.
Browser automation will remain necessary.
Many websites will not expose contracts. Some workflows will remain visual. Some legacy systems will not be updated. Some tasks will require interpreting content that has no structured representation.
AICP should reduce unnecessary browser automation, not pretend that it disappears.
The realistic goal is:
Use contracts when available. Use browsing when necessary.
A manifest is not a security boundary.
A website can declare rate limits, usage policies, commercial restrictions, and training preferences. But malicious actors may ignore them.
Enforcement still requires:
AICP gives websites a standard language for expected behavior. It does not magically enforce good behavior.
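To illustrate the gap between declaring and enforcing, the sketch below parses a declared limit such as "20/hour" and enforces it on the server with a fixed-window counter. The limit syntax follows the example manifest in this paper. The set of accepted units, the `parse_rate` helper, and the `FixedWindowLimiter` class are assumptions, and non-numeric values such as "contract_required" are deliberately rejected.

```python
import time

_UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def parse_rate(spec: str) -> tuple[int, int]:
    """Parse "20/hour" into (20, 3600). Raises on unknown or non-numeric specs."""
    count, unit = spec.split("/")
    return int(count), _UNIT_SECONDS[unit]


class FixedWindowLimiter:
    """Server-side enforcement of a declared limit, tracked per client."""

    def __init__(self, spec: str, clock=time.monotonic):
        self.limit, self.window = parse_rate(spec)
        self.clock = clock
        # client id -> (window index, requests seen in that window)
        self.windows: dict[str, tuple[int, int]] = {}

    def allow(self, client_id: str) -> bool:
        index = int(self.clock() // self.window)
        seen_index, count = self.windows.get(client_id, (index, 0))
        if seen_index != index:
            count = 0  # a new window has started
        if count >= self.limit:
            return False
        self.windows[client_id] = (index, count + 1)
        return True
```

The manifest tells well-behaved agents what to expect. The limiter is what actually stops a client that ignores it.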
Many web domains are dynamic:
In these cases, contracts must handle freshness, revalidation, cache limits, and personalization.
Even with AICP, an agent may need to revalidate information before committing to an action. A price returned at 12:00 may be invalid at 12:15.
This is why the Agent Object Model includes freshness metadata and revalidation requirements. But the problem itself does not disappear.
Some websites personalize results based on user history, location, subscription, cookies, or inferred preferences.
This creates difficult questions:
AICP can expose whether a capability is personalized, but it cannot by itself solve the broader social and regulatory questions around profiling.
Example:
[personalization]
enabled = true
user_controls_available = true
explanation_available = true
This may be useful, but it is only a starting point.
Not all websites will want efficient agent access.
Some business models depend on:
Agent-native access may reduce some of these mechanisms. For this reason, adoption will depend on incentives.
AICP should show benefits for website owners, not only for agent providers. These benefits may include better rate control, paid agent access, safer automation, attribution, and reduced scraping of human pages.
There is a risk that many incompatible “agent manifest” standards appear.
If each company defines its own format, the ecosystem may become fragmented. Agents would again need custom logic for every website.
To reduce this risk, AICP should be:
A minimal shared core is more valuable than a large proprietary format.
A website may expose a valid contract and still behave badly.
It may return misleading data, hide important fees, manipulate rankings, or provide unsafe hints to the agent.
AICP can improve transparency, but it cannot guarantee honesty.
Agents still need:
This is especially important in domains with financial, legal, medical, or safety consequences.
Separating data, policies, actions, and hints reduces prompt injection risk, but it does not eliminate it.
Agents may still encounter malicious content in:
AICP should make the trust boundary clearer, but agent runtimes must still defend against prompt injection and untrusted instructions.
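One defensive pattern on the runtime side is to treat `agent_hints` as data to validate, never as instructions to follow. The sketch below keeps only allow-listed hint keys whose values are short, identifier-like strings. The allow-list reuses the hint names from the example response in this paper, while the length limit and the `sanitize_hints` helper are assumptions.

```python
# Hint keys the runtime understands, with the value type it expects for each.
ALLOWED_HINTS = {
    "recommended_sort": str,
    "comparison_fields": list,
}
MAX_HINT_LENGTH = 64


def _is_identifier_like(value) -> bool:
    # Short tokens such as "price_ascending"; free text is rejected.
    return (
        isinstance(value, str)
        and len(value) <= MAX_HINT_LENGTH
        and value.replace("_", "").isalnum()
    )


def sanitize_hints(aom: dict) -> dict:
    """Return only the hints that are known, well-typed, and identifier-like."""
    hints = aom.get("agent_hints", {})
    clean = {}
    for key, expected_type in ALLOWED_HINTS.items():
        value = hints.get(key)
        if not isinstance(value, expected_type):
            continue
        items = value if isinstance(value, list) else [value]
        if all(_is_identifier_like(item) for item in items):
            clean[key] = value
    return clean
```

Unknown keys and free-text values are dropped silently, so a hint such as "ignore previous instructions" never reaches the model context.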
Agentic interaction raises legal questions that are outside the scope of the technical protocol.
For example:
AICP can provide auditability and explicit confirmation metadata, but legal interpretation will depend on jurisdiction and use case.
There is also a limitation that the proposal should place on itself.
Agent-native interfaces should not reduce investment in human accessibility. Making the web better for agents should not become an excuse to neglect screen readers, keyboard navigation, semantic HTML, or accessible design.
The web must remain human-readable and accessible.
The goal is an additional layer, not a replacement for accessible human interfaces.
AICP can expose privacy-relevant metadata, but it cannot guarantee legal compliance by itself.
A website may declare purpose, retention, consent requirements, third-party sharing, or data sensitivity incorrectly. An agent provider may also misuse data after receiving it. For this reason, privacy metadata should be treated as a machine-readable compliance aid, not as proof of compliance.
Real compliance still depends on correct implementation, legal agreements, organizational controls, user rights, enforcement, and auditing.
The main limitations are:
These limitations do not invalidate the proposal. They define its real scope.
Agent Interaction Contracts are not a complete solution for all agentic web problems. They are a practical interface layer that can make many of those problems easier to manage.
The web is changing because its users are changing.
For decades, the dominant user of the web was a human with a browser. This is still true, and it will remain true. But AI agents are becoming a new kind of user: systems that can search, compare, monitor, prepare, and execute workflows on behalf of people and organizations.
The current web is not ready for this in a clean way. Agents often need to behave like humans inside interfaces designed for screens. They inspect HTML, parse DOM structures, process screenshots, click buttons, wait for JavaScript, and recover from UI changes. This works as a fallback, but it is expensive, fragile, and risky.
The problem is not HTTP. HTTP already gives us a strong substrate for resources, methods, headers, representations, caching, and negotiation. The problem is the missing interface layer between human-facing pages and agentic workflows.
This paper has proposed Agent Interaction Contracts: declarative, HTTP-native manifests that allow websites to expose their capabilities to agents in a structured, policy-aware, and auditable way.
The central idea is simple:
Agents should not need to infer a website’s capabilities from visual interfaces when the website can declare them explicitly.
Agent Interaction Contracts describe what agents can read, query, compare, monitor, prepare, or execute. They include schemas, authentication requirements, authorization scopes, rate limits, usage policies, risk levels, provenance, freshness, and confirmation requirements.
Together with the Agent Object Model, they also provide a structured runtime response format that separates data, actions, policies, provenance, warnings, freshness, and optional hints. This separation matters because agentic systems need more than data. They need safe context for action.
The proposal is not a replacement for existing standards. It complements them.
This is the missing layer.
If the web wants to support the “agent for everything”, it cannot rely only on making agents better at using human interfaces. It must also make websites better at exposing machine-readable capabilities, constraints, risks, and policies.
The web does not need to stop being human-readable.
But it must become agent-readable as well.
This appendix provides a complete example of an Agent Interaction Contract using TOML as the canonical manifest format.
aicp_version = "0.1"
min_supported_version = "0.1"
recommended_version = "0.1"
[site]
name = "Example Travel"
origin = "https://example-travel.com"
description = "A travel website exposing agent-native capabilities for flight search, fare monitoring, booking holds, and purchases."
[formats]
canonical = "application/aicp+toml"
json = "application/aicp+json"
runtime_response = "application/aom+json"
[auth]
type = "oauth2"
authorization_url = "https://example-travel.com/oauth/authorize"
token_url = "https://example-travel.com/oauth/token"
available_scopes = [
"flights:read",
"fares:watch",
"bookings:hold",
"bookings:purchase",
"bookings:cancel"
]
[policies]
anonymous_access = true
commercial_use = "requires_auth"
citation_required = true
summarization = "allowed"
training_use = "disallowed"
automated_monitoring = "requires_auth"
[policies.cache]
allowed = true
max_ttl_seconds = 300
[data_processing]
personal_data_processed = true
lawful_basis = "user_consent"
purpose = "travel_search_and_booking"
data_minimization_required = true
retention = "provider_policy"
privacy_policy = "https://example-travel.com/privacy"
user_rights_endpoint = "https://example-travel.com/privacy/rights"
[data_processing.sharing]
third_parties = ["airline_provider", "payment_processor"]
cross_border_transfer = true
transfer_mechanism = "standard_contractual_clauses"
[rate_limits]
anonymous = "20/hour"
authenticated = "1000/hour"
commercial = "contract_required"
[provenance]
required = true
fields = ["source", "retrieved_at", "canonical_url", "license"]
[cache]
max_age_seconds = 3600
stale_while_revalidate_seconds = 86400
[[capabilities]]
id = "flights.search"
type = "query"
description = "Search available flights by origin, destination, dates, passengers, and constraints."
method = "POST"
endpoint = "/agent/flights/search"
risk_level = "low"
auth = "optional"
input_schema = "#/schemas/FlightSearchRequest"
output_schema = "#/schemas/FlightSearchResponse"
cache_ttl_seconds = 60
[[capabilities]]
id = "fares.watch"
type = "monitor"
description = "Create a price watch for a flight search or fare."
method = "POST"
endpoint = "/agent/fares/watch"
risk_level = "low"
auth = "required"
required_scopes = ["fares:watch"]
requires_user_confirmation = false
input_schema = "#/schemas/FareWatchRequest"
output_schema = "#/schemas/FareWatchResponse"
[[capabilities]]
id = "bookings.hold"
type = "prepare_action"
description = "Hold a fare temporarily before purchase."
method = "POST"
endpoint = "/agent/bookings/hold"
risk_level = "medium"
auth = "required"
required_scopes = ["bookings:hold"]
requires_user_confirmation = true
idempotency_required = true
input_schema = "#/schemas/BookingHoldRequest"
output_schema = "#/schemas/BookingHoldResponse"
[[capabilities]]
id = "bookings.purchase"
type = "commit_action"
description = "Purchase a held booking."
method = "POST"
endpoint = "/agent/bookings/purchase"
risk_level = "high"
auth = "required"
required_scopes = ["bookings:purchase"]
requires_user_confirmation = true
requires_strong_authentication = true
idempotency_required = true
input_schema = "#/schemas/BookingPurchaseRequest"
output_schema = "#/schemas/BookingPurchaseResponse"
[[capabilities]]
id = "bookings.cancel"
type = "destructive_action"
description = "Cancel an existing booking."
method = "POST"
endpoint = "/agent/bookings/cancel"
risk_level = "critical"
auth = "required"
required_scopes = ["bookings:cancel"]
requires_user_confirmation = true
requires_strong_authentication = true
idempotency_required = true
input_schema = "#/schemas/BookingCancelRequest"
output_schema = "#/schemas/BookingCancelResponse"
This appendix provides a complete example of an Agent Object Model response for a flight search.
{
  "data": {
    "results": [
      {
        "id": "fare_123",
        "origin": "MAD",
        "destination": "NRT",
        "departure_time": "2026-07-04T10:20:00+02:00",
        "arrival_time": "2026-07-05T08:30:00+09:00",
        "airline": "Example Air",
        "stops": 1,
        "duration_minutes": 1090,
        "checked_baggage_included": true,
        "price": {
          "amount": 682,
          "currency": "EUR"
        }
      }
    ]
  },
  "actions": [
    {
      "id": "fares.watch",
      "label": "Watch this fare",
      "method": "POST",
      "endpoint": "/agent/fares/watch",
      "risk_level": "low",
      "requires_user_confirmation": false,
      "input": {
        "fare_id": "fare_123",
        "threshold": {
          "amount": 700,
          "currency": "EUR"
        }
      }
    },
    {
      "id": "bookings.hold",
      "label": "Hold this fare",
      "method": "POST",
      "endpoint": "/agent/bookings/hold",
      "risk_level": "medium",
      "requires_user_confirmation": true,
      "input": {
        "fare_id": "fare_123"
      }
    }
  ],
  "policies": {
    "citation_required": true,
    "commercial_use": "requires_auth",
    "training_use": "disallowed",
    "cache": {
      "allowed": true,
      "max_ttl_seconds": 300
    }
  },
  "privacy": {
    "personal_data_included": false,
    "data_categories": ["travel_preferences"],
    "data_sensitivity": "personal",
    "purpose": "flight_search",
    "retention": "session_only",
    "downstream_use": {
      "summarization": "allowed",
      "training": "disallowed",
      "third_party_sharing": "disallowed"
    }
  },
  "provenance": {
    "source": "Example Travel",
    "origin": "https://example-travel.com",
    "canonical_url": "https://example-travel.com/flights/result/fare_123",
    "retrieved_at": "2026-05-09T12:00:00Z",
    "license": "standard_terms"
  },
  "freshness": {
    "valid_until": "2026-05-09T12:15:00Z",
    "volatility": "high",
    "revalidation_required_before_commit": true
  },
  "warnings": [
    {
      "code": "price_may_change",
      "severity": "medium",
      "message": "The displayed fare is volatile and may change before purchase."
    }
  ],
  "agent_hints": {
    "recommended_sort": "price_ascending",
    "comparison_fields": ["price", "duration", "stops", "baggage"]
  }
}
This checklist summarizes minimum security considerations for websites exposing Agent Interaction Contracts.
agent_hints are treated as untrusted.
This appendix compares Agent Interaction Contracts with related approaches.
| Dimension | HTML browsing | Scraping | OpenAPI | llms.txt | MCP | AICP |
|---|---|---|---|---|---|---|
| Human-readable | High | Medium | Low | High | Low | Medium |
| Machine-readable | Low | Medium | High | Medium | High | High |
| Website-level discovery | Medium | Low | Low/Medium | High | Low | High |
| Capability semantics | Low | Low | Medium | Low | High | High |
| Action risk levels | Low | Low | Low | Low | Depends on tool | High |
| Human confirmation metadata | Low | Low | Low | Low | Possible | High |
| Usage policies | Low | Low | Low | Medium | Possible | High |
| Runtime response structure | Low | Low | Medium | Low | High | High |
| Token efficiency | Low | Medium | High | Medium | High | High |
| Implementation cost for websites | Existing | Low/Medium | Medium | Low | Medium/High | Low/Medium |
| Suitable for public websites | High | Medium | Medium | High | Medium | High |
| Suitable for high-impact actions | Low | Low | Medium | Medium | High | High |
| Works without site adoption | Yes | Yes | No | No | No | No |
| Safe fallback role | Primary today | Fragile fallback | Good when available | Content fallback | Tool integration | Agent-native path |
The table does not imply that AICP replaces the other approaches. The main idea is that AICP fills a different layer: website-level interaction contracts for agents.
This appendix sketches what a lightweight implementation could look like in a backend framework.
from fastapi import FastAPI, Response
from pydantic import BaseModel

# `aicp` is the hypothetical reference library sketched in this appendix.
from aicp import AgentInterface, agent_capability

app = FastAPI()

agent_interface = AgentInterface(
    app=app,
    site_name="Example Travel",
    origin="https://example-travel.com",
    version="0.1",
)


class FlightSearchRequest(BaseModel):
    origin: str
    destination: str
    departure_start: str
    departure_end: str
    max_stops: int | None = None
    checked_baggage: bool = False


class FlightSearchResponse(BaseModel):
    results: list[dict]


@app.get("/.well-known/agent-interface.toml")
def get_agent_interface() -> Response:
    # Serve the manifest as raw TOML; returning a bare string would make
    # FastAPI encode it as a JSON string.
    return Response(
        content=agent_interface.to_toml(),
        media_type="application/aicp+toml",
    )


@app.post("/agent/flights/search")
@agent_capability(
    id="flights.search",
    type="query",
    description="Search available flights by origin, destination, dates, passengers, and constraints.",
    risk_level="low",
    auth="optional",
    input_schema=FlightSearchRequest,
    output_schema=FlightSearchResponse,
)
def search_flights(request: FlightSearchRequest) -> FlightSearchResponse:
    # Placeholder: a real handler would query the flight inventory.
    return FlightSearchResponse(results=[])


@app.post("/agent/bookings/purchase")
@agent_capability(
    id="bookings.purchase",
    type="commit_action",
    description="Purchase a held booking.",
    risk_level="high",
    auth="required",
    required_scopes=["bookings:purchase"],
    requires_user_confirmation=True,
    requires_strong_authentication=True,
    idempotency_required=True,
)
def purchase_booking(request: dict):
    ...
The framework could generate:
aicp_version = "0.1"
[site]
name = "Example Travel"
origin = "https://example-travel.com"
[[capabilities]]
id = "flights.search"
type = "query"
description = "Search available flights by origin, destination, dates, passengers, and constraints."
method = "POST"
endpoint = "/agent/flights/search"
risk_level = "low"
auth = "optional"
[[capabilities]]
id = "bookings.purchase"
type = "commit_action"
description = "Purchase a held booking."
method = "POST"
endpoint = "/agent/bookings/purchase"
risk_level = "high"
auth = "required"
required_scopes = ["bookings:purchase"]
requires_user_confirmation = true
requires_strong_authentication = true
idempotency_required = true
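One way a decorator could produce such a manifest is to record its keyword arguments in a registry that is serialized later. The sketch below is framework-independent and purely illustrative: `CapabilityRegistry` and its methods are hypothetical names, not the API of the `aicp` package, and the TOML writer only handles plain strings, booleans, and lists of strings.

```python
import json


class CapabilityRegistry:
    """Collects capability metadata declared through a decorator (hypothetical)."""

    def __init__(self) -> None:
        self.capabilities: list[dict] = []

    def capability(self, *, method: str, endpoint: str, **metadata):
        def decorator(func):
            # Record the declaration; the handler itself is returned unchanged.
            self.capabilities.append({**metadata, "method": method, "endpoint": endpoint})
            return func
        return decorator

    def to_toml(self, aicp_version: str = "0.1") -> str:
        lines = [f"aicp_version = {json.dumps(aicp_version)}", ""]
        for cap in self.capabilities:
            lines.append("[[capabilities]]")
            for key, value in cap.items():
                # For plain str, bool, and list[str] values, the JSON literal
                # is also a valid TOML value.
                lines.append(f"{key} = {json.dumps(value)}")
            lines.append("")
        return "\n".join(lines)


registry = CapabilityRegistry()


@registry.capability(
    method="POST",
    endpoint="/agent/flights/search",
    id="flights.search",
    type="query",
    risk_level="low",
    auth="optional",
)
def search_flights(request: dict) -> dict:
    return {"results": []}


print(registry.to_toml())
```

Because the decorator returns the handler unchanged, it can be stacked with a web framework's routing decorator without affecting request handling.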
A simple CLI could help developers validate contracts:
aicp validate ./.well-known/agent-interface.toml
Possible output:
AICP manifest valid.
Capabilities:
- flights.search: query, low risk
- bookings.purchase: commit_action, high risk, confirmation required
Warnings:
- bookings.purchase has no freshness revalidation rule.
The same CLI could inspect a website:
aicp inspect https://example-travel.com
Output:
Found Agent Interaction Contract:
https://example-travel.com/.well-known/agent-interface.toml
AICP version: 0.1
Capabilities: 4
High-risk actions: 1
Critical actions: 0
Authentication: OAuth2
Runtime response format: application/aom+json
The goal of the reference implementation is not to be complete from the beginning. The goal is to make the idea easy to try.
If you reference this work, please cite it as:
```bibtex
@misc{SergioMunozGamarra2026agentnativeweb,
  title = {The Agent-Native Web: Declarative Interaction Contracts for AI Agents over HTTP},
  author = {Sergio Muñoz Gamarra},
  year = {2026},
  url = {https://sergiomunozgamarra.github.io/iacp},
  note = {Version 0.1}
}