We Built a Shopify Storefront AI Agent With Claude Opus 4.6. Here's What B2B Agency Commerce Looks Like When Agents Negotiate With Agents.
Shopify's Storefront AI agent framework just turned every Hydrogen store into an AI-native commerce endpoint. We recently shipped one for a client using Claude Opus 4.6 as the reasoning engine, and the experience surfaced something I wasn't expecting: a concrete preview of what B2B agency-to-agency commerce looks like when AI agents handle the negotiation. Shopify and Google's Universal Commerce Protocol (UCP), launched in early 2026, lets AI agents query a merchant's inventory, create checkout sessions, and complete transactions through structured APIs. Claude Opus 4.6's 1M token context window and agent teams feature mean a single orchestrator can run product search, cart management, and checkout completion in one session without losing context. As of April 2026, UCP has backing from over 20 retailers and platforms including Visa, Mastercard, Stripe, PayPal, Target, Walmart, and Best Buy.
This isn't a think piece about AI transforming commerce. We're going to walk through the technical decisions we made, the business model shift we're seeing, and why B2B agency-to-agency commerce is closer than most people realize.
The Stack: Claude Opus 4.6 + Shopify Storefront MCP
Why Claude Opus 4.6
We tested three frontier models for the agent backbone. Here's why Opus 4.6 won:
| Evaluation criteria | Claude Opus 4.6 | GPT-4o | Gemini 2.5 Pro |
|---|---|---|---|
| Context window | 1M tokens | 128K tokens | 1M tokens |
| Max output | 128K tokens | 16K tokens | 65K tokens |
| Tool-calling reliability | Top score on Terminal-Bench 2.0 | High | High |
| Multi-step agentic planning | Native agent teams | Requires external framework | Native but lower stability |
| Long-task persistence | Context compaction built-in | Prone to drift | Moderate |
| API pricing (per million tokens) | $5 input / $25 output | $2.50 input / $10 output | Varies by tier |
Opus 4.6 costs more per token. That matters less than you'd think in agentic commerce, because the alternative to a reliable agent isn't a cheaper agent — it's a failed checkout. When your agent needs to hold product search results, inventory checks, discount calculations, shipping options, and payment authorization in a single session, context reliability is the bottleneck. The 1M token window plus context compaction (automatic summarization when tokens approach the limit) keeps long sessions intact.
Anthropic shipped Opus 4.6 on February 5, 2026, alongside agent teams. One lead agent coordinates strategy. Multiple teammate agents execute in parallel, each with independent context windows. In our build, the lead handles user intent and checkout logic. Two teammates handle Catalog MCP product search and Storefront MCP policy queries respectively.
Here's a data point that puts Opus 4.6's agentic capability in perspective: Anthropic researcher Nicholas Carlini reported that 16 Claude Opus 4.6 agents wrote a C compiler in Rust from scratch, capable of compiling the Linux kernel. The experiment cost roughly $20,000. Separately, Opus 4.6 autonomously closed 13 issues across 6 repositories for a 50-person organization in a single day.
Shopify's Three-Layer MCP Architecture
Shopify's Winter '26 Edition formalized the agentic commerce stack around three MCP layers:
| MCP layer | Function | Use case |
|---|---|---|
| Catalog MCP | Cross-platform product discovery across billions of Shopify products | Agent finds products for users |
| Storefront MCP | Single-merchant product data, return policies, FAQs, brand voice | Agent understands a specific store's rules |
| Checkout MCP | Create checkout sessions, update carts, complete transactions | Agent completes purchase on user's behalf |

As of Hydrogen 2026.1.4, Storefront MCP is on by default. Every Hydrogen storefront automatically exposes an MCP endpoint at /api/mcp when proxyStandardRoutes is enabled, which is the default setting. Out of the box, then, every Hydrogen store is a commerce endpoint that AI agents can interact with programmatically.
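To make that endpoint concrete, here is a minimal sketch of the request an agent might build to ask a store which tools it exposes. The /api/mcp path comes from the paragraph above, and `tools/list` with a JSON-RPC 2.0 envelope comes from the MCP specification. The store origin and the `buildToolsListRequest` helper are our own placeholders. A real client would normally use an MCP SDK, which also performs the initialization handshake that this sketch skips.

```typescript
// Sketch only: builds (but does not send) an MCP "tools/list" request.
// The store origin is a placeholder; the path comes from the text above.

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
}

interface PreparedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildToolsListRequest(storeOrigin: string, id: number): PreparedRequest {
  const payload: JsonRpcRequest = { jsonrpc: "2.0", id, method: "tools/list" };
  return {
    // Strip trailing slashes so the path is appended exactly once.
    url: storeOrigin.replace(/\/+$/, "") + "/api/mcp",
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // MCP's HTTP transport may answer with plain JSON or an event stream.
      Accept: "application/json, text/event-stream",
    },
    body: JSON.stringify(payload),
  };
}

const toolsListRequest = buildToolsListRequest("https://example-store.test", 1);
console.log(toolsListRequest.url, toolsListRequest.body);
```

The returned object can be handed to `fetch(url, { method, headers, body })`. Keeping the builder separate from the network call makes it easy to unit test without a live store.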
Liquid theme stores can be discovered through Shopify Catalog, but they lack the structured API access that makes Storefront MCP powerful. The difference: Hydrogen stores are AI-native commerce endpoints. Liquid stores are AI-discoverable web pages.
What We Learned Building It
Data quality mattered more than model selection. We spent roughly 40% of the project on data cleanup — far more than on agent logic. Inconsistent SKUs, stale inventory counts, non-standardized attribute names (some products used "Color," others "Colour," others "Shade" for the same field) all broke agent behavior. AI agents are literal. They execute against the spec you give them.
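The attribute problem is easy to show in code. The sketch below is not our production pipeline. It assumes a hand-maintained alias table (`ATTRIBUTE_ALIASES`, hypothetical) that folds "Colour" and "Shade" into "color", and it flags products where two variants disagree instead of silently picking one.

```typescript
// Hypothetical alias table: every variant we saw maps onto one canonical key.
const ATTRIBUTE_ALIASES = new Map<string, string>([
  ["colour", "color"],
  ["shade", "color"],
]);

interface NormalizedProduct {
  attributes: Map<string, string>;
  conflicts: string[]; // canonical keys that arrived with two different values
}

function normalizeAttributes(raw: Record<string, string>): NormalizedProduct {
  const attributes = new Map<string, string>();
  const conflicts: string[] = [];
  for (const [rawKey, rawValue] of Object.entries(raw)) {
    const key = rawKey.trim().toLowerCase();
    const canonical = ATTRIBUTE_ALIASES.get(key) ?? key;
    const value = rawValue.trim();
    const existing = attributes.get(canonical);
    if (existing !== undefined && existing !== value) {
      // Keep the first value and flag the key for human review.
      if (!conflicts.includes(canonical)) conflicts.push(canonical);
      continue;
    }
    attributes.set(canonical, value);
  }
  return { attributes, conflicts };
}

const cleaned = normalizeAttributes({ Colour: "Navy", " Size ": "M" });
console.log(cleaned.attributes.get("color"), cleaned.attributes.get("size")); // Navy M
```

Surfacing conflicts rather than resolving them automatically was a deliberate choice for this sketch: a wrong color on an order is worse than a product that waits for review.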
UCP's capability negotiation is the key design insight. Not every merchant supports every feature. UCP includes a handshake mechanism: merchants declare which capabilities they support (checkout, order management, discount stacking, etc.), and the agent reads the manifest to determine what's available. One agent can serve merchants with different capability levels without custom logic per store.
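A rough sketch of that handshake from the agent's side follows. The manifest shape, the capability names, and the `negotiate` function are all hypothetical; the real UCP schema may look different. The point is only that one function handles merchants at different capability levels.

```typescript
// Hypothetical manifest shape for illustration; the real UCP schema may differ.
interface MerchantManifest {
  merchant: string;
  capabilities: string[];
}

interface NegotiationResult {
  canProceed: boolean;
  missing: string[]; // required capabilities the merchant did not declare
  enabled: string[]; // optional capabilities the agent is allowed to use
}

function negotiate(
  manifest: MerchantManifest,
  required: string[],
  optional: string[],
): NegotiationResult {
  const declared = new Set(manifest.capabilities);
  const missing = required.filter((c) => !declared.has(c));
  const enabled = optional.filter((c) => declared.has(c));
  return { canProceed: missing.length === 0, missing, enabled };
}

const basicMerchant: MerchantManifest = {
  merchant: "basic-shop",
  capabilities: ["checkout"],
};
const fullMerchant: MerchantManifest = {
  merchant: "full-shop",
  capabilities: ["checkout", "order_management", "discount_stacking"],
};

// Same agent code, two merchants, no per-store branching.
console.log(negotiate(basicMerchant, ["checkout"], ["discount_stacking"]));
console.log(negotiate(fullMerchant, ["checkout"], ["discount_stacking"]));
```

For the basic merchant the agent proceeds without discount stacking. For the full merchant the same call returns it in `enabled`.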
Security boundaries need to be set on day one. An agent that can check out on behalf of a user can spend their money. UCP's Agent Payments Protocol (AP2) uses cryptographic signatures to prove user authorization. But agent-side permission controls (can it auto-apply discounts? modify shipping addresses? cancel orders?) need to be locked at the architecture level, not managed through prompt engineering.
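What "locked at the architecture level" can look like, as a sketch: every tool call passes through a deny-by-default gate that lives in ordinary code, outside the model's reach. The action names, the `AgentPolicy` shape, and the spend limit are assumptions for illustration, not part of UCP or AP2.

```typescript
type AgentAction =
  | "complete_checkout"
  | "apply_discount"
  | "modify_shipping_address"
  | "cancel_order";

interface AgentPolicy {
  readonly allowedActions: ReadonlyArray<AgentAction>;
  readonly maxOrderTotalCents: number;
}

type Decision = { ok: true } | { ok: false; reason: string };

// Deny by default: anything not explicitly listed in the policy is refused.
function authorize(
  policy: AgentPolicy,
  action: AgentAction,
  orderTotalCents: number,
): Decision {
  if (!policy.allowedActions.includes(action)) {
    return { ok: false, reason: `action "${action}" is not permitted for this agent` };
  }
  if (action === "complete_checkout" && orderTotalCents > policy.maxOrderTotalCents) {
    return { ok: false, reason: "order total exceeds the agent's spend limit" };
  }
  return { ok: true };
}

// Placeholder policy: this agent may check out and apply discounts, nothing else.
const storefrontAgentPolicy: AgentPolicy = {
  allowedActions: ["complete_checkout", "apply_discount"],
  maxOrderTotalCents: 50_000,
};

console.log(authorize(storefrontAgentPolicy, "cancel_order", 1_000));
```

Because the check runs before the tool executes, no prompt can widen what the agent is allowed to do.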
B2B Agency-to-Agency Commerce: What We Saw
This is the part that keeps me up at night.
While building the Storefront AI agent, we realized something: if an agent can transact on behalf of a consumer with a merchant, an agent can also transact on behalf of one agency with another agency's agent.
How B2B Agency Services Work Today
- Client submits a request via email, phone, or form
- Sales rep interprets the need, manually searches for the right service or partner
- Internal team or external partner produces a quote
- Back-and-forth negotiation on scope and timeline
- Contract, execute, deliver
Every step is human-to-human. The efficiency bottleneck is communication cost, not technical capability.

How UCP + A2A Changes This
UCP's transport layer natively supports Agent-to-Agent (A2A) communication. Two agents can exchange structured UCP data directly, as a task-oriented conversation rather than a series of isolated REST calls.
Picture this: a brand's AI agent needs to find an agency that can migrate their Shopify Plus store to Hydrogen within three weeks, with local payment gateway integration experience. The exchange could run like this:
1. The brand's agent searches a structured capability catalog of qualified agencies
2. It contacts candidate agencies' agents directly via the A2A protocol
3. Each agency's agent checks internal resource availability and responds with a preliminary quote
4. Both agents negotiate scope and timeline within predefined rules
5. A draft SOW goes to human decision-makers on both sides for final approval
Humans only need to make a yes/no decision at the end. Steps one through four are agent-to-agent.
Why This Is Closer Than You Think
Three conditions are already met:
The protocol layer is mature. UCP has endorsements from Google, Shopify, Visa, Mastercard, Stripe, PayPal, Walmart, Target, Best Buy, Etsy, Wayfair, BigCommerce, Adobe Commerce, and Salesforce as of April 2026. A2A, MCP, and AP2 all have production-ready implementations.
AI reasoning is capable enough. Claude Opus 4.6 sustained a 14-hour agentic task. It autonomously managed a 50-person org's issue triage across 6 repos. Sixteen Opus 4.6 instances wrote a working C compiler from scratch. This isn't "almost good enough." This is "trustworthy for complex decisions."
The market is ready. Forrester projects that by 2028, nearly 90% of B2B purchases will involve AI agents, and that one-third of B2B payment workflows will use AI agents by end of 2026. Eighty percent of B2B sales interactions already happen digitally, per Gartner. Gartner also predicts 40% of enterprise applications will integrate task-specific AI agents by end of 2026, up from under 5% in 2025. And 6sense's 2025 Buyer Experience Report found that 94% of B2B buyers already use LLMs during their buying journey.
What This Means for Agencies
If you run a B2B services agency, here's what you need to think about:
Your capabilities need to be structured data. "We're great at Shopify" won't cut it. An agent needs to know: how many certified Shopify Plus developers you have, how many Hydrogen projects you completed in the last 12 months, your average delivery timeline, which payment gateway integrations you support. This information needs to exist in an agent-readable format.
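As an illustration of "agent-readable", here is one possible shape for such a profile and the kind of filter a buyer's agent might run against it. No standard schema for agency capability catalogs exists that we know of, so every field name, agency, and gateway identifier below is hypothetical.

```typescript
// Hypothetical profile schema built from the fields listed above.
interface AgencyProfile {
  name: string;
  certifiedPlusDevelopers: number;
  hydrogenProjectsLast12Months: number;
  averageDeliveryWeeks: number;
  paymentGateways: string[];
}

interface ServiceRequest {
  minHydrogenProjects: number;
  maxDeliveryWeeks: number;
  requiredGateways: string[];
}

function qualifies(agency: AgencyProfile, request: ServiceRequest): boolean {
  return (
    agency.hydrogenProjectsLast12Months >= request.minHydrogenProjects &&
    agency.averageDeliveryWeeks <= request.maxDeliveryWeeks &&
    request.requiredGateways.every((g) => agency.paymentGateways.includes(g))
  );
}

function shortlist(agencies: AgencyProfile[], request: ServiceRequest): string[] {
  return agencies.filter((a) => qualifies(a, request)).map((a) => a.name);
}

// Placeholder data, not real agencies or gateways.
const catalog: AgencyProfile[] = [
  { name: "agency-a", certifiedPlusDevelopers: 6, hydrogenProjectsLast12Months: 5,
    averageDeliveryWeeks: 3, paymentGateways: ["gateway-x", "gateway-y"] },
  { name: "agency-b", certifiedPlusDevelopers: 12, hydrogenProjectsLast12Months: 9,
    averageDeliveryWeeks: 6, paymentGateways: ["gateway-x"] },
];

const migrationRequest: ServiceRequest = {
  minHydrogenProjects: 3,
  maxDeliveryWeeks: 3,
  requiredGateways: ["gateway-x"],
};

console.log(shortlist(catalog, migrationRequest)); // [ 'agency-a' ]
```

Note what "We're great at Shopify" contributes to this filter: nothing. Only fields the agent can compare make it into the shortlist.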
Your quoting process needs to be at least semi-automated. When a prospect's agent sends an inquiry at 3 AM, your agent needs to respond with a preliminary quote based on preset rules. Full automation isn't necessary for complex projects, but initial screening and quoting can't be limited to business hours.
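"Preset rules" can be as plain as the sketch below: a weekly rate, a rush multiplier, and a ceiling above which the agent stops and routes to a person. The rule names and all numbers are placeholders we made up for illustration, not our actual pricing.

```typescript
interface QuoteRules {
  weeklyRateUsd: number;
  rushMultiplier: number;  // applied when the deadline is shorter than the estimate
  maxAutoQuoteUsd: number; // above this, a human reviews before anything is sent
}

interface Inquiry {
  estimatedWeeks: number;
  deadlineWeeks: number;
}

type QuoteResponse =
  | { kind: "preliminary_quote"; amountUsd: number; rush: boolean }
  | { kind: "needs_human_review"; reason: string };

function preliminaryQuote(rules: QuoteRules, inquiry: Inquiry): QuoteResponse {
  const rush = inquiry.deadlineWeeks < inquiry.estimatedWeeks;
  const base = rules.weeklyRateUsd * inquiry.estimatedWeeks;
  const amountUsd = Math.round(rush ? base * rules.rushMultiplier : base);
  if (amountUsd > rules.maxAutoQuoteUsd) {
    return { kind: "needs_human_review", reason: "amount exceeds auto-quote ceiling" };
  }
  return { kind: "preliminary_quote", amountUsd, rush };
}

// Placeholder rules.
const quoteRules: QuoteRules = {
  weeklyRateUsd: 8_000,
  rushMultiplier: 1.25,
  maxAutoQuoteUsd: 50_000,
};

// Four weeks of estimated work squeezed into three: 8,000 * 4 * 1.25 = 40,000.
console.log(preliminaryQuote(quoteRules, { estimatedWeeks: 4, deadlineWeeks: 3 }));
```

The ceiling is what keeps this semi-automated: small, routine inquiries get an answer at 3 AM, and anything larger waits for a human.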
Trust mechanisms need redesigning. Human sales reps build trust through relationships and reputation. In an agent-to-agent world, trust comes from verifiable records: actual vs. promised delivery timelines, client NPS scores, code quality metrics, SLA achievement rates. These records need to be verifiable by a third party.
Under the Hood: UCP's Layered Architecture
UCP borrowed TCP/IP's layered design philosophy, separating commerce transactions into independently evolving layers:
| Layer | Function | TCP/IP analogy |
|---|---|---|
| Shopping Service (base) | Core transaction primitives: checkout session, line items, totals, status | Transport layer |
| Capabilities | Independently versioned functional modules: Checkout, Orders, Catalog | Application protocols (HTTP, FTP) |
| Extensions | Domain-specific schemas: loyalty points, subscription billing, custom discount rules | Application extensions (HTTP/2 features) |
Separately from these layers, UCP supports four transport bindings: REST (HTTPS+JSON, universal compatibility), MCP (LLM agents invoke UCP tools directly), A2A (direct agent-to-agent communication), and AP2 (cryptographic authorization for autonomous payments).
Merchants don't need to implement every layer. A small T-shirt shop might only need Catalog + Checkout. A B2B wholesaler might need Catalog + Checkout + Orders + a custom net-30 payment terms extension. The agent reads the merchant's UCP manifest to determine what's possible.
Shopify VP of Product Vanessa Lee noted that the company spent over 20 years understanding global commerce complexity, and UCP was designed to model that complexity rather than simplify it. Google's VP/GM of Merchant Shopping Ashish Gupta said UCP provides a shared framework across the ecosystem for agents, merchants, and payment providers to communicate consistently.

Market Data Snapshot: Agentic Commerce by the Numbers
| Metric | Figure | Source |
|---|---|---|
| Shopify 2025 BFCM global sales | $14.6 billion over four days | Shopify Winter '26 Edition |
| BFCM peak sales per minute | $5.1 million | Shopify Winter '26 Edition |
| B2B sales interactions already digital | 80% | Gartner |
| B2B purchases involving AI agents by 2028 | ~90% | Forrester |
| B2B payment workflows using AI agents by end of 2026 | One-third | Forrester |
| Companies and platforms endorsing UCP | 20+ | Shopify/Google announcement |
| Sales cycle acceleration with AI agents | Up to 36% faster (64 → 41 days) | Peak Sales Recruiting |
| Enterprise apps integrating AI agents by end of 2026 | 40% (up from <5% in 2025) | Gartner |
| B2B buyers using LLMs during purchase journey | 94% | 6sense 2025 Buyer Experience Report |
Five Things We Learned From the Build
1. Data governance ate 40% of the timeline. We budgeted 20%. Product descriptions were inconsistent, inventory sync had latency, metafield naming was chaotic. AI agents execute against specs. Data quality equals agent quality.
2. Storefront MCP's Knowledge Base is severely underrated. Shopify's Knowledge Base App lets merchants manage return policies, FAQs, and brand voice. The agent uses this data to answer customer questions. Get it right and agent responses beat most human customer service reps. Get it wrong and the agent sounds nothing like your brand.
3. Claude Opus 4.6's adaptive thinking saves significant tokens in checkout flows. Simple add-to-cart operations don't need deep reasoning. Complex discount stacking and shipping calculations do. Adaptive thinking lets the model decide when to think harder without manual thinking budget configuration per step.
4. Human handoff points were the hardest design problem. UCP defines a checkout status called requires_human for transactions that need human involvement (regulatory requirements, merchant policies, or agent capability limits). Designing when to trigger requires_human and what the user experience looks like after the handoff was harder than writing the agent logic itself.
5. Agent-to-agent testing is fundamentally different. Traditional API tests have fixed request/response patterns. Agent-to-agent interactions involve dynamic negotiation — the same starting conditions can produce different conversation paths. We shifted to outcome-based testing (did the transaction complete? was the amount correct? was it within SLA?) rather than path-based testing.
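Lessons 4 and 5 meet in the test suite. A minimal sketch of an outcome-based check follows, in which a handoff is a legitimate end state and not a failure. Only `requires_human` comes from UCP as described above. The other status names, the field names, and the `checkOutcome` helper are hypothetical.

```typescript
// "requires_human" is the UCP status named above; everything else is illustrative.
type FinalStatus = "completed" | "requires_human" | "failed";

interface RunOutcome {
  status: FinalStatus;
  chargedCents: number;
  durationMs: number;
}

interface Expectation {
  status: FinalStatus;
  chargedCents: number; // 0 when the scenario should end in a handoff
  slaMs: number;
}

// Returns a list of failures; an empty list means the run passed.
// Nothing here inspects the conversation path, only where the run ended up.
function checkOutcome(outcome: RunOutcome, expected: Expectation): string[] {
  const failures: string[] = [];
  if (outcome.status !== expected.status) {
    failures.push(`status: got ${outcome.status}, expected ${expected.status}`);
  }
  if (outcome.chargedCents !== expected.chargedCents) {
    failures.push(`amount: got ${outcome.chargedCents}, expected ${expected.chargedCents}`);
  }
  if (outcome.durationMs > expected.slaMs) {
    failures.push(`SLA: took ${outcome.durationMs} ms, limit ${expected.slaMs} ms`);
  }
  return failures;
}

const purchase: Expectation = { status: "completed", chargedCents: 4_999, slaMs: 30_000 };

// Two runs that took different paths (and different times) both pass.
const runA: RunOutcome = { status: "completed", chargedCents: 4_999, durationMs: 8_200 };
const runB: RunOutcome = { status: "completed", chargedCents: 4_999, durationMs: 21_500 };
console.log(checkOutcome(runA, purchase), checkOutcome(runB, purchase)); // [] []
```

A scenario that should hand off (an age-restricted item, say) gets an expectation of `requires_human` with zero cents charged, so the suite fails if the agent completes a purchase it should have escalated.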
What exactly is a Shopify Storefront AI Agent?
A Storefront AI Agent is a program that communicates directly with a Shopify store through Storefront MCP. It can search products, query policies, manage carts, and guide checkout — all through structured APIs, not simulated browser clicks. Since Hydrogen 2026.1.4, every Hydrogen storefront automatically exposes a Storefront MCP endpoint at /api/mcp.
What's the difference between UCP and MCP?
UCP (Universal Commerce Protocol) is the open standard defining how AI agents transact with merchants, covering the full commerce lifecycle. MCP (Model Context Protocol) is one of UCP's transport bindings that lets LLM-based agents invoke UCP tools directly. UCP defines the "what." MCP is one of the "how" options, alongside REST, A2A, and AP2.
Can non-Shopify merchants use UCP?
Yes. UCP is an open standard with endorsements from BigCommerce, Adobe Commerce, Salesforce, and others. Shopify's Agentic Storefronts currently offer the most complete native implementation, which merchants can toggle on in the Shopify admin. Non-Shopify merchants need to implement the UCP Shopping Service and relevant capabilities themselves. Google provides a Python reference implementation.
How much does it cost to build a Storefront AI Agent?
For a mid-size brand already running a Hydrogen storefront, expect 6–10 weeks from planning to launch. Data cleanup takes the largest share. Claude Opus 4.6 API costs depend on transaction volume. At roughly 1,000 agent interactions per day, monthly API spend runs $500–$2,000 depending on token usage per interaction.
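To sanity-check that range against your own traffic, the arithmetic is simple. The sketch below uses the $5 input / $25 output per-million-token prices from the comparison table. The per-interaction token counts are assumptions picked for illustration, not measurements from our build, and the estimate ignores caching, retries, and any extra tokens spent on reasoning.

```typescript
interface Pricing {
  inputUsdPerMillion: number;
  outputUsdPerMillion: number;
}

interface UsageAssumptions {
  interactionsPerDay: number;
  daysPerMonth: number;
  inputTokensPerInteraction: number;  // assumed, measure your own
  outputTokensPerInteraction: number; // assumed, measure your own
}

function monthlyCostUsd(usage: UsageAssumptions, pricing: Pricing): number {
  const interactions = usage.interactionsPerDay * usage.daysPerMonth;
  const inputTokens = interactions * usage.inputTokensPerInteraction;
  const outputTokens = interactions * usage.outputTokensPerInteraction;
  // Sum in "token-dollars" first, divide once at the end.
  return (
    (inputTokens * pricing.inputUsdPerMillion +
      outputTokens * pricing.outputUsdPerMillion) /
    1_000_000
  );
}

const opusPricing: Pricing = { inputUsdPerMillion: 5, outputUsdPerMillion: 25 };

const exampleUsage: UsageAssumptions = {
  interactionsPerDay: 1_000,
  daysPerMonth: 30,
  inputTokensPerInteraction: 4_000,
  outputTokensPerInteraction: 500,
};

console.log(monthlyCostUsd(exampleUsage, opusPricing)); // 975
```

At 30,000 interactions a month, the $500–$2,000 range works out to roughly $0.017 to $0.067 per interaction, so the main lever is how many tokens each interaction carries in context.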
When will agent-to-agent B2B commerce go mainstream?
The protocols are ready (UCP's A2A transport). The AI capability is ready (Opus 4.6 can sustain 14+ hour agentic tasks). The bottleneck is business trust infrastructure. In B2C, an agent buying a $50 product on someone's behalf carries manageable risk. In B2B, an agent committing to a $200,000 service contract carries different legal liability, compliance requirements, and financial exposure. Realistic timeline: 2026–2027 for semi-automation (agents handle screening and preliminary quotes, humans make final decisions), 2028–2029 for high automation of routine procurement.
Sources
- Shopify — Universal Commerce Protocol (UCP) Architecture
- Shopify — Agentic Commerce at Scale Announcement
- Google Developers Blog — Under the Hood: Universal Commerce Protocol
- Shopify — Winter '26 Edition Developer Platform
Author Insight
The biggest takeaway from building this Storefront AI agent wasn't the technical integration — Claude Opus 4.6 and Shopify MCP worked more smoothly together than I expected. It was watching the business model implications unfold in real time.
When agents can transact on behalf of people, "customer relationships" get redefined. Part of an agency's relationship with clients will shift from interpersonal dynamics to machine-readable performance records: how fast did your agent respond, how accurate were the preliminary quotes, how many transactions completed without requiring human escalation. I'm not saying human relationships stop mattering — they don't. But there's now a data layer underneath them that didn't exist before.
What genuinely unsettles me is the pace. UCP went from announcement to 20+ enterprise endorsements in months. Hydrogen went from no Storefront MCP support to default-on in a single version bump. Agent teams went from concept to production-ready the day Opus 4.6 shipped. Every piece is accelerating simultaneously, and most agencies are still running their sales process through email threads and PDF proposals.
If your B2B service has standardized pricing and delivery workflows, start structuring your capability catalog now. You don't need full automation tomorrow. But "discoverable by agents" is the minimum bar.
Tenten was among the first teams in Asia to ship a Shopify Storefront AI agent in production using Claude Code and Claude Opus 4.6. If you're evaluating an agentic commerce strategy or figuring out how to connect your Shopify store to the UCP ecosystem, schedule a consultation with our team.