Why Tenten Went AI-First

Two years ago, Tenten's engineering team hit a wall. Custom Shopify app builds were taking 8–12 weeks. Client timelines demanded faster delivery. Adding more developers wasn't scaling—onboarding a developer on Shopify APIs and metafields architecture took 3–4 weeks.

The decision was straightforward: hand the repetitive work to AI agents, meaning code scaffolding, unit tests, API boilerplate, and quality assurance checks, and free up human developers for architecture, reviews, and novel problems.

The bet paid off. Today, Tenten ships custom Shopify apps in 4–6 weeks without cutting corners. Defect rates actually dropped 23% because AI-assisted code review catches more patterns than manual review alone.

The AI-First Stack We Built

Tenten doesn't use one AI tool. We built a layered approach:

Layer 1: Code Generation (Claude AI via API)

  • Input: Feature requirements → pseudo-code outline from engineer
  • Output: GraphQL queries, REST endpoints, Liquid templates, TypeScript boilerplate
  • Time saved: 35–40% on initial scaffold
  • Trade-off: Generated code requires review; ~15% needs rework for domain-specific logic
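
To illustrate the input side of this layer, here is a minimal sketch of combining feature requirements with an engineer's pseudo-code outline into one prompt. The function name `build_scaffold_prompt`, the prompt wording, and the example feature are all hypothetical. The article does not show Tenten's actual prompts, and the call to the model API is deliberately left out.

```python
def build_scaffold_prompt(requirements: str, outline: list[str], target: str) -> str:
    """Combine requirements and a pseudo-code outline into one scaffold prompt.

    Hypothetical helper: the wording is an assumption, not Tenten's real prompt.
    """
    # Number the outline steps so the model can follow them in order.
    steps = "\n".join(f"{i}. {step}" for i, step in enumerate(outline, start=1))
    return "\n".join(
        [
            f"Generate {target} scaffolding for a Shopify app.",
            "Follow the outline exactly; do not invent business rules.",
            "",
            "Requirements:",
            requirements.strip(),
            "",
            "Pseudo-code outline:",
            steps,
        ]
    )


prompt = build_scaffold_prompt(
    requirements="Show low-stock products on the admin dashboard.",
    outline=["Fetch products with inventory below a threshold", "Return id and title"],
    target="GraphQL query",
)
```

Keeping the outline as an explicit numbered list reflects the article's point that the engineer supplies the structure and the model only fills in boilerplate.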

Layer 2: Test Automation (AI + Pytest)

  • Claude writes unit tests before code (test-driven development for AI)
  • Tests catch API contract mismatches that developers miss
  • ~80% of integration bugs caught in test phase, not production
  • Time saved: 20–25% on QA cycles
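
A test-first contract check might look roughly like the sketch below. The `REQUIRED_FIELDS` contract and the `check_contract` helper are assumptions made for illustration, not Tenten's actual test suite. The tests are plain `test_*` functions with bare asserts, so Pytest can collect them without extra imports.

```python
# Assumed contract for a product payload; the field list is illustrative only.
REQUIRED_FIELDS = {"id": str, "title": str, "variants": list}


def check_contract(payload: dict) -> list[str]:
    """Return a list of contract violations; an empty list means the payload conforms."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected):
            actual = type(payload[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    return problems


def test_valid_payload_passes():
    payload = {"id": "gid://shopify/Product/1", "title": "Mug", "variants": []}
    assert check_contract(payload) == []


def test_numeric_id_is_flagged():
    # A numeric id where a global-ID string is expected is a typical mismatch.
    payload = {"id": 1, "title": "Mug", "variants": []}
    assert check_contract(payload) == ["id: expected str, got int"]


def test_missing_field_is_flagged():
    payload = {"id": "gid://shopify/Product/1", "title": "Mug"}
    assert check_contract(payload) == ["missing field: variants"]
```

Writing the tests before the implementation pins the contract down first, so generated code that drifts from it fails immediately instead of in production.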

Layer 3: Code Review Agents (Automated linting + AI)

  • AI agents scan for: unused variables, missing error handling, API rate limit vulnerabilities, metafield type mismatches
  • Flags issues before human review
  • Reduces human review time by 30%
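
One of the checks listed above, metafield type mismatches, can be approximated with a small rule-based pass before any AI or human review. This is a simplified sketch, not Tenten's agent. It assumes metafield values arrive as strings, covers only a handful of Shopify metafield types, and uses a hypothetical `metafield_mismatches` helper.

```python
import json
from decimal import Decimal, InvalidOperation


def _parses(parser, value, errors) -> bool:
    """Return True if parser(value) succeeds without raising one of `errors`."""
    try:
        parser(value)
    except errors:
        return False
    return True


# Simplified per-type checks; real validation rules are stricter than this.
CHECKS = {
    "single_line_text_field": lambda v: "\n" not in v,
    "number_integer": lambda v: _parses(int, v, ValueError),
    "number_decimal": lambda v: _parses(Decimal, v, InvalidOperation),
    "boolean": lambda v: v in ("true", "false"),
    "json": lambda v: _parses(json.loads, v, ValueError),
}


def metafield_mismatches(metafields: list[dict]) -> list[str]:
    """Return 'namespace.key' for each value that does not match its declared type."""
    flagged = []
    for mf in metafields:
        check = CHECKS.get(mf["type"])  # unknown types are skipped, not flagged
        if check is not None and not check(mf["value"]):
            flagged.append(f'{mf["namespace"]}.{mf["key"]}')
    return flagged


sample = [
    {"namespace": "custom", "key": "weight", "type": "number_decimal", "value": "1.5"},
    {"namespace": "custom", "key": "stock", "type": "number_integer", "value": "12.0"},
    {"namespace": "custom", "key": "fragile", "type": "boolean", "value": "yes"},
]
flagged = metafield_mismatches(sample)
```

Here `custom.stock` and `custom.fragile` are flagged, while `custom.weight` passes.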

Layer 4: Documentation (Auto-generated from code)

  • AI extracts function signatures, API schemas, workflow diagrams
  • Developers spend <2 hours documenting a 3-week build instead of 8–10 hours

| Development Phase | Traditional | AI-Assisted | Time Saved | Quality Impact |
|---|---|---|---|---|
| Scaffolding & boilerplate | 40 hours | 15 hours | 63% | No change |
| Unit test writing | 30 hours | 10 hours | 67% | +12% catch rate |
| Code review | 25 hours | 18 hours | 28% | +8% defect prevention |
| Documentation | 20 hours | 3 hours | 85% | Same |
| Total per project | 115 hours | 46 hours | 60% faster | +15% quality |
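
The "Time Saved" column follows directly from the hour figures in the table above. The short sketch below recomputes it, using only those hours; the variable names are ours. Results are kept to one decimal place, whereas the table rounds to whole percentages.

```python
# (traditional hours, AI-assisted hours) per phase, taken from the table above.
PHASES = {
    "Scaffolding & boilerplate": (40, 15),
    "Unit test writing": (30, 10),
    "Code review": (25, 18),
    "Documentation": (20, 3),
}


def percent_saved(before: float, after: float) -> float:
    """Share of the original hours that were saved, as a percentage."""
    return round(100 * (before - after) / before, 1)


per_phase = {name: percent_saved(b, a) for name, (b, a) in PHASES.items()}

total_before = sum(b for b, _ in PHASES.values())  # 115 hours
total_after = sum(a for _, a in PHASES.values())   # 46 hours
total_saved = percent_saved(total_before, total_after)

for name, pct in per_phase.items():
    print(f"{name}: {pct}% saved")
print(f"Total: {total_before}h -> {total_after}h ({total_saved}% saved)")
```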

Where AI Succeeds (And Fails)

AI crushes these problems:

  1. API boilerplate — Shopify's GraphQL API has 500+ query types. Generating a custom query resolver is repetitive; AI nails it.
  2. Type definitions — Creating TypeScript interfaces from Shopify metafield schemas is mechanical. AI generates them 95% correct.
  3. Error handling patterns — AI knows common Shopify API error codes and generates proper handling (rate limits, timeout recovery, webhook retries).
  4. Test fixtures — Building mock data for tests is tedious. Claude generates realistic fixtures in seconds.
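
As an example of the error-handling pattern in item 3, here is a minimal retry sketch for rate-limited calls. It assumes the caller wraps its HTTP client so that an HTTP 429 response raises the hypothetical `RateLimited` exception, carrying the `Retry-After` value when the response provides one. The defaults for `max_attempts` and `base_delay` are illustrative.

```python
import time


class RateLimited(Exception):
    """Hypothetical error raised by the caller's request wrapper on HTTP 429."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds from Retry-After, if present


def call_with_retry(request, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call request(); on RateLimited, wait and retry up to max_attempts times."""
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimited as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            if err.retry_after is not None:
                delay = err.retry_after  # honor the server's hint
            else:
                delay = base_delay * 2 ** attempt  # exponential backoff
            sleep(delay)


# Usage with a fake request that is rate limited twice, then succeeds.
responses = [RateLimited(), RateLimited(retry_after=5.0), "ok"]
delays = []


def fake_request():
    item = responses.pop(0)
    if isinstance(item, Exception):
        raise item
    return item


result = call_with_retry(fake_request, sleep=delays.append)
```

Passing `sleep` as a parameter keeps the function testable without real waiting, which is the kind of detail a reviewer should still check in generated code.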

AI struggles here:

  1. Business logic — Why do we validate inventory before charging? That's a business rule, not a code pattern. Developers write that.
  2. Architectural decisions — Should this feature use a webhook or a scheduled job? Wrong choice breaks the system. This requires human judgment.
  3. Edge cases — What if a customer has 10,000 product variants and filters by 5 tags simultaneously? AI doesn't catch performance cliffs; load testing does.
  4. Security trade-offs — AI suggests solutions, but a senior engineer must decide: "Is this secure for merchant data?"

The Real Economics

Here's what Tenten's AI-first approach actually costs:

Setup (one-time):

  • Prompt engineering & workflow design: $40K
  • AI API account setup & fine-tuning: $5K
  • Team training on AI-assisted workflows: $10K
  • Total: $55K

Per-project savings (typical $80K custom app):

  • Delivery time: 4 weeks instead of 8 weeks
  • Developer cost saved: $12K–$15K per project
  • QA cost saved: $3K–$5K
  • Margin improvement: 18–25%

Risks Tenten manages:

  • API cost: Claude API usage runs about $130 per project, or roughly $4K per year at 30 projects
  • Hallucinations: AI suggests code that looks right but doesn't work. Every output goes to code review (non-negotiable)
  • Talent retention: Some developers worry AI will replace them. Tenten addressed this: "AI handles busywork; you do the hard problems." Retention improved.

| Cost Category | Year 1 | Year 2+ | Notes |
|---|---|---|---|
| Setup & training | $55K | $0 | One-time investment |
| API usage (30 projects/year) | $4K | $4K | Claude API @ ~$130/project |
| Prompt maintenance | $8K | $5K | Updates to AI instructions |
| Human review & QA | $180K | $180K | Still critical (non-negotiable) |
| Total | $247K | $189K | ROI breaks even by month 8 |

Operational Insights We Didn't Expect

1. AI code is harder to refactor than human code
AI doesn't know your codebase's conventions. It generates correct code, but differently than your team would. Refactoring AI-generated code back to team standards takes 10–15% of saved time. Not a loss—just a surprise.

2. Junior developers learn faster
We expected seniors to benefit most. The opposite happened. Junior developers now generate production-ready scaffolding in hours instead of weeks. Onboarding is 50% faster because the AI teaches patterns by example.

3. More AI = lower morale if messaging is wrong
Early on, we announced "AI will make us 60% faster." Some engineers heard "your job is less valuable." We flipped the message: "AI does the repetitive work; you solve hard problems." Morale improved. Productivity didn't change, but satisfaction did.

4. Code review becomes more critical, not less
With AI, code review is less about catching typos and more about validating logic and security. We tightened our review SLAs: every AI-generated output goes to a senior engineer. A false sense of security kills projects.

What We Don't Do (And Why)

Tenten uses AI for scaffolding and tests, but not for:

  • Custom app logic from plain English descriptions — "Build me a subscription system" doesn't translate to code. We need pseudo-code first.
  • Architecture decisions — AI suggests options, humans decide.
  • Security & compliance code — AI-generated auth middleware is a liability. We write that ourselves.
  • UI/UX — Shopify apps require UX finesse that AI doesn't understand. We design human-first.

The Next Frontier

In 2026, Tenten is exploring:

  1. Multi-agent workflows — One AI agent writes code, another writes tests, another validates against Shopify API docs. Run them in parallel.
  2. Fine-tuned models — Training AI on Tenten's codebase so it generates code matching our style without 10 hours of refactoring.
  3. Real-time pair programming — Claude/GPT in the IDE as developers type, suggesting next lines (like Copilot, but with Shopify context).
  4. AI-assisted performance optimization — Load test results → AI suggests query optimizations → developers validate.

The long bet: AI doesn't replace engineers. It amplifies senior engineers by offloading the tasks that slow them down.

How to Start AI-First Development at Your Agency

If you run a Shopify development shop, here's how to begin:

  1. Pick one non-critical project and use Claude/GPT for scaffolding. Measure time saved vs. code quality.
  2. Create a prompt library for your repeating patterns (GraphQL queries, metafield validators, webhook handlers).
  3. Implement mandatory code review for all AI-generated code. No exceptions.
  4. Train your team on what AI is good at (don't ask it to solve novel problems).
  5. Track metrics: time per feature, defect rate, developer satisfaction. AI-first is only worth it if all three improve.
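
The rule in step 5 can be written down as a simple gate for a pilot project. The `Metrics` fields, the units, and the sample numbers below are assumptions for illustration; pick whatever measures your team already tracks.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    hours_per_feature: float    # lower is better
    defects_per_feature: float  # lower is better
    satisfaction: float         # e.g. a survey score; higher is better


def ai_first_worth_it(baseline: Metrics, pilot: Metrics) -> bool:
    """True only if all three metrics improved in the pilot."""
    return (
        pilot.hours_per_feature < baseline.hours_per_feature
        and pilot.defects_per_feature < baseline.defects_per_feature
        and pilot.satisfaction > baseline.satisfaction
    )


# Illustrative numbers only, not figures from the article.
baseline = Metrics(hours_per_feature=20.0, defects_per_feature=1.2, satisfaction=3.8)
faster_but_buggier = Metrics(hours_per_feature=12.0, defects_per_feature=1.5, satisfaction=4.0)
better_everywhere = Metrics(hours_per_feature=12.0, defects_per_feature=0.9, satisfaction=4.0)

print(ai_first_worth_it(baseline, faster_but_buggier))
print(ai_first_worth_it(baseline, better_everywhere))
```

A pilot that is faster but ships more defects fails the gate, which matches the "all three improve" condition above.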

The mistake most teams make: expecting AI to solve hard problems. AI shines on boring, repetitive work. If your project is 40% scaffolding, 40% tests, 20% novel logic, AI-first saves time. If it's 90% novel logic, AI barely helps.
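
A rough way to apply this rule of thumb to your own project mix is sketched below. It reuses the scaffolding and unit-test hours from the time-savings table earlier in the article and assumes, as a simplification, that AI saves nothing on novel logic. The 5%/5% split for the remainder of the "90% novel" project is also our assumption.

```python
# Fraction of hours saved per kind of work.
SAVINGS = {
    "scaffolding": (40 - 15) / 40,  # from the article's table: 40h -> 15h
    "tests": (30 - 10) / 30,        # from the article's table: 30h -> 10h
    "novel": 0.0,                   # assumption: no saving on novel logic
}


def estimated_saving(mix: dict) -> float:
    """Weighted share of project hours saved for a given mix of work."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("mix shares must sum to 1")
    return sum(share * SAVINGS[kind] for kind, share in mix.items())


balanced = estimated_saving({"scaffolding": 0.4, "tests": 0.4, "novel": 0.2})
mostly_novel = estimated_saving({"scaffolding": 0.05, "tests": 0.05, "novel": 0.9})

print(f"40/40/20 project: about {balanced:.0%} of hours saved")
print(f"90% novel project: about {mostly_novel:.0%} of hours saved")
```

Under these assumptions the balanced project saves roughly half its hours, while the mostly novel project saves only a few percent.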


Article FAQ

Q1: Does AI-generated code have more security vulnerabilities?

Not if reviewed properly. Tenten's AI-generated code goes through the same security review as human-written code. In practice, AI is more consistent about error handling (less likely to miss a null check). The risk is humans trusting AI without review—that's a process failure, not a code quality issue.

Q2: What if Claude/GPT API goes down? Does development halt?

Tenten keeps a 2-week backlog of non-AI work ready to go. Plus, we maintain fallback workflows (human scaffolding, which is slower but possible). We don't depend on AI for deployment—only for development speed. Downtime costs us 3–5 days, not weeks.

Q3: How much AI-generated code makes it to production without changes?

About 45%. The other 55% requires some modification (a type fix, a validation update, a performance tweak). This is healthy—it means engineers are reviewing, not rubber-stamping.

Q4: Can we use AI to maintain legacy Shopify code?

Partially. We use AI to understand old code (have Claude summarize a 500-line Liquid template). But updating it requires human judgment—legacy code often has implicit dependencies. AI rewrites can break things subtly.

Q5: Does AI speed up the selling process? Can we quote faster?

Yes. With AI tooling, Tenten quotes custom work 1 week faster (because we can scope it with higher confidence). Clients appreciate the faster quote, even if build time doesn't change much.

Q6: What percentage of Tenten's development is AI-assisted now?

About 65% of backend (APIs, webhooks, integrations) and 40% of frontend (templates, components). The harder the problem, the lower the AI percentage.

Q7: How does AI affect pricing? Do you charge less?

We didn't lower prices. Instead, we improved margin and reinvested in: (1) better code review, (2) advanced features clients wanted, (3) faster revisions. Clients pay the same, but get more value and faster delivery.