Trying Claude for E2E Testing on an Airbnb Clone

By Ankur Tyagi | Jun, 2026 | 15 min read

TLDR: I built Bunkly, a test-friendly Airbnb clone, and ran Claude Code through every stage of E2E testing: test planning, spec generation, parallel runs, GitHub Actions, and updating tests after a UI redesign. Claude worked best as a contributor to my process, not a replacement for the decisions I still had to make. You can use its knowledge of Playwright APIs and common testing patterns, and it will scaffold setup and auth projects and fill out happy-path flows faster than writing them by hand. But it needs you to guide it: app routes, seed data, environment variables, and workflow boundaries are things Claude cannot infer on its own. It will produce specs that look complete but silently pass for the wrong reasons, especially when product changes shift behavior under the hood. Treat Claude as a way to accelerate a first draft, then review its output for correctness before you trust it in production.

You hear all the time about teams struggling to create and maintain E2E tests that keep pace with how fast they ship. They invest time in an initial suite, only to watch it turn flaky or slow, or get ignored when failures pile up [1]. I’d heard people talk about using Claude Code for QA and wanted to see whether it could help with the work end to end: test planning, generating specs, running tests locally, setting up GitHub Actions, and updating tests when the UI changed. For the experiment, I built a production-ready Airbnb-style hotel app called Bunkly, complete with search, checkout, loyalty, and messaging. I decided to start with a clean slate: an app with no Playwright tests yet. I have attached the full Claude Code transcript [2] for you to read alongside the blog.

The Bunkly codebase was deliberately designed to be test-friendly. Interactive elements carry `data-testid` attributes, seed data is deterministic (fixed slugs and known passwords in `CLAUDE.md`), and `ARCHITECTURE.md` explains how Server Components and server actions split responsibilities. You’d think that would make things straightforward. Instead, I hit selectors that weren’t in the code, odd feature priorities (wishlists ranked above booking), and CI jobs that skipped hidden directories.

This post covers what happened at each step of trying Claude Code for QA testing: what worked, what broke, and what I still had to verify in the code, seed data, and CI logs.

Set up Bunkly locally

The first step was to get Bunkly running locally. I cloned the repository and seeded the database with the following commands:

    git clone https://github.com/rishi-raj-jain/bunkly
    cd bunkly
    npm install

    cp .env.example .env   # set DATABASE_URL
    npm run db:migrate     # or db:push in dev
    npm run db:seed

    npm run dev            # http://localhost:3000

The seed step creates about ten to fifteen properties across five cities, rolls inventory forward for a year, and adds users with plaintext passwords for test automation. Without that seed data, flows like logging in as a user named Sarah, viewing upcoming bookings, or testing loyalty enrollment would have nothing to assert against. To enable browser-aware changes in Claude Code, I also set up the Playwright MCP so threads could refer to real navigation and DOM behavior along with file changes.

Claude for QA Test Planning

Before automating the tests in code, I asked Claude for a test plan keyed to Bunkly’s main workflows: discovering properties, completing a booking end-to-end, managing profile and payment methods, notifications, message threads, loyalty (where the seed allows), and secondary flows like wishlists and price alerts. Here is the prompt I used:

    Produce a prioritized E2E test matrix in docs/cc-test-plan.md with P0/P1/P2 cases.

Claude’s first attempt at a test plan was thorough on the surface. It covered the major features and some lesser-used ones, such as price alerts, wishlists, reviews after check-out, and flows for editing bookings. The plan was laid out as a large table with P0, P1, and P2 priorities and tried to balance common paths with edge cases. Taking stock of the app’s features was genuinely helpful: it forced me to consider corners of the product I might have skipped otherwise.

Here are a few prompts I sent after the initial test plan was generated (source):

    can you move the tests from P2 to PQ that makes sense, we are a booking company so we need the ability to make sure that search is the top priority

    but how is wishlist a P1?

    why adding payment method is P1?

    why did you move payment method to P2? It was in P1 and it should be a P0 instead since we need to be able to process money!

    why in the world is forgot password a P2?

Several important...
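The back-and-forth over priorities above is easier to review when the matrix lives as data rather than a hand-edited table. What follows is only a minimal sketch of that idea, not how `docs/cc-test-plan.md` was actually produced. The `TestCase` shape, the case IDs, and the wishlist priority are hypothetical placeholders. Only the P0 placement of search and payment methods follows the prompts quoted above.

```typescript
// Hypothetical shape for one row of the test matrix; field names are assumptions.
type Priority = "P0" | "P1" | "P2";

interface TestCase {
  id: string;
  flow: string;
  priority: Priority;
}

// Lower rank sorts first, so P0 rows lead the table.
const rank: Record<Priority, number> = { P0: 0, P1: 1, P2: 2 };

function sortByPriority(cases: TestCase[]): TestCase[] {
  // Copy before sorting so the caller's array is left untouched.
  // Ties break on id so the output is stable across runs.
  return [...cases].sort(
    (a, b) => rank[a.priority] - rank[b.priority] || a.id.localeCompare(b.id)
  );
}

function toMarkdownTable(cases: TestCase[]): string {
  const header = ["| ID | Flow | Priority |", "| --- | --- | --- |"];
  const rows = sortByPriority(cases).map(
    (c) => `| ${c.id} | ${c.flow} | ${c.priority} |`
  );
  return [...header, ...rows].join("\n");
}

// Illustrative rows only; IDs and the wishlist priority are placeholders, not the real plan.
const cases: TestCase[] = [
  { id: "wishlist-add", flow: "Add a property to a wishlist", priority: "P2" },
  { id: "search-city", flow: "Search properties by city", priority: "P0" },
  { id: "payment-add", flow: "Add a payment method", priority: "P0" },
];

console.log(toMarkdownTable(cases));
```

Keeping priorities in one typed list means a change like "payment method should be P0" is a one-line diff that is easy to spot in review, instead of a row silently moving inside a large regenerated table.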
