A complete product design process — from a one-paragraph idea to pixel-ready pages — conducted entirely as a conversation between a human and an AI design partner.
This document records a complete product design process for Unreal Expeditions, a website for a Discord community that runs collaborative research expeditions into Unreal Engine source code.
The entire process was conducted as a conversation between a human (Nick) and an AI design partner (Claude), using a custom pipeline that enforced structured stages: Discovery, Information Architecture, Style Direction, Component Generation, and Page Assembly.
What makes this worth reading isn't the output — it's the process. The pushback moments, the variant debates, the scope negotiations, and ultimately the meta-realization that good decision-making doesn't automatically produce good output without a self-evaluation loop.
Unreal Expeditions is a public-facing website with two jobs:
Three routes: Landing page (/), Archive (/expeditions), and Expedition detail (/expeditions/:slug).
A Python orchestrator + CLAUDE.md system prompt that enforces structured stages before any pixels get pushed
This conversation didn't happen freeform. It was guided by a pipeline — a set of Python scripts and a system prompt that enforce a specific order of operations. The AI can't jump to components without finishing discovery. It can't assemble pages without approved components.
The pipeline was itself designed in a separate Claude session before this project began. What you're reading is the first real use of it.
The orchestrator manages stage transitions. Each stage has its own module that updates project.json to track completion.
The CLAUDE.md system prompt shapes the AI's behavior — defining it as an opinionated design partner (not an order-taker), specifying conversation phases, and providing JSON schemas for all output artifacts.
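A minimal sketch of how the orchestrator's stage gating might work, assuming a simple completion list stored in project.json (the stage names and function names here are illustrative, not the actual pipeline code):

```python
import json
from pathlib import Path

# Illustrative stage order; the real pipeline's module names may differ.
STAGES = ["discovery", "ia", "style", "components", "pages"]

STATE_FILE = Path("project.json")


def load_state() -> dict:
    """Read stage-completion state, defaulting to nothing completed."""
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())
    return {"completed": []}


def mark_complete(stage: str) -> None:
    """Record a finished stage so later runs can pick up where they left off."""
    state = load_state()
    if stage not in state["completed"]:
        state["completed"].append(stage)
    STATE_FILE.write_text(json.dumps(state, indent=2))


def next_stage() -> str | None:
    """Return the first stage that has not been completed yet."""
    done = set(load_state()["completed"])
    for stage in STAGES:
        if stage not in done:
            return stage
    return None


def require_stage(requested: str) -> None:
    """Refuse to run a stage out of order (e.g. components before discovery)."""
    expected = next_stage()
    if requested != expected:
        raise RuntimeError(
            f"Cannot run '{requested}' yet; next allowed stage is '{expected}'."
        )
```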
The eval scripts were built to evaluate visual output quality — detecting layout issues, accessibility problems, and design token violations. These informed the self-evaluation protocol that emerged in Chapter 6.
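To illustrate the kind of check involved, an overflow detector could be sketched with Playwright roughly as follows; the function name, viewport size, and tolerance are assumptions, and the actual eval scripts may use different tooling:

```python
from playwright.sync_api import sync_playwright

# JS snippet: collect elements whose right edge spills past the viewport.
OVERFLOW_JS = """
() => {
  const limit = document.documentElement.clientWidth;
  return [...document.querySelectorAll('*')]
    .filter(el => el.getBoundingClientRect().right > limit + 1)
    .map(el => el.tagName.toLowerCase());
}
"""


def find_overflows(url: str, width: int = 375, height: int = 812) -> list[str]:
    """Render a page at a mobile viewport and report horizontally overflowing elements."""
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page(viewport={"width": width, "height": height})
        page.goto(url)
        offenders = page.evaluate(OVERFLOW_JS)
        browser.close()
    return offenders


if __name__ == "__main__":
    for tag in find_overflows("file:///tmp/expedition-detail.html"):
        print("overflow:", tag)
```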
Master controller — routes between stages, loads context, enforces ordering
Stage 1 — brief, features, flows. Forces problem understanding before solutions.
The system prompt that defines the AI's role, conversation protocol, and output schemas
Visual quality evaluation — adversarial testing, layout detection, two-pass review
Cold start to design brief — problem understanding, persona definition, feature prioritization, flow mapping
The conversation started with a product pitch. The AI's job was to interrogate assumptions, force prioritization, and produce a structured brief before any visual work began.
Sitemap, screen extraction, responsive strategy, wireframe generation
With the brief locked, the next stage extracted concrete screens from the approved flows, defined the sitemap, and established responsive strategies for each page.
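For a sense of what this stage produces, the IA artifact might look something like the structure below; the field names and responsive notes are placeholders rather than the project's actual schema:

```python
# Illustrative shape of an IA artifact; field names and responsive notes are
# placeholders, not the schema actually defined by the pipeline.
SITEMAP = {
    "routes": [
        {"path": "/", "screen": "Landing",
         "responsive": "single-column stack on mobile"},
        {"path": "/expeditions", "screen": "Archive",
         "responsive": "card grid collapses to one column"},
        {"path": "/expeditions/:slug", "screen": "Expedition detail",
         "responsive": "secondary content moves below the main column"},
    ],
    "global": ["header", "footer"],
}
```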
/, /expeditions, /expeditions/:slug
Token exploration, 4 candidates, user ranking, and the birth of "Wayfinder Evolved"
Four style directions were generated as complete design token sets. The user ranked them, provided specific feedback, and a fifth hybrid option was created from the best elements.
The user preferred Wayfinder's palette but wanted refinements. This produced Wayfinder Evolved — the final token set that powers everything from here forward.
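To give a feel for what a complete token set covers, here is an illustrative shape for one; the names and values are placeholders, not the actual Wayfinder Evolved tokens:

```python
# Placeholder token set: categories mirror what a design token file typically
# holds (color, type, spacing, radius); the values are not the real ones.
WAYFINDER_EVOLVED = {
    "color": {
        "background": "#0d1117",
        "surface": "#161b22",
        "accent": "#e3a24b",
        "text-primary": "#e6edf3",
    },
    "font": {
        "display": "'Space Grotesk', sans-serif",
        "body": "'Inter', sans-serif",
        "mono": "'JetBrains Mono', monospace",
    },
    "space": {"1": "4px", "2": "8px", "3": "16px", "4": "24px", "5": "40px"},
    "radius": {"sm": "4px", "md": "8px", "lg": "16px"},
}
```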
19 components, multiple variant rounds, side-by-side comparisons at both breakpoints
Each key component went through variant generation → side-by-side preview → user selection → refinement. The previews were standalone HTML files showing components at both 1280px (desktop) and 375px (mobile).
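A rough sketch of how such a dual-breakpoint preview could be generated; the template and helper below are hypothetical, not the pipeline's actual preview code:

```python
from pathlib import Path

# Hypothetical helper; the idea is one HTML file with two iframes at fixed
# widths so desktop and mobile renderings sit side by side.
PREVIEW_TEMPLATE = """<!doctype html>
<html><body style="display:flex;gap:24px;align-items:flex-start;background:#222">
  <iframe src="{src}" style="width:1280px;height:900px;border:0"></iframe>
  <iframe src="{src}" style="width:375px;height:900px;border:0"></iframe>
</body></html>
"""


def write_preview(component_html: str, out_dir: str = "previews") -> Path:
    """Wrap a rendered component in a desktop/mobile side-by-side preview page."""
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    target = out / "component.html"
    target.write_text(component_html)
    preview = out / "component_preview.html"
    preview.write_text(PREVIEW_TEMPLATE.format(src=target.name))
    return preview
```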
Composing 19 components into 3 complete pages at desktop and mobile breakpoints
All three pages were assembled simultaneously, each rendered as a side-by-side desktop/mobile preview. Real sample content (Replication Graph expedition data) was used throughout.
The meta-lesson: good decision processes don't automatically produce good output
When the user reviewed the assembled pages, they found: missing filter chips, an overflowing progress bar, inadequate code block padding on mobile, and spacing inconsistencies. All issues that should have been caught before presenting.
"To be honest, I feel like all this work we did to make sure the screens are evaluated by Claude with special design tools and principles applied didn't even happen? The process of IA, style decisions, components and variants, they all went really well and did a good job of focusing what we were building. But we're still severely lacking in the ability to self evaluate these generations."
— Nick, after reviewing page assemblies
After this feedback, a structured self-audit loop was established: generate → audit against checklist → fix critical issues (max 3 passes) → present with evaluation notes.
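In code form, that loop might look roughly like this, with the audit, fix, and checklist names standing in for whatever the pipeline actually uses:

```python
# Sketch of the self-audit loop described above; the checklist items and the
# generate/audit/fix callables are stand-ins, not the pipeline's real functions.
MAX_PASSES = 3
CHECKLIST = ["completeness", "overflow", "spacing", "mobile", "ia_cross_reference"]


def self_audit_loop(generate, audit, fix):
    """generate() -> artifact; audit(artifact) -> list of issue dicts with a
    'severity' field; fix(artifact, issues) -> revised artifact."""
    artifact = generate()
    notes = []
    for attempt in range(MAX_PASSES):
        issues = audit(artifact)                 # run every checklist item
        critical = [i for i in issues if i["severity"] == "critical"]
        notes.append({"pass": attempt + 1, "issues": issues})
        if not critical:
            break
        artifact = fix(artifact, critical)       # only critical issues block presentation
    return artifact, notes                       # present the output together with evaluation notes
```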
| Check | Result | Notes |
|---|---|---|
| Completeness | PASS | All 19 registry components present on correct pages |
| Overflow | PASS | All layouts fit containers (after fixing progress bar) |
| Spacing | PASS | Consistent rhythm using token scale |
| Mobile | CONCERN | 3 touch target groups below 44px minimum |
| IA cross-reference | PASS | Routes, strategy, global elements all match spec |
The upstream stages (discovery, IA, style, component variants) did their job — scoping, prioritizing, constraining. But the generation step had no quality gate. Output was produced and handed directly to the user as the only reviewer.
The fix wasn't complicated: add a self-audit step between generation and presentation. The hard part was recognizing that the problem existed — it took a user calling it out for the gap to become visible.
Every file produced during the design process