You open Claude Design at 11pm. Or v0. Or Lovable. You type "build me a settings page for a fintech dashboard." Thirty seconds later, a working prototype renders.
You stare at it. Sigh. Send the screenshot to your designer with the caption "fine, but."
This post is about that "but." Why every untouched output from every AI design tool in 2026 looks like the same product. Why prompting harder doesn't fix it. And the five specific things that do.
The look has a name
You can describe the default AI-generated UI in your sleep. Soft gradient hero. A floating 3D character or a low-poly illustration. A row of "trusted by" logos in greyscale. Three glassmorphism cards. Inter throughout. Neutral palette with an accent that's almost always somewhere in the blue-to-purple band. Buttons rounded to 8px. Cards rounded to 12px. A "Save" button living somewhere in the bottom right.
It's competent. It's hollow. It's interchangeable. One designer described it as predictable purple-on-white, technically accomplished and devoid of personality.
The shorthand most people reach for is "shadcn slop," but that's unfair to shadcn. shadcn is a great primitives layer. The slop isn't shadcn. It's what happens when an AI tool, given no other information, falls back to the default of its training corpus, and the dominant default in modern web UI training data is the way shadcn looks unstyled, on top of Tailwind, inside a B2B SaaS layout.
Once you've seen it you can't unsee it. And you've seen it. Every new AI startup landing page. Every internal tool. Every Y Combinator demo since 2024. It's the default settings page of the internet.
It's not a bug. It's the mean.
Here's the part most people get wrong. This isn't a quality problem with the model. It's a prediction problem.
Every frontier LLM, when asked to generate a UI, is doing what it does for prose. It samples the most probable continuation given the input. With no input that disambiguates whose product this is, the most probable visual continuation is the centre of the corpus. Whatever pattern shows up most often in the training data wins.
Researchers have a name for this. They call it typicality bias: preference data systematically rewards familiar outputs, and post-training alignment then pulls the model toward whatever is most typical. When the corpus is the internet's accumulated frontend code from 2018 to today, the most typical modern web UI is shadcn-on-Tailwind in a SaaS layout. So that's what you get.
Three forces compound this:
Training-data gravity. The largest sources of frontend code in any model's training set are GitHub and the major component libraries. shadcn, Material, Bootstrap, Tailwind UI, and the constellation of B2B SaaS dashboards built on top of them dominate the visual signal. The least-surprising output is the one that looks like all of those at once. There is no countersignal unless you provide it.
Adjective collapse. When you prompt with modern, clean, minimal, or premium, those words land in roughly the same neighbourhood of the model's latent space. Each of them, in the corpus, was paired with the same kind of UI. So they all generate the same UI. Modern and premium and minimal are, for most production-trained models, near-synonyms for "looks like Linear without actually being Linear."
The detail gap. Even when a brand is loud and specific (a colour palette, a logo, a couple of headline words), it usually only covers about 5% of the decisions a model makes when rendering a screen. Hover states, focus rings, eight-step neutral ramps, the exact radius difference between a chip and a card, the spacing rhythm between sibling sections, the right tracking on display type at 32px. None of that is in the brand guide. The model fills it in from the corpus mean, which is the slop, and the slop wins by sheer surface area.
Generic isn't the model failing. Generic is the model succeeding at exactly the task you implicitly gave it: produce the most likely thing.
Vercel and Anthropic have both said so
This isn't a fringe view. The two companies most invested in AI UI generation have already put it on the record.
Vercel's engineering team writes that to get on-brand output from v0, you have to map your tokens into the project's globals.css and steer the model through a registry. Their framing: structure comes from shadcn, appearance comes from your tokens. Without your tokens, you get their defaults, which are everyone's defaults.
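Concretely, that mapping is just CSS custom properties. Here's a minimal sketch of a globals.css override, assuming shadcn's conventional variable names and HSL-channel syntax (the exact set varies by version); the palette itself is invented for illustration:

```css
/* globals.css — structure from shadcn, appearance from your tokens.
   Variable names follow shadcn's documented convention; the values
   below are an invented brand palette, not a recommendation. */
:root {
  --background: 40 33% 98%;          /* warm off-white, not pure white */
  --foreground: 24 10% 10%;
  --primary: 16 100% 60%;            /* brand orange, not the default indigo */
  --primary-foreground: 36 100% 97%;
  --ring: 16 100% 60%;               /* focus rings take the accent too */
  --radius: 0.25rem;                 /* 4px — tighter than the 8px default */
}
```

Once these exist, every component that references --primary or --radius renders your brand instead of the default, with no per-component prompting.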
Anthropic published a frontend-design Skill for Claude that explicitly forbids the generic look. Inter and Roboto are out. Purple gradients on white are out. Predictable card layouts are out. The Skill instructs Claude to pick an extreme aesthetic posture (brutalist, editorial, maximalist, neo-skeuomorphic) before writing a single line of code, because the model's natural gravity is toward the centre and the only way to get distinctive output is to push it off-centre on purpose.
That is a remarkable thing for the company building the model to admit. The default mode of the model, by their own assessment, is generic. The fix, by their own assessment, is to load explicit constraints into context.
That is the entire premise of a DESIGN.md. It's the entire premise of a Taste Profile. The companies that build these tools are quietly publishing the answer in their own docs.
Why prompting doesn't escape it
A reasonable response is "fine, I'll just prompt better." You won't.
A prompt is a description. The model interprets descriptions and lands somewhere in latent space. The "somewhere" is, again, controlled by where similar descriptions in the training corpus tend to land. If you prompt for warmer, more editorial, less corporate, the model produces a shifted version of the mean, which is still near the mean. The radius is rounded to 10 instead of 8. The accent is amber-ish instead of indigo. Inter goes up two pixels. The shape of the page is the same.
There's a second reason. Prompts decay across turns and across sessions. Tokens are cheap to write and expensive to enforce. By the third reply, the model has paraphrased your brief into its own words and is generating against the paraphrase, which has drifted toward the mean again. By the next session, the prompt is gone entirely.
Prompts are interpretation. Specs are contract. Models reach for the mean unless something in their context is doing measurable work to pull them off it. A prompt is light. A spec is heavy.
What actually moves the mean
There are five things that, in practice, change the centre of gravity of an AI design tool's output. None of them is prompt engineering.
1. Tokens, all the way down. Not just accent: #FF6B35. Every state of every interactive element. Hover, press, focus, disabled, the exact white that sits on top of the accent, the focus ring colour and its opacity, the neutral ramp at eight steps, the spacing scale referenced from every component. Until those exist, the model fills them in. When they exist, the model uses them and the slop disappears from the seams. (For the format the modern tools agree on, see W3C DTCG design tokens: a practical guide; a token sketch follows this list.)
2. Nevers. Constraints are force multipliers. No purple-to-pink gradients. No Inter. No glassmorphism. Headlines are never italic. A list of five to seven nevers eliminates a huge swath of the corpus mean before the model even starts sampling. This is exactly what Anthropic's frontend-design Skill does. It's free and it works.
3. Principles the model can quote back. "Warmth over neutrality." "Density over decoration." "Earn every animation." Short, punchy, repeatable. Models love quotable rules because they show up cleanly in the chain of reasoning when the model is deciding what to render. Vague principles ("we are user-focused") do nothing. Specific principles ("use colour to create hierarchy, not decoration") do real work.
4. Anchored references. "Linear: the quiet confidence of their spacing." "Stripe: information density that never feels crowded." A reference paired with a one-sentence reason is worth more than a Pinterest board, because the reason is the part the model uses. Without the reason, "Linear" collapses back into the mean. With the reason, the model has a vector pointing somewhere specific.
5. A persistent file the tool reads automatically. A DESIGN.md and a SKILL.md in your repo, or installed in your Claude Team workspace, mean the brief is loaded into every session by default. No re-pasting. No drift. Every team member, every chat, every retry of the same prompt, gets the same context. This is the part that compounds. Without it, the other four reset every Monday. (A skeleton DESIGN.md is sketched at the end of this section.)
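To make item 1 concrete, here's what "every state of every interactive element" starts to look like in the DTCG format linked above. A fragment, not a full set; every value except the accent itself is invented for illustration, and the dimension syntax follows the draft's string form:

```json
{
  "color": {
    "accent":       { "$type": "color", "$value": "#FF6B35" },
    "accent-hover": { "$type": "color", "$value": "#E85A24" },
    "accent-press": { "$type": "color", "$value": "#C94A1B" },
    "on-accent":    { "$type": "color", "$value": "#FFF6F0" },
    "focus-ring":   { "$type": "color", "$value": "#FF6B35" }
  },
  "radius": {
    "chip": { "$type": "dimension", "$value": "4px" },
    "card": { "$type": "dimension", "$value": "6px" }
  }
}
```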
Together these five things are countersignal. Each one moves the centre of gravity off the corpus mean and toward your brand. The more of them you have, and the more specific each one is, the further from generic the output lands.
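And item 5's persistent file, in miniature. A hedged skeleton, with the company name invented and every rule lifted from the examples earlier in this post:

```markdown
# DESIGN.md — Acme (skeleton)

## Tokens
Source of truth: tokens.json (DTCG). Accent is #FF6B35.
Neutral ramp: eight steps of warm grey. Radii: chips 4px, cards 6px.

## Nevers
- No purple-to-pink gradients
- No Inter
- No glassmorphism
- Headlines are never italic

## Principles
- Warmth over neutrality.
- Density over decoration.
- Earn every animation.

## References
- Linear: the quiet confidence of their spacing.
- Stripe: information density that never feels crowded.
```

Every line in it is countersignal; the file's only job is to be in context every time.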
You can't beat the mean. You can move it.
The mistake is thinking the goal is to defeat the model's generic gravity. You can't. Every model, today and for the foreseeable future, will reach for the most-likely output. That's how prediction works.
The goal is to change what's most likely. Once your tokens, nevers, principles, and references are loaded into the model's context, the most-likely output isn't the corpus mean any more. It's your brand. The model is still doing maximum likelihood. You've just changed the distribution.
That's why a one-time DESIGN.md (or, if you'd rather not write one, a Taste Profile) compounds. Every prompt benefits. Every team member benefits. Every tool benefits, because the format is portable across Claude Design, Cursor, v0, Lovable, ChatGPT, and anything else that reads Markdown context.
You stop fighting your tools. They start producing your brand from the first prompt instead of the fifth revision.
Part of our series on briefing AI tools to ship on-brand work. See also: DESIGN.md: the AI-era brand guide, Why not just paste your brand guide into Claude?, and Anatomy of a great DESIGN.md.
