Anatomy of a great DESIGN.md

The Google Labs DESIGN.md spec tells you the section names and the token syntax. It does not tell you what good writing inside those sections looks like. That is the part that matters.

A spec-conformant DESIGN.md can still be a generic one. The structure validates, the YAML parses, the markdown is in canonical order, and an AI tool reading it still produces shadcn slop because the prose underneath is hedged, abstract, or copy-pasted from a brand guide written for humans.

This post is a tour through a real DESIGN.md (the one this site is built on) section by section. For each one we call out what earns its place, what a weaker version usually looks like, and why the difference shows up in the output. At the end we run the official linter over the file, because "is this a good DESIGN.md" is a question with two halves, and the structural half is now a one-line check.

The example brand is the Taste Profile site itself: an Editorial Strategist, blue accent, serif headlines in light weight. Real values, real reasoning, no stand-ins.

The frontmatter: tokens, named for humans first

---
version: alpha
name: Taste Profile
description: Editorial but approachable. Confident authority with room to breathe.
 
colors:
  base-white: "#FFFFFF"
  primary-50: "#EFF6FF"
  primary-600: "#2563EB"
  primary-700: "#1D4ED8"
  neutral-50: "#F8FAFC"
  neutral-200: "#E2E8F0"
  neutral-600: "#475569"
  neutral-900: "#0F172A"
 
  primary: "{colors.primary-600}"
  background: "{colors.base-white}"
  foreground: "{colors.neutral-900}"
  foreground-secondary: "{colors.neutral-600}"
  accent: "{colors.primary-600}"
  accent-hover: "{colors.primary-700}"
 
components:
  button-primary:
    backgroundColor: "{colors.accent}"
    textColor: "{colors.base-white}"
    typography: "{typography.button}"
    rounded: "{rounded.full}"
    padding: 14px 24px
  button-primary-hover:
    backgroundColor: "{colors.accent-hover}"
    textColor: "{colors.base-white}"
    typography: "{typography.button}"
    rounded: "{rounded.full}"
    padding: 14px 24px
---

What earns its place:

A two-tier palette: primitives named by family (primary-600, neutral-200) plus semantic aliases named by role (accent, foreground-secondary). The model picks colors by intent, not by climbing a Tailwind ramp. Per the spec, primary is the canonical top-level palette name. Even if your code calls it accent, the YAML carries primary so downstream tools (Figma variables, tokens.json exports) get a coherent root.
Token references that compose: {colors.primary-600}, {rounded.full}, {typography.button}. One change to a primitive cascades. The file teaches the model what depends on what.
Hover as a separate variant, not a sub-token. The spec only recognises backgroundColor, textColor, typography, rounded, padding, size, height, width. State changes live as their own components (button-primary-hover). It looks more verbose; it lints clean and stays decomposable.
One accent. Not three. Scarcity is a design decision, encoded.

Common weak version: a flat dump of every Tailwind shade, no roles, no references, hover folded into the parent component as a backgroundColorHover field the linter doesn't recognise. You hand the model sixty colors and no opinion about which to use, and it picks the friendliest looking ones.

Overview: a sentence that tells the model what NOT to be

## Overview
 
Tension: Authoritative BUT warm.
 
The brand identity is The Editorial Strategist, a mid-30s strategist who
reads design blogs and HBR, wears navy and white, and speaks clearly. At
a party, they're having a real conversation in the corner, not working
the room. Sharp but warm.

What earns its place:

A named tension (Authoritative BUT warm). Two-word framings the model can hold in working memory. The "BUT" is load-bearing: it tells the model the brand resolves a contradiction, not averages two adjectives.
An identity name (The Editorial Strategist) that compresses the personality into one phrase you can quote back when output drifts.
A concrete contrast: what we are fighting against (generic startup, cold corporate). Negative space matters as much as positive.
A sensory anchor (well-designed magazine article). One reference image worth a hundred adjectives.

Common weak version: "modern, clean, professional." Three words that describe every B2B product on the internet. The model has nothing to disambiguate against, so it averages.

Personality: a person, not a vibe

A 35-year-old strategist with a strong point of view, communicated quietly.
Specific over general, warm over cold, calm over loud. Says "Here's what
I'd do" instead of "It depends." Wears navy and white. Reads HBR and design
blogs. Remembers your name.

What earns its place:

Specifics: navy and white, mid-30s, reads HBR. The model can render those.
A behavior, not just a description. "Says 'Here's what I'd do' instead of 'It depends'" tells the model how the brand acts under social pressure, which translates into how a button feels under hover.

Common weak version: "approachable yet authoritative." A contradiction the model resolves by averaging, which is how you end up with friendly purple dashboards that are also, somehow, corporate.

Principles: rules with reasons

1. The Breathing Room Rule. Generous whitespace is the default.
   Why: cramped layouts signal anxiety; space signals confidence.
 
2. The Blue-Means-Action Rule. accent is reserved for interactive
   elements only.
   Why: when everything is accented, nothing is. Scarcity creates intent.
 
3. The Clarity-Is-Kindness Rule. Clear hierarchy, no clutter, one idea
   per section.
   Why: cognitive ease is a form of respect; confusion is a form of
   arrogance.

The pattern is named rule, then why.

What earns its place:

The name. "The Breathing Room Rule" is short enough to cite when you reject output. You can literally write back to the model: that violates the Breathing Room Rule. Naming things makes them enforceable.
The why. The rule survives the model re-reading it because the reason is encoded inline. Without the why, the rule is just a preference, and preferences lose to training data.

Common weak version: a list of generic best practices ("be consistent", "use whitespace"). Any modern tool already does these. They add nothing.

Don'ts: the highest-leverage section in the file

## Do's and Don'ts
 
### Never
 
- Forbid warm colors. The palette is deliberately cool; warmth would
  undermine the precision signal.
- Forbid italic headlines. Playfulness conflicts with editorial authority.
- Forbid blue as decoration. When everything is accent, nothing is.
- Forbid Source Serif weight 700+ for marketing copy. A 700-weight serif
  reads as a law firm; a 400-weight serif reads as a magazine.
- Forbid grey shadows on the primary button. The blue glow is a signature
  element. Replacing it with a neutral drop reads as a generic SaaS button.
- Forbid generic startup illustrations. Blob people and rainbow gradients
  signal "we gave up on taste."

Don'ts are the single highest-leverage section in a DESIGN.md. A well-written don't ships more on-brand output than three new principles.

Why: AI tools default to the statistical mean of their training data. They have seen a million warm gradients, a million blob illustrations, a million rounded purple cards. Telling the model what to be is competing with that mean. Telling it what not to be removes whole regions of the output space in one line.

What earns its place:

Specific failure modes, not categories. "Blob people and rainbow gradients" beats "avoid generic illustrations" because the model has to imagine the bad version before it can avoid it.
A because on every line. The rule that can be quoted back when the model breaks it.
Implementation-specific don'ts, not just aesthetic ones. "Forbid grey shadows on the primary button" is the kind of thing that wins the third generation, when the model has stopped reaching for warm colors but is still reaching for neutral drop shadows out of habit.

Common weak version: "don't use too many fonts." Too vague to enforce, and the model already does not, so the line is decoration.

Colors: scarcity, with reasoning

## Colors
 
The palette is rooted in Editorial Blue as the single accent and a Slate
Neutral ramp for surfaces and type. There are no warm hues anywhere; the
cool palette is the precision signal.
 
- Primary, Editorial Blue (primary-600, #2563EB). The only accent.
  Reserved for interactive elements: primary buttons, links, focus rings.
  Never used as a decorative section background.
- Neutral, Slate. neutral-50 is the subtle section fill, neutral-200 is
  borders, neutral-600 is body text, neutral-900 is both headlines and
  the dark-section background. Same hex, two roles. The system collapses
  depth into two tonal worlds.
- Tertiary, Red. Reserved for problem indicators only. Not a status
  colour beyond that.
 
Authoring rule: reach for semantic names, not raw shades. UI code uses
bg-accent and text-foreground-muted, not bg-primary-600.

What earns its place:

The palette named by role first (primary, neutral, tertiary), values second. The role is what travels across screens; the hex is what changes when you re-skin.
The accent rule written into the section, so the model encounters the constraint at the moment it is picking a color, not three sections away.
A role for darkness. Dark navy is not a style choice, it is a narrative tool, reserved for problem statements and final CTAs (the Contrast-for-Emphasis Rule).
A ramp justification for collapsed roles. neutral-900 doing duty as both foreground and background-dark is unusual; the prose calls it out instead of leaving the model to discover it by guessing.

Common weak version: a twelve-color palette with no hierarchy. The model uses all of them, on the same page, equally weighted.

Typography: identity carried by one decision

## Typography
 
Display - Source Serif 4. Headlines, page titles, brand moments. Weight
400-600 (never 700+ for marketing copy - see the Editorial Authority
Rule). Tight tracking (-0.02em) on all sizes >= 24px.
 
Body - Inter. Body copy, UI, eyebrows, buttons. Regular weight is default.
 
A 700-weight serif reads as a law firm. A 400-weight serif reads as a
magazine. Light weight matters more than the family choice.

What earns its place:

Light weight specified. Most AI defaults reach for 600 or 700. Without the instruction, you get bold sans-serif headlines no matter what font you named.
A small, specific number (-0.02em tracking). Small specifics compound. A model that follows one specific instruction tends to follow the next one.
Reasoning attached to the weight choice. The "law firm vs magazine" comparison gives the model a fork to disambiguate against, and ties the rule back to the named principle (the Editorial Authority Rule), so a future agent reading just the principles still picks the right weight.

Common weak version: "Inter for everything." Gives the model no way to distinguish display from body, so display becomes body in 32px, and the page reads flat.

Components: recipes the spec recognises

## Components
 
Sub-token names follow the spec's recognized set (backgroundColor,
textColor, typography, rounded, padding, size, height, width); hover and
focus states live as separate component keys.
 
Primary Button (button-primary). accent fill, white text, blue glow
shadow. Hover transitions to button-primary-hover (accent-hover fill,
deeper glow). Reserved for the single most important action per screen.
 
Card. White background, 1.5rem radius, 32px internal padding. Lifts with
shadow.card-hover on hover. No accent fill in the default state - the
accent colour only appears on CTAs inside the card.

What earns its place:

Recipes: shape, color, and typography in token references the model can resolve.
Hover states named explicitly as their own component. The spec's recognised sub-tokens don't include backgroundColorHover. Write it as a separate variant or the linter will flag it.
Shadows in prose, not the component map. Elevation is a system-level concept; the recognised sub-tokens don't include shadow. Document the glow in the Elevation section, reference it by name (shadow.button) in component prose.
Negative instruction inside the component: "no accent color in the default state." The constraint travels with the component, so the model encounters it where it matters.

The linter: structural quality, automated

The whole structural half of "is this a good DESIGN.md" is a one-line check now:

npx @google/design.md lint DESIGN.md

The output is JSON, with every finding categorised as error, warning, or info. The kinds of things it catches:

No YAML frontmatter at all. A DESIGN.md that's pure prose still validates as markdown but fails the spec's source-of-truth contract.
Unrecognised sub-tokens. backgroundColorHover, border, chipTextColor: common patterns that don't survive contact with the spec's recognised set. The linter names each one, with its path.
Missing canonical palette. No top-level primary color → "the agent will auto-generate key colors, reducing your control over the palette."
WCAG contrast violations on component text/background pairs, with the actual ratio. Catches subtle errors like danger text on danger-subtle failing 4.5:1.
Orphan tokens. Primitives defined but never referenced: a signal you're dragging dead weight.

Three of the four marketing-design DESIGN.md files I had on my laptop failed this lint with double-digit warning counts before I cleaned them up. None of them looked broken to a human. They were all bleeding the model's output through unrecognised fields and undefined palette roles.

The linter doesn't tell you whether your prose is good. But it does tell you whether the file you're handing to an AI agent is the file the agent will actually parse. That's a question you used to discover the answer to by watching generations come out wrong.

What you walk away with

When the file above is loaded into Claude Design, the difference shows up the first time you prompt "build me a settings page": pill buttons in blue gradient, serif headlines in light weight, accent reserved for save and link, dark navy for the destructive zone. Without it, the same prompt produces shadcn defaults in a calmer palette.

That is the whole job of DESIGN.md: make every generation start from your taste, not the model's training mean.

A four-test checklist

Before you consider a section done, it should pass four tests:

Specific. Could this paragraph describe any other brand? If yes, rewrite it.
Reasoned. Does each value have a why a human or a model can quote back? If not, add one.
Falsifiable. Could you reject AI output by pointing at a sentence in this file? If not, the sentence is decoration.
Lints clean. Run npx @google/design.md lint DESIGN.md. Zero warnings on canonical structure (recognised sub-tokens, top-level primary, contrast on stated component pairs). Orphan-token warnings are negotiable; everything else is not.

A DESIGN.md that passes all four on every section is a working tool. Anything less is a brand deck in disguise.

Part of our series on design systems for the AI era. See also: DESIGN.md is now an open standard and DESIGN.md: the AI-era brand guide.

The frontmatter: tokens, named for humans first

Overview: a sentence that tells the model what NOT to be

Personality: a person, not a vibe

Principles: rules with reasons

Don'ts: the highest-leverage section in the file

Colors: scarcity, with reasoning

Typography: identity carried by one decision

Components: recipes the spec recognises

The linter: structural quality, automated

What you walk away with

A four-test checklist

Ready for a Taste Profile of your own?