Accessibility audit workflow
Each tool catches a different layer of the same problem. Use them together, not in isolation — Figma flags missing variants before any code is written, axe flags real DOM violations after build, the spec audit catches APIs that don't enforce a11y by default.
Three sources, three layers
Any single tool will miss things. Triangulating across all three keeps the gaps small enough to ship.
- axe-only: label missing on render · heading-order in real DOM · scrollable-region focus
- Figma-only: missing variants · target-size < 24×24 · color-blind simulation
- Spec-only: API doesn't enforce · a11y prop missing · discriminated-union gaps
- Overlaps: all three converge on accessible name; axe ∩ Figma share contrast on real surfaces
| Source | Catches | Misses | When to run |
|---|---|---|---|
| axe-core (browser) | Real DOM violations as the component renders — labels, contrast, heading order, ARIA mistakes, keyboard traps | Issues that need a specific prop combination you didn't story; design-only problems (missing variants); API-shape gaps | After every build · on every Storybook story · on the deployed site |
| Figma a11y audit | Variant coverage, focus indicator quality, non-color differentiation (WCAG 1.4.1), target size (WCAG 2.5.8), color-blind simulation | Anything runtime — keyboard behavior, ARIA wiring, real-content edge cases | Before handoff · per component-set · before the design ↔ code parity check |
| Spec / impl audit | APIs that let consumers ship unlabeled components by accident; specs without required a11y props | Anything that depends on rendered output | Code review · before stabilising a component API |
The audit loop
Five steps. Each step feeds the next; the loop closes after a verification re-run. Skip Triangulate at your peril — it's the step where you decide whether what one tool flagged is a real issue or noise.
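If you script the Triangulate step, the core move is grouping findings by component and separating corroborated findings from single-lane ones. A minimal sketch, assuming a hypothetical normalized `Finding` shape (none of the three tools emits this format directly; you would map each tool's output into it):

```ts
type Lane = 'axe' | 'figma' | 'spec';

interface Finding {
  lane: Lane;
  component: string; // e.g. "LlmProgress"
  detail: string;    // e.g. "aria-progressbar-name"
}

// Group by component. Two or more lanes agreeing on the same component
// is almost certainly real; a single lane is where a human decides:
// real issue, or tool noise?
function triangulate(findings: Finding[]) {
  const byComponent = new Map<string, Finding[]>();
  for (const f of findings) {
    byComponent.set(f.component, [...(byComponent.get(f.component) ?? []), f]);
  }
  return [...byComponent.entries()].map(([component, fs]) => ({
    component,
    lanes: [...new Set(fs.map((f) => f.lane))],
    corroborated: new Set(fs.map((f) => f.lane)).size >= 2,
  }));
}
```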
Worked example — LlmProgress
A component all three lanes touched. Figma said it was perfect; axe said it was broken; the spec told us why, and where to fix it. The resulting one-prop fix is sketched after the table.
| Source | Result | Read |
|---|---|---|
| Figma a11y audit | overall score · ✓ variant coverage · ✓ color blind · ✓ annotations | "perfect" |
| axe-core | violations (serious): aria-progressbar-name on three progress bars in /patterns | "missing label" |
| Spec / impl audit | API gap: LlmProgressSpec has no label prop, so consumers can't pass one | "the why" |
| Fix | all green · label?: string · spec + 3 impls · 36 tests pass | 1 prop, 1 day |
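What the one-prop fix plausibly looks like in code. This is a sketch, not the repo's actual source: the real LlmProgressSpec and its three implementations have more to them, but the load-bearing part is wiring `label` through to `aria-label`, which is exactly what clears axe's aria-progressbar-name rule.

```tsx
// Sketch only — field names other than 'label' are illustrative.
export interface LlmProgressSpec {
  value: number;   // 0–100
  label?: string;  // the prop this audit cycle added
}

// One implementation, sketched as React. An accessible name on
// role="progressbar" is what aria-progressbar-name checks for.
export function LlmProgress({ value, label }: LlmProgressSpec) {
  return (
    <div
      role="progressbar"
      aria-valuenow={value}
      aria-valuemin={0}
      aria-valuemax={100}
      aria-label={label}
    />
  );
}
```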
How the LLM coordinates the loop
Three MCPs cover the three audit lanes. Claude Code orchestrates — runs the audits, parses the results, makes the fix, re-runs for verification, all in one chat. No manual tool-switching.
The three calls
What each lane looks like in practice — copy these into a Claude Code session to start your own audit loop.
Setup — launch Claude Code with Chrome attached
Lane 1 needs Claude to drive a real browser. The --chrome flag attaches a controlled Chrome instance to the chat so the agent can navigate, inject scripts, and read the DOM. One launch covers all subsequent axe runs in the session.
```bash
# Launch Claude Code with the Chrome MCP attached.
# This connects the running chat to a controlled Chrome instance so
# the agent can navigate, run JS in the page, and read the DOM.
claude --chrome

# Once attached, the agent can call:
#   mcp__claude-in-chrome__navigate
#   mcp__claude-in-chrome__javascript_tool
#   mcp__claude-in-chrome__read_console_messages
# … which is what powers the axe injection in Lane 1 below.
```
Lane 1 — axe-core in a Storybook iframe

```js
// Inject axe-core into the Storybook story iframe and run it,
// scoped to the story root so the Storybook chrome doesn't pollute results.
const iframe = document.querySelector('#storybook-preview-iframe');
const doc = iframe.contentDocument;

// Load axe from the CDN unless the iframe already has it.
await new Promise((res, rej) => {
  if (iframe.contentWindow.axe) return res();
  const s = doc.createElement('script');
  s.src = 'https://cdn.jsdelivr.net/npm/axe-core@4.10.0/axe.min.js';
  s.onload = res;
  s.onerror = rej;
  doc.head.appendChild(s);
});

const result = await iframe.contentWindow.axe.run(
  doc.querySelector('#storybook-root'),
  { resultTypes: ['violations'] }
);
console.log(result.violations.map(v => v.id));
// → ["aria-progressbar-name", "button-name", ...]
```
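To cover every story rather than one, the same injection can be looped over Storybook's story index. A sketch, not the project's tooling: it assumes Storybook 7+ (which serves /index.json) and that it runs in the tab the Chrome MCP already controls.

```ts
// Sweep sketch — drive the preview iframe through every story id.
const frame = document.querySelector('#storybook-preview-iframe') as HTMLIFrameElement;
const index = await (await fetch('/index.json')).json();
const ids = Object.values(index.entries as Record<string, { type: string; id: string }>)
  .filter((e) => e.type === 'story')
  .map((e) => e.id);

const report: Record<string, string[]> = {};
for (const id of ids) {
  // Render the story in isolation, then wait for the navigation.
  frame.src = `/iframe.html?id=${id}&viewMode=story`;
  await new Promise((res) => frame.addEventListener('load', res, { once: true }));
  const doc = frame.contentDocument!;

  // Re-inject axe after each navigation (same trick as above).
  await new Promise((res, rej) => {
    const s = doc.createElement('script');
    s.src = 'https://cdn.jsdelivr.net/npm/axe-core@4.10.0/axe.min.js';
    s.onload = res;
    s.onerror = rej;
    doc.head.appendChild(s);
  });

  const { violations } = await (frame.contentWindow as any).axe.run(
    doc.querySelector('#storybook-root'),
    { resultTypes: ['violations'] }
  );
  if (violations.length) report[id] = violations.map((v: any) => v.id);
}
console.table(report);
```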
Lane 2 — figma-console MCP

```js
// figma-console MCP — one audit per component-set.
// Returns scorecard: variant coverage, focus indicator,
// non-color differentiation (WCAG 1.4.1), target size,
// annotations, color-blind simulation.
figma_audit_component_accessibility({ nodeId: "129:20" })
// → {
//   overallScore: 82,
//   scores: { variantCoverage: 86, colorDifferentiation: 0, ... },
//   recommendations: [
//     { priority: "high", area: "color",
//       message: "Add non-color indicators to 2 state variants (WCAG 1.4.1)" },
//     { priority: "medium", area: "states",
//       message: "Consider adding a 'disabled' variant" }
//   ]
// }
```
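The scorecard is easy to gate on. A sketch that mirrors the example response above; the shape is inferred from that output rather than a documented schema, and the 90 threshold is an assumption, picked because this cycle's worst passing score was 92.

```ts
interface FigmaAudit {
  overallScore: number;
  scores: Record<string, number>;
  recommendations: {
    priority: 'high' | 'medium' | 'low';
    area: string;
    message: string;
  }[];
}

// Block handoff on any high-priority recommendation or a low overall score.
function passesHandoff(audit: FigmaAudit, minScore = 90): boolean {
  const blockers = audit.recommendations.filter((r) => r.priority === 'high');
  for (const b of blockers) console.error(`[${b.area}] ${b.message}`);
  return blockers.length === 0 && audit.overallScore >= minScore;
}
```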
Lane 3 — spec / impl audit

```ts
// Static spec-side audit — read libs/spec/src/index.ts plus one
// implementation per component. Look for components where the
// public API doesn't enforce a11y by default.
//
// Bad:  optional aria-label on an icon-only-capable button.
// Good: discriminated union — children OR aria-label required.
import type { ButtonHTMLAttributes, ReactNode } from 'react';

export type LlmButtonProps = Omit<
  ButtonHTMLAttributes<HTMLButtonElement>,
  'children' | 'aria-label'
> &
  LlmButtonSpec &
  (
    | { children: ReactNode; 'aria-label'?: string }
    | { children?: undefined; 'aria-label': string }
  );
```
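What the union buys you at the call site: a sketch of the cases the compiler now distinguishes, assuming LlmButton is imported from the component library.

```tsx
// Compiles: children supply the accessible name.
<LlmButton>Save</LlmButton>;

// Compiles: no children, so the second union arm requires 'aria-label'.
<LlmButton aria-label="Close" />;

// Type error: neither children nor 'aria-label' — the unlabelled
// case the audit flagged can no longer compile.
<LlmButton />;
```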
From a real audit cycle (2026-04-26)

The numbers from the run that produced this page — start to ship, five phases, end-to-end. Baseline in tasks/a11y-audit-2026-04-26.md; after-snapshot in tasks/a11y-audit-2026-04-26-after.md.
The eight P-critical findings split cleanly across the three lanes: four were Figma-only (focus indicator contrast, target size on Checkbox / Radio / RadioGroup, missing variants on Select and Combobox, color-only differentiation on the danger Button), three were axe-only (aria-progressbar-name, button-name, scrollable-region-focusable), and one was spec-driven (icon-only Buttons could ship unlabelled). No two lanes flagged the same node — but together they painted a complete picture.
End state: every directly-touched docs page reports zero axe violations (/workshop, /tutorial, /first-component, /patterns, /a11y-workflow, /accessibility, /install, /mcp) and the Figma file's worst component-set score is 92. Three findings are deferred to a designer (state-axis convention, protanopia rebinding, the LlmTabGroup focus reading) — none block release.
When to run which
| Phase | Source | Why now |
|---|---|---|
| Component design (Figma) | Figma a11y audit | Catch missing variants and tap-target issues before any code is written. Cheapest fix point. |
| Spec change (add/edit a11y prop) | Spec audit + types compile | Make sure the API enforces a11y by default, not by consumer discipline. Catch missing discriminated unions. |
| Story added (Storybook) | axe on the story root | Verify the rendered DOM is correct in this exact prop combination. Stories are your test surface. |
| PR check | axe on changed components only | Fast regression gate (sketched after this table). The Figma + spec lanes are slower; reserve them for component-level changes. |
| Release | All three, full sweep | One snapshot per release captures the baseline so the next cycle can diff against it. |
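A sketch of the PR gate, using Playwright plus the @axe-core/playwright builder against a Storybook dev server on port 6006; getChangedStoryIds is a hypothetical helper you would back with git diff against your story globs.

```ts
import { chromium } from 'playwright';
import AxeBuilder from '@axe-core/playwright';

// Hypothetical helper: map `git diff --name-only` output to story ids.
declare function getChangedStoryIds(): Promise<string[]>;

async function prGate(): Promise<void> {
  const ids = await getChangedStoryIds();
  const browser = await chromium.launch();
  const page = await browser.newPage();
  let failures = 0;

  for (const id of ids) {
    // Render the story in isolation, then scan only the story root.
    await page.goto(`http://localhost:6006/iframe.html?id=${id}&viewMode=story`);
    const { violations } = await new AxeBuilder({ page })
      .include('#storybook-root')
      .analyze();
    if (violations.length) {
      failures += violations.length;
      console.error(id, violations.map((v) => v.id));
    }
  }

  await browser.close();
  if (failures > 0) process.exit(1);
}

prGate();
```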