feat(capture): extract chips/stat-cells/tabs, detect icon fonts, transparent grounds #1827
Merged
designStyleExtractor now also extracts chip/pill/badge/tag, stat/metric cells, and
tab components — by class-substring selector plus a shape fallback (small + fully
rounded + short text) so hashed/utility class names (Tailwind, CSS-modules) are
still caught. It also emits a "transparent" sentinel for fully-transparent
(rgba(...,0)) grounds instead of collapsing them to #000000, so a transparent
chip/tab/stat on a light-ground site no longer reads as solid black.
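
A minimal sketch of the transparent-sentinel idea described above. The PR's real `rgbToHex` lives inside a `page.evaluate` script and is not shown in this thread, so the signature, the regular expression, and the `null` return for unparsed colors are assumptions made for illustration only.

```typescript
// Hypothetical sentinel value; the PR only says a "transparent" sentinel is emitted.
const TRANSPARENT = "transparent";

// Two-digit lowercase hex for one channel, clamped to 0..255.
const toHex = (n: number): string =>
  ("0" + Math.max(0, Math.min(255, Math.round(n))).toString(16)).slice(-2);

// Assumed signature: takes a computed-style color string such as "rgba(0, 0, 0, 0)".
// Returns "#rrggbb", the sentinel for fully transparent colors, or null when the
// string is not an rgb()/rgba() value (oklch, hsl, ... are out of scope here).
function rgbToHex(color: string): string | null {
  const text = color.trim();
  const m = text.match(
    /^rgba?\(\s*([\d.]+)[\s,]+([\d.]+)[\s,]+([\d.]+)(?:\s*[,/]\s*([\d.]+%?))?\s*\)$/i,
  );
  if (!m) return text.toLowerCase() === "transparent" ? TRANSPARENT : null;

  const alphaRaw = m[4];
  if (alphaRaw !== undefined) {
    const alpha = /%$/.test(alphaRaw) ? parseFloat(alphaRaw) / 100 : parseFloat(alphaRaw);
    // Alpha 0 carries no visible color, so report the sentinel instead of #000000.
    if (alpha === 0) return TRANSPARENT;
  }
  return "#" + [m[1], m[2], m[3]].map((v) => toHex(Number(v))).join("");
}
```

With this shape, a consumer can tell "no ground at all" apart from a genuinely black ground, which is the distinction the commit message is after.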
fontMetadataExtractor now flags icon fonts (isIcon) by glyph coverage: a font is an
icon font only when it BOTH lacks a real Latin alphabet (<26 of A-Za-z) AND is
mostly (>50%) Private-Use-Area glyphs. The Latin gate matters — some text fonts pack
thousands of PUA glyphs yet are plainly text (Apple SF Pro is ~81% PUA but ships a
full alphabet; Descript's Booton ~50%); flagging by PUA ratio alone would strip a
brand's real typeface. Measured icon fonts: "hushly" 63% PUA / 7 letters, Font
Awesome 95% / 0 letters. Names alone can't identify icon fonts ("hushly",
"swiper-icons"), hence the glyph-based test.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
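
The dual gate from the commit message can be sketched as follows. `isIconCharacterSet` is named in the PR's test notes, but its real signature and implementation are not shown here, so everything below (the parameter type, counting distinct code points, the `range` helper) is an assumption. The thresholds (fewer than 26 of A-Za-z, more than 50% PUA) come from the commit message.

```typescript
// Unicode Private Use Areas: the BMP block plus supplementary planes 15 and 16.
const PUA_RANGES: ReadonlyArray<readonly [number, number]> = [
  [0xe000, 0xf8ff],
  [0xf0000, 0xffffd],
  [0x100000, 0x10fffd],
];

const isPua = (cp: number): boolean =>
  PUA_RANGES.some((r) => cp >= r[0] && cp <= r[1]);

const isBasicLatinLetter = (cp: number): boolean =>
  (cp >= 0x41 && cp <= 0x5a) || (cp >= 0x61 && cp <= 0x7a);

// Assumed signature: the font's supported code points as plain numbers.
function isIconCharacterSet(codePoints: readonly number[]): boolean {
  const unique = new Set(codePoints);
  if (unique.size === 0) return false; // empty set: nothing to judge, not an icon font

  let latin = 0;
  let pua = 0;
  unique.forEach((cp) => {
    if (isBasicLatinLetter(cp)) latin += 1;
    if (isPua(cp)) pua += 1;
  });

  const lacksAlphabet = latin < 26; // Latin gate: fewer than 26 of A-Za-z
  const mostlyPua = pua / unique.size > 0.5; // PUA gate: more than half the glyphs
  return lacksAlphabet && mostlyPua; // both gates must hold
}

// Hypothetical helper for building synthetic character sets.
const range = (start: number, count: number): number[] =>
  Array.from({ length: count }, (_, i) => start + i);

// Synthetic stand-ins, not measured data: a PUA-heavy font with a full alphabet
// stays a text font, while a PUA-heavy font with only 7 letters is flagged.
const textFontWithManyPua = range(0x41, 26).concat(range(0x61, 26), range(0xe000, 400));
const iconFontWithFewLetters = range(0x61, 7).concat(range(0xe000, 12));
console.log(isIconCharacterSet(textFontWithManyPua), isIconCharacterSet(iconFontWithFewLetters));
```

Requiring both gates is what keeps a PUA-heavy text face such as the SF Pro case above from being dropped, while a unicode-range subset with no Latin letters and no PUA glyphs also stays classified as text.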
f4f61b5 to 6cc8731
miga-heygen
reviewed
Jul 1, 2026
Review: feat(capture): extract chips/stat-cells/tabs, detect icon fonts, transparent grounds
Summary: Extends the capture engine's design-style extraction to three new component families (chips, stat cells, tabs), adds icon-font detection via PUA glyph ratio, and fixes transparent backgrounds being collapsed to #000000. Well-structured, well-tested, validated across diverse sites.
Findings:
| # | Location | Severity | Note |
|---|---|---|---|
| 1 | fontMetadataExtractor.ts:206 | concern | `font as unknown as { characterSet?: number[] }` is a double-cast. Project CLAUDE.md says "Avoid `any` and `as T` assertions." The try/catch + `Array.isArray` guard makes it runtime-safe, but consider a type guard function instead: `function hasCharacterSet(f: Font): f is Font & { characterSet: number[] }`. |
| 2 | designStyleExtractor.ts:209 | suggestion | `parseFloat(st.borderRadius)` only reads the first value of shorthand like `"24px 24px 0px 0px"`. An element with only top-rounded corners would pass the pill-shape check. Unlikely to cause false positives given the other constraints (height ≤ 44, width ≤ 260, short text, has skin). |
| 3 | designStyleExtractor.ts:262 | suggestion | `[class*="tab"]` could match tabpanel, tabindex, tabular, establish, stable. Downstream size filters mitigate most false positives, but consider additional `:not()` exclusions if observed in practice. |
| 4 | fontMetadataExtractor.ts:192 | nit | `isIcon` not propagated to `FontFamilySummary`. Consumers need to iterate files to discover if a family is an icon font. Worth noting for future consumers. |
| 5 | designStyleExtractor.ts:216-226 | nit | Same DOM element can appear in both `chipEls` and `shapeChips`, getting `getStyles()` called twice before dedup by key. No correctness issue, just redundant work. |
| 6 | `rgbToHex` (transparent fix) | nit | No unit test for the transparent sentinel. The function lives inside a `page.evaluate` script so unit testing requires extraction or e2e. Not a blocker. |
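
For finding 2, one possible way to look at all four corners instead of only the first shorthand value is sketched below. The helper names `cornerRadiiPx` and `isFullyRounded`, and the "every corner radius is at least half the height" definition of fully rounded, are assumptions for illustration; the PR's actual pill-shape check is not shown in this thread.

```typescript
// Expand the horizontal radii of a border-radius shorthand into
// [top-left, top-right, bottom-right, bottom-left], in px.
// Sketch limitations: only px tokens are understood (anything else counts as 0),
// and the vertical radii after "/" are ignored.
function cornerRadiiPx(borderRadius: string): [number, number, number, number] {
  const horizontal = borderRadius.split("/")[0] ?? "";
  const v = horizontal
    .trim()
    .split(/\s+/)
    .filter((t) => t.length > 0)
    .map((t) => (/^[\d.]+px$/.test(t) ? parseFloat(t) : 0));

  // CSS shorthand expansion: missing values fall back to the opposite corner.
  const tl = v[0] ?? 0;
  const tr = v[1] ?? tl;
  const br = v[2] ?? tl;
  const bl = v[3] ?? tr;
  return [tl, tr, br, bl];
}

// Assumed definition of "fully rounded": every corner radius reaches half the height.
function isFullyRounded(borderRadius: string, heightPx: number): boolean {
  if (heightPx <= 0) return false;
  return cornerRadiiPx(borderRadius).every((r) => r >= heightPx / 2);
}

// parseFloat("24px 24px 0px 0px") is 24, so a first-value check accepts this element;
// the four-corner check rejects it because the bottom corners are square.
console.log(isFullyRounded("24px 24px 0px 0px", 40));
```

Treating non-px tokens as 0 errs toward false negatives, which seems the safer direction for a fallback heuristic, but that trade-off is a judgment call rather than something stated in the review.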
What looks good:
- Icon-font detection heuristic is clever: the dual gate (Latin alphabet + PUA ratio) handles the tricky SF Pro / Booton case that a naive PUA-only test would flag as a false positive
- Transparent sentinel is a clean fix for a data-loss bug (transparent → #000000)
- Type additions (`chips?`, `statCells?`, `tabs?`) are backward-compatible
- Test coverage is solid; the SF Pro test case is particularly valuable (validates the Latin gate)
- Shape-fallback chip detection fills the gap for sites that don't use class names with "chip/tag/badge/pill"
- Stat cell extraction correctly finds the biggest-font child for the "number" style
Verdict: LGTM — Well-designed extraction with real-world validation. The main actionable suggestion is replacing the as unknown as cast with a type guard.
— Miga
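
The type guard suggested in finding 1 might look roughly like this. The `Font` interface below is a hypothetical stand-in, since the font library's real type is not shown in this thread; the guard's signature is the one proposed in the review. Narrowing through the `in` operator assumes TypeScript 4.9 or later.

```typescript
// Hypothetical stand-in for the font library's Font type (assumption).
interface Font {
  familyName?: string;
}

// Signature as proposed in the review. The body avoids `any` and `as T` by
// narrowing with the `in` operator and Array.isArray.
function hasCharacterSet(f: Font): f is Font & { characterSet: number[] } {
  try {
    if (!("characterSet" in f)) return false;
    const value = f.characterSet; // narrowed to unknown by the `in` check
    return Array.isArray(value) && value.every((n) => typeof n === "number");
  } catch {
    // Mirrors the existing try/catch: a getter that throws counts as "no character set".
    return false;
  }
}

// Usage sketch: callers get number[] without a cast once the guard passes.
function codePointsOf(font: Font): number[] {
  return hasCharacterSet(font) ? font.characterSet : [];
}
```

Keeping the try/catch inside the guard means call sites need neither the double-cast nor their own error handling, though whether the real property can throw is something only the library's behavior can confirm.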
miguel-heygen
approved these changes
Jul 1, 2026
What
Extends the capture engine's design-style + font extraction so a wider range of real-world sites produce faithful, usable component/typography tokens.
- `designStyleExtractor.ts`: emits a `"transparent"` sentinel for fully-transparent (`rgba(...,0)`) grounds instead of collapsing them to `#000000`, so a transparent chip/tab/stat on a light-ground site no longer reads as solid black.
- `fontMetadataExtractor.ts`: flags icon fonts (`isIcon`) by Private-Use-Area glyph ratio (>50%). Icon fonts ship arbitrary names (`swiper-icons`, a custom `hushly`, Font Awesome) that no name list can enumerate; without this they get mistaken for a text family and render headings as tofu/icons. A plain "no Latin letters" test is deliberately avoided: a text font served as a unicode-range subset legitimately lacks `A` yet is 0% PUA.
- `types.ts`: `DesignStyles` gains optional `chips` / `statCells` / `tabs`; new `StatCellStyle`; `FontFileMetadata` gains `isIcon`.

Why
Validated end-to-end across 7 diverse sites (Stripe, LiveKit, DoorDash, Snowflake, Linear, ElevenLabs, Kuse). Each surfaced a distinct real-world case this PR handles generally (not per-site): oklch/hsl colors, camelCase/hashed font names, icon fonts, transparent grounds, unicode-range subsets. Snowflake's `hushly` icon font was rendering headings as icon glyphs before the `isIcon` fix.

Tests
- `isIconCharacterSet` (PUA-heavy → icon; Latin / Cyrillic-subset / empty → not).
- `fontMetadataExtractor.test.ts`: 39 passing.
- `bun run build`, oxlint, oxfmt, typecheck all clean on changed files.

🤖 Generated with Claude Code