This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
markdown-for-agents is a runtime-agnostic HTML-to-Markdown converter built for AI agents. Single runtime dependency: htmlparser2. Monorepo with pnpm workspaces.

    pnpm build              # Build all packages (tsdown)
    pnpm test               # Run all unit tests (vitest)
    pnpm test:watch         # Watch mode
    pnpm test:integration   # Build + run integration tests (Node/Bun/Deno)
    pnpm lint               # ESLint (includes Prettier checks)
    pnpm lint:fix           # ESLint with auto-fix
    pnpm format             # Prettier --write
    pnpm typecheck          # TypeScript type checking

Run for a specific package:

    pnpm --filter markdown-for-agents test
    pnpm --filter @markdown-for-agents/express test

Run a single test file:

    pnpm vitest run packages/core/test/unit/rules/block.test.ts

The core library processes HTML through a 6-stage pipeline (see `docs/architecture.md`):
- Parser (`core/parser.ts`): `htmlparser2.parseDocument()` produces a DOM tree
- Extractor (`extract/`): optional pruning of non-content elements (nav, footer, ads)
- Walker (`core/walker.ts`): depth-first traversal, applies matching rules to each element
- Renderer (`core/renderer.ts`): normalizes whitespace, collapses blank lines
- Deduplicator (`core/dedup.ts`): optional removal of duplicate content blocks
- Token Estimator (`tokens/`): counts tokens, characters, words
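
As a rough illustration of what the last three stages do to a string, here is a minimal sketch. The function names, the blank-line block splitting, and the 4-characters-per-token ratio are all assumptions made for this example; they are not taken from `core/renderer.ts`, `core/dedup.ts`, or `tokens/`.

```typescript
// Hypothetical stand-ins for the Renderer, Deduplicator and Token Estimator stages.
// None of these names or heuristics come from the actual source files.

// Renderer: strip trailing whitespace and collapse runs of blank lines to one.
function normalizeWhitespace(markdown: string): string {
    return markdown
        .split('\n')
        .map(line => line.trimEnd())
        .join('\n')
        .replace(/\n{3,}/g, '\n\n')
        .trim();
}

// Deduplicator: drop any blank-line-separated block that has already appeared.
function dedupeBlocks(markdown: string): string {
    const seen = new Set<string>();
    return markdown
        .split('\n\n')
        .filter(block => {
            const key = block.trim();
            if (seen.has(key)) return false;
            seen.add(key);
            return true;
        })
        .join('\n\n');
}

// Token Estimator: characters / 4 is a common rough heuristic, assumed here.
function estimate(markdown: string): { tokens: number; characters: number; words: number } {
    const characters = markdown.length;
    const words = markdown.split(/\s+/).filter(Boolean).length;
    return { tokens: Math.ceil(characters / 4), characters, words };
}

const raw = '# Title  \n\n\n\nHello world\n\nHello world\n';
const output = dedupeBlocks(normalizeWhitespace(raw));
const stats = estimate(output);
```

Keeping each stage a pure string-to-string function makes the optional stages (extraction, deduplication) easy to skip without touching the others.
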

Rules live in `packages/core/src/rules/` (block, inline, list, table). Each rule has:
- `filter`: tag name, tag name array, or predicate function
- `replacement(context: RuleContext)`: returns `string` (markdown output), `null` (strip element), or `undefined` (skip, try next rule)
- `priority`: higher runs first; built-in rules use 0
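
The interaction between `filter`, `priority`, and the three `replacement` return values can be sketched as follows. The `RuleContext` fields, the `applyRules` helper, and the fallback to plain content are hypothetical and exist only for this example; the real types in `packages/core` may differ.

```typescript
// Hypothetical shapes for illustration; the real RuleContext and rule types may differ.
interface RuleContext {
    tagName: string;
    content: string; // already-converted children (assumed field)
}

type Filter = string | string[] | ((ctx: RuleContext) => boolean);

interface Rule {
    filter: Filter;
    replacement: (ctx: RuleContext) => string | null | undefined;
    priority?: number; // assumed to default to 0, like built-in rules
}

function matches(filter: Filter, ctx: RuleContext): boolean {
    if (typeof filter === 'string') return filter === ctx.tagName;
    if (Array.isArray(filter)) return filter.includes(ctx.tagName);
    return filter(ctx);
}

// Try matching rules from highest priority down; undefined falls through.
function applyRules(rules: Rule[], ctx: RuleContext): string | null {
    const ordered = [...rules].sort((a, b) => (b.priority ?? 0) - (a.priority ?? 0));
    for (const rule of ordered) {
        if (!matches(rule.filter, ctx)) continue;
        const result = rule.replacement(ctx);
        if (result !== undefined) return result; // string = output, null = strip
    }
    return ctx.content; // assumed fallback when no rule produces a result
}

const rules: Rule[] = [
    // Stand-in for a built-in rule (priority 0).
    { filter: ['strong', 'b'], replacement: ctx => `**${ctx.content}**` },
    // Custom rule: strip empty <strong>, otherwise defer to the next rule.
    { filter: 'strong', priority: 10, replacement: ctx => (ctx.content.trim() === '' ? null : undefined) },
    // Predicate filter: always strip <script>.
    { filter: ctx => ctx.tagName === 'script', replacement: () => null }
];

const bold = applyRules(rules, { tagName: 'strong', content: 'hi' });
const emptyBold = applyRules(rules, { tagName: 'strong', content: '  ' });
```

Here the priority-10 rule sees `<strong>` first; for non-empty content it returns `undefined`, so the priority-0 rule produces the output.
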

Package layout:

- `packages/core/`: main library (`markdown-for-agents`). Three export subpaths: `.`, `./extract`, `./tokens`
- `packages/audit/`: CLI tool for token savings analysis
- `packages/middleware/{express,fastify,hono,nextjs,web}/`: framework middleware adapters. All check the `Accept: text/markdown` header and follow the same pattern
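
The shared `Accept` check might look something like the helper below. `wantsMarkdown` is a hypothetical name, and how the actual adapters parse the header (for example, whether they honor `q=0`) is not specified here, so treat this as a sketch of the idea only.

```typescript
// Hypothetical helper: returns true when the Accept header lists text/markdown.
// Quality values (q=...) are ignored in this sketch.
function wantsMarkdown(acceptHeader: string | null | undefined): boolean {
    if (!acceptHeader) return false;
    return acceptHeader
        .split(',')
        .map(part => part.replace(/;.*$/, '').trim().toLowerCase())
        .includes('text/markdown');
}

const agentRequest = wantsMarkdown('text/html, text/markdown;q=0.9');
const browserRequest = wantsMarkdown('text/html,application/xhtml+xml');
```

An adapter would then convert the HTML response only when this check passes and otherwise leave the response untouched.
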

Code conventions:

- ESM only: all imports use the `.js` extension (TypeScript ESM convention)
- No `as any`: use type guards (`isTag()`, `isText()` from `domhandler`)
- Null means remove: a rule's `replacement` returns `null` to strip an element
- Undefined means fall-through: a rule's `replacement` returns `undefined` to try the next rule
- Prettier: 140 char width, 4-space indent, single quotes, no trailing commas, `arrowParens: "avoid"`
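
To show why type guards remove the need for `as any`, here is a dependency-free sketch. The node types and the local `isText`/`isTag` functions are simplified stand-ins written for this example; real code would import the guards from `domhandler` instead.

```typescript
// Simplified stand-ins so the sketch runs without dependencies.
// In the repository, isTag() and isText() come from domhandler.
interface TextNode {
    type: 'text';
    data: string;
}
interface TagNode {
    type: 'tag';
    name: string;
    children: DomNode[];
}
type DomNode = TextNode | TagNode;

function isText(node: DomNode): node is TextNode {
    return node.type === 'text';
}
function isTag(node: DomNode): node is TagNode {
    return node.type === 'tag';
}

// The guards narrow the type, so .data and .children are accessed without casts.
function textContent(node: DomNode): string {
    if (isText(node)) return node.data;
    if (isTag(node)) return node.children.map(textContent).join('');
    return '';
}

const tree: DomNode = {
    type: 'tag',
    name: 'p',
    children: [
        { type: 'text', data: 'Hello ' },
        { type: 'tag', name: 'strong', children: [{ type: 'text', data: 'world' }] }
    ]
};
const text = textContent(tree);
```
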

Testing conventions:

- Test through the public `convert()` API when possible
- Use `toContain` for flexible output assertions (avoids whitespace brittleness)
- Use `toBe` only for exact formatting tests
- HTML fixtures live in `packages/core/test/fixtures/`
- Integration tests verify the built `dist/` output across Node, Bun, and Deno
- Middleware tests use real server instances

Always run lint, typecheck, and tests before considering a change complete:

    pnpm lint:fix && pnpm typecheck && pnpm test