ci: gate docs PRs on internal-link validity#179

Merged
PatrickRitchie merged 40 commits into
TrakHound:master from
ottobolyos:chore/check-broken-links
Jun 1, 2026

Contributor

@ottobolyos ottobolyos commented May 31, 2026

Summary

Adds an internal-link validation gate to the docs CI, expands docfx coverage to include the agent + adapter projects, authors the three previously-missing /configure/ sub-pages (Run, Connect a consumer, Operate), and rewrites every broken internal link across the docs tree to point at the correct target. The link checker itself is hardened against the edge cases that surfaced during review, runs with bounded concurrency, and lands in a dedicated CI job so that its wall-time cost overlaps with the VitePress build instead of adding to it.

Tooling

  • docs/scripts/check-broken-links.mjs: new script. Walks every .md under the given root, parses with unified + remark-parse + remark-gfm, walks the rendered HAST for link / image nodes via unist-util-visit, validates filesystem paths (with and without .md extension, with percent-decoding, with index-page fallback), and validates anchor fragments against rehype-slug-derived heading IDs plus a tightened raw-HTML id="…" sweep on the target file. Hardening: position-guards autolink-derived nodes; constrains candidate paths to the docs root; promise-caches in-flight anchor parses; tightens the raw-id pattern to skip data-id, aria-labelledby, and namespaced *:id= attributes; strips fenced code blocks; normalises CRLF line endings before fenced-code matching; splits URLs on # before stripping the query so path?v=1#anchor resolves cleanly; trims candidate-path emission for the bare-/ branch to the docs-root index page only; skips symlinks and dot-prefixed directories; records unreadable directories rather than aborting the walk; catches per-worker rejections so a transient FS hiccup degrades to one recorded broken link rather than an unhandled rejection. Surfaces failures as <file>:<line>:<col>: broken link <url> rows on a single stderr stream so editor problem-matchers (VS Code, vim :cfile, GHA annotations) recognise the format. Bounded concurrency keyed off os.cpus().length cuts the 2k-file pass materially. Exits 1 on any broken link.
  • docs/package.json: adds check-links npm script. Replaces the unmaintained @jsdevtools/rehype-url-inspector (last release 2021, deprecated url-regex dep with a ReDoS advisory) with a direct unist-util-visit walk.
  • .github/workflows/docs.yml: factors out a shared prepare-docs job that runs the docfx + reference regeneration once and uploads the generated tree as a workflow artefact. Both build and check-links depend on prepare-docs and download the artefact; net wall-time drops to the longer of the two parallel jobs, and the previously-duplicated setup is gone.
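
The URL handling described for check-broken-links.mjs can be sketched roughly as follows. This is a minimal illustration, not the script's code: the function names (splitLinkTarget, candidatePaths, formatBrokenLink) and the order of the candidate paths are assumptions made for the example.

```javascript
// Hypothetical helpers; names and candidate order are assumptions for illustration.

// Split on "#" first, then drop "?query" from the path half, then percent-decode.
// Cutting at "?" first would throw the anchor away for "path?v=1#anchor".
function splitLinkTarget(url) {
  const hashAt = url.indexOf("#");
  const beforeHash = hashAt === -1 ? url : url.slice(0, hashAt);
  const anchor = hashAt === -1 ? "" : url.slice(hashAt + 1);
  const queryAt = beforeHash.indexOf("?");
  const rawPath = queryAt === -1 ? beforeHash : beforeHash.slice(0, queryAt);
  // decodeURIComponent throws on malformed escapes; a real checker would catch that.
  return { path: decodeURIComponent(rawPath), anchor };
}

// Candidate files for a docs-root-relative path: the path as written,
// the path with ".md" appended, and the path as a directory index page.
function candidatePaths(path) {
  const trimmed = path.replace(/^\/+|\/+$/g, "");
  if (trimmed === "") return ["index.md"]; // bare "/" maps to the docs-root index only
  return [trimmed, `${trimmed}.md`, `${trimmed}/index.md`];
}

// One row per failure, in the <file>:<line>:<col>: broken link <url> shape.
function formatBrokenLink({ file, line, col, url }) {
  return `${file}:${line}:${col}: broken link ${url}`;
}
```

A link passes when any candidate exists on disk; the anchor half is then checked separately against the target file's heading IDs.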

Docfx coverage

  • docs/.docfx/docfx.json + docs/scripts/generate-api-ref.sh: expand docfx coverage from the 13 MTConnect.NET-* libraries to also include the agent core (MTConnect.NET-Agent, MTConnect.NET-Applications-Agents), every shipped agent module (HTTP server, HTTP adapter, MQTT adapter, MQTT broker, MQTT relay, SHDR adapter), the adapter core (MTConnect.NET-Adapter, MTConnect.NET-Applications-Adapter), and the shipped adapter modules (MQTT, SHDR). The expansion adds new docfx-generated pages including every *ModuleConfiguration type the narrative docs cross-link to.

New configure pages

  • docs/configure/run.md: CLI verbs (run / debug / trace / install / install-start / remove / start / stop / reset / help), Docker / Windows-service / systemd-unit deployment shapes (Docker entry-point and volume paths verified against the shipped Dockerfile; Windows-service LocalSystem note; Docker TLS / auth gap flagged), configuration-file resolution order, and first-boot troubleshooting against the real modules.http-server | Info | Listening at <prefix>.. startup line.
  • docs/configure/consumer.md: HTTP REST endpoints (/probe, /current, /sample, /asset) with the device-scoped path form, polling vs streaming, content-type negotiation against the shipped formatter rows, the two parallel MQTT topic layouts (document-server envelope publishes keyed by device from MTConnect.MTConnectMqttDocumentServer, and entity-server per-data-item publishes under Devices/<uuid>/Observations/<id> from MTConnect.Clients.MTConnectMqttEntityServer), .NET + Python consumer examples that use the actual shipped event names (CurrentReceived, Start(), ConnectionError, InternalError, EventHandler<IObservation> signatures), and a note that CurrentReceived fires once on stream initialization.
  • docs/configure/operate.md: NLog loggers + file layout (modules.<module-id> / processors.<processor-id> keyed off the shipped NLog.config rules), the metrics emitter (gated by enableMetrics: only; interval and window are constructor-only), health-check patterns, soft-reload via monitorConfigurationFiles: true, hard restart via OS services, NLog hot-reload via autoReload="true", durable-buffer handling with the shipped observationBufferSize: 131072 default.

Doc rewrites

  • docs/**: fixes every broken internal link the checker reports against the post-#157 tree (#157: docs: add VitePress documentation site). Two categories: page-rename remaps (/configure/agent -> /configure/agent-config, /wire-formats/json-cppagent* -> /wire-formats/json-v2-cppagent*) and API-page slug remaps (the docfx output uses MTConnect.<Namespace>.<Type>.md rather than MTConnect.<Namespace>/<Type>.html). Where the original doc cited a type that does not exist in the codebase (MTConnect.Formatters.Xml.XmlFormatter, JsonCppAgentFormatter, the SHDR-namespace adapter family, the XmlAssetsDocument / XmlErrorDocument display strings, MTConnect.Delegates, DataItemFilter, DataItemSource, ComponentRelationshipType, the SysML *TemplateRenderer family), the link is rewritten to the type that actually exists. Two namespace-level fallbacks reduce the rot surface when emitters move. Cross-repo ../tests/ and ../.github/ references in testing.md are rewritten as github.com URLs. The .NET HTTP consumer example uses observation.GetValue("Result") against the base IObservation surface so the snippet compiles against the shipped API.

External (HTTP/HTTPS) URLs are deliberately not validated — third-party state is not the gate.

ottobolyos added 27 commits June 1, 2026 20:36
Rewrite API page links to match docfx's flat MTConnect.<Namespace>.<Type> layout, swap obsolete configure/agent and configure/adapter paths for the canonical *-config slugs, update wire-format references to the json-v2-cppagent file names, and point troubleshooting cross-links at the matching sibling pages and corrected anchor slugs. Links to types that no longer exist as docfx-generated pages are kept as inline code.
Add the agent core, agent application host, every shipped agent module
(HTTP server / HTTP adapter / MQTT adapter / MQTT broker / MQTT relay
/ SHDR adapter), the Python agent processor, the adapter core, the
adapter application host, and the shipped adapter modules (MQTT, SHDR)
to the docfx metadata input. The narrative docs cross-link to module
configuration classes (e.g. MqttRelayModuleConfiguration) and to the
agent core (e.g. InputValidationLevel, MTConnectAgentProcessors) that
previously had no docfx-generated landing page; with the expanded
coverage every referenced public type now resolves to a real API page.

Update generate-api-ref.sh to build each project before docfx walks
its DLL output, including the agent + adapter projects.
The /configure/index landing page advertised six sub-pages: Install,
Configure an agent, Configure an adapter, Run, Connect a consumer,
Operate. The last three did not yet exist; every link to them across
the docs tree dead-ended at a missing page. Author them with real
prose covering:

- Run: CLI verbs (run / debug / trace / install / start / stop /
  reset), configuration-file resolution, Docker / Windows-service /
  systemd-unit deployment shapes, and first-boot troubleshooting.

- Connect a consumer: HTTP REST endpoints (/probe, /current, /sample,
  /asset), polling vs streaming, content-type negotiation, JSON v2
  MQTT topic tree, and .NET + Python consumer examples.

- Operate: NLog loggers + file layout, the metrics emitter, health
  checks, soft-reload via monitorConfigurationFiles, hard restart
  via OS services, and durable-buffer handling.

Wire all three into the /configure/ sidebar in .vitepress/config.ts.
Reverse the inline-code workarounds the first pass left in place and
route every dangling reference at the proper docfx-generated page.
The expanded docfx coverage (see prior commit) exposes the agent +
adapter module configuration classes, and the new /configure/run,
/configure/consumer, /configure/operate pages give every narrative
cross-link a real target.

Specific corrections:

- MTConnect.Configurations.\*ModuleConfiguration: linked to the
  expanded docfx pages instead of bare code spans.
- MTConnect.Formatters.Xml.XmlFormatter -> XmlResponseDocumentFormatter;
  MTConnect.Formatters.Json.JsonFormatter -> JsonResponseDocumentFormatter;
  MTConnect.Formatters.JsonCppAgent.JsonCppAgentFormatter ->
  JsonHttpResponseDocumentFormatter; MTConnect.Formatters.JsonCppAgentMqtt
  .JsonCppAgentMqttFormatter -> JsonMqttResponseDocumentFormatter
  (the originally cited types did not exist; the real formatter
  classes follow the IResponseDocumentFormatter naming).
- MTConnect.Delegates -> the MTConnect namespace landing page (the
  delegates live under namespace MTConnect, not in a Delegates
  container).
- DataItemFilter -> Filter; DataItemSource -> Source (the original
  references were renamed during the SysML import pass; the on-disk
  classes are the unprefixed versions).
- ComponentRelationshipType (non-existent enum) -> ComponentRelationship
  class; the relationship type values live on the inherited Type
  property.
- /configure/run, /configure/consumer, /configure/operate: linked to
  the newly-authored pages.
- /api/ placeholder links across the modules pages: deep-linked to the
  specific type pages (HttpServerModuleConfiguration, MqttTopicStructure,
  IMTConnectMqttDocumentServerConfiguration, MTConnectHttpServer,
  MTConnectShdrHttpAgentServer, etc.).
- SysML renderer references (CSharpTemplateRenderer / XmlTemplateRenderer
  / JsonCppAgentTemplateRenderer): the cited classes do not exist; routed
  to the MTConnect.SysML namespace + the MTConnectModel and ModelHelper
  classes that do.
- cookbook/write-a-json-mqtt-consumer.md: rewrote the parsing example to
  use JsonMqttResponseDocumentFormatter (the real class) with its actual
  CreateStreamsResponseDocument / CreateAssetsResponseDocument API.

The internal-link checker exits 0 against the resulting tree.
Document the two MQTT layouts the agent publishes—document-server (whole-envelope per device, with Sample carrying an MTConnectStreams delta) and entity-server (per-data-item under Devices/<uuid>/Observations/<id>). Update the mosquitto and Python examples to subscribe to the entity-server tree so the parsing block actually receives the per-observation payloads it expects. Drop the unverified application/mtconnect+json Accept-header row and point operators at the http-server module's documentFormat key for JSON v2 selection.
…ad claim

AgentConfiguration exposes only enableMetrics; the tick interval and window length are constructor-only on MTConnectAgentMetrics. Replace the invented metrics: block with the real switch. NLog hot-reload is gated by autoReload on the <nlog> root, not by internalLoggingLevel; name the correct lever.
The HTTP server module emits 'Listening at <prefix>..', not 'MTConnectAgent : Started on port 5000'. Replace the invented line so operators tailing logs find a match, and update the first-boot troubleshooting bullet that keyed off the same string.
MTConnectHttpClient exposes CurrentReceived (not OnCurrentReceived), Start() (no async overload), ConnectionError (transport) and InternalError (parsing/dispatch); OnError does not exist. Rewrite the snippet against the shipped event names. The device-scoped current path is /<deviceName>/current, not /current?deviceName=<name>—the query parameter is neither in the MTConnect REST spec nor in the .NET server module.
Sweep up six rewrites the earlier pass missed: XmlFormatter / JsonFormatter / JsonCppAgentFormatter -> the shipped ResponseDocumentFormatter family; the XmlAssetsDocument / XmlErrorDocument display strings -> their real *ResponseDocument counterparts; the MTConnect.Formatters.* display segment -> MTConnect.Formatters.Xml.* (the link targets were already correct); the spurious .Shdr. infix on the ShdrAdapter family -> the real MTConnect.Adapters namespace; and retarget the agent-processor-python reference rows at the dedicated MTConnect.Agents.ProcessObservation and MTConnect.Processors pages the text actually names.
behaviour -> behavior, monopolising -> monopolizing, honours -> honors. Also replace the unresolvable parts/2.0/HttpProtocol.md path with the docs.mtconnect.org URL the citation actually points at.
Replace @jsdevtools/rehype-url-inspector (unmaintained since 2021, pulls deprecated url-regex with a ReDoS advisory) with a unist-util-visit walk over rendered link/image HAST nodes—same coverage, one less archived dep, no behaviour change for valid links.

Edge-case hardening surfaced by the review pass:

- guard node.position?.start against autolink and plugin-inserted nodes (the throw previously cascaded to an unhandled rejection and exit-2)

- decodeURIComponent the path before stat() so %20-encoded targets resolve

- strip ?query before splitting on # and special-case '' / '#' as the docs-root index / placeholder rather than reporting them broken

- tighten the raw-HTML id sweep to skip data-id / aria-labelledby and to strip fenced code blocks before matching

- contain candidate paths under docs-root so [x](/../../../etc/passwd) cannot stat arbitrary FS locations

- skip symlinks during the walk and bracket the dot-directory skip list so .git / .docfx do not surface scratch markdown

- emit '<file>:<line>:<col>: broken link <url>' rows on a single stderr stream so editor problem-matchers (VS Code, vim :cfile, GHA annotations) recognise the format, and surface the failing-script name + stack on the top-level catch
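
The raw-HTML id sweep from the list above might look something like the sketch below. The two patterns and the rawHtmlIds name are assumptions for illustration; the script's actual regexes may differ, and this version only handles three-character fences.

```javascript
// Hypothetical patterns; the script's actual regexes may differ.

// A fenced block: opening fence line, optional body, closing fence line.
const FENCE = /^(`{3}|~{3})[^\n]*\n(?:[\s\S]*?\n)?\1[^\n]*$/gm;

// A standalone id="..." attribute. The lookbehind rejects names that merely
// end in "id", such as data-id or a namespaced xml:id.
const RAW_ID = /(?<![\w:.-])id\s*=\s*(["'])(.*?)\1/g;

function rawHtmlIds(markdown) {
  const text = markdown
    .replace(/\r\n/g, "\n") // normalise CRLF so the fence pattern sees "\n"
    .replace(FENCE, ""); // ids inside code samples are not real anchors
  return new Set(Array.from(text.matchAll(RAW_ID), (m) => m[2]));
}
```

Stripping fences before matching is what keeps an id shown inside a code sample from being counted as a valid anchor target.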

Two cross-repo testing.md links (../tests/ and ../.github/) that the old script silently followed outside docs/ are rewritten as github.com URLs—the containment guard rejects them by design.
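
A containment guard of the kind described here could be sketched with plain segment arithmetic. The resolveInsideRoot name and its string-only approach are assumptions for the example; the actual script may well resolve real filesystem paths instead.

```javascript
// Hypothetical helper: resolve a link target to a docs-root-relative path,
// or return null when the target would climb above the docs root.
function resolveInsideRoot(fromDir, target) {
  // Root-relative links ("/configure/run") start at the docs root;
  // everything else is relative to the linking file's directory.
  const segments = target.startsWith("/") ? [] : fromDir.split("/").filter(Boolean);
  for (const part of target.split("/")) {
    if (part === "" || part === ".") continue;
    if (part === "..") {
      if (segments.length === 0) return null; // would escape the docs root
      segments.pop();
    } else {
      segments.push(part);
    }
  }
  return segments.join("/");
}
```

Under this sketch, a ../tests/ link from a page at the docs root resolves to null, which is why such references have to become absolute URLs.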
The link check ran serially after the docfx chain in the build job, on the order of four minutes on the 2k-file tree. Split it into its own job so it runs in parallel with the VitePress build—net wall-time drops to the longer of the two, not their sum.
The Windows-service install block defaulted to LocalSystem without naming the security trade-off; add one sentence pointing operators at a dedicated low-privilege account for production. The Docker block exposed -p 5000:5000 without flagging the lack of TLS / auth; add one paragraph pointing at the module-level TLS + auth blocks.
Cross-check against the shipped Dockerfile: ENTRYPOINT is dotnet agent.dll with CMD debug, WORKDIR /app; there is no /config/agent.config.yaml mount point. Rewrite the docker run example with the real /app/ paths and rewrite the prose to match. Pin dotnet run to the explicit MTConnect.NET-Agent.csproj path so the snippet is unambiguous even when the directory gains a second csproj.
Operate did not link back to Install or Configure an adapter; an on-call operator landing on the page had no one-click jump to either. Consumer did not link to Configure an adapter; integrators tracing an SHDR chain back to the equipment had no link back either. Add both.
Replace the fully-qualified MTConnectAgentMetrics link with a link to the MTConnect.Agents.Metrics namespace landing page. If the emitter is ever folded into another class the type link rots; the namespace link is stable across that refactor. The MQTT document-server reference in consumer.md was already retargeted at the root MTConnect namespace page in the earlier topic-shape rewrite.
@ottobolyos ottobolyos force-pushed the chore/check-broken-links branch from 6c70e55 to a4d3bad June 1, 2026 18:41
ottobolyos added a commit to ottobolyos/mtconnect.net that referenced this pull request Jun 1, 2026
ottobolyos added a commit to ottobolyos/mtconnect.net that referenced this pull request Jun 1, 2026
…xample

F-075 — operate.md:17-18: the file-pattern column on the modules /
processors rows still pointed at logs/<module-name>-<date>.log after
the F-052 fix updated only the logger-name column. The shipped
NLog.config templates the file as logs\${logger}-${shortdate}.log,
and ${logger} substitutes the full logger name including the
modules. / processors. prefix, so the on-disk file is actually
logs/modules.<module-id>-<date>.log. Update the file-pattern cells
accordingly.
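
The naming rule in F-075 reduces to a one-line substitution. In the sketch below the helper names are hypothetical, and the date is passed in as an already-formatted string rather than computed.

```javascript
// Hypothetical helpers mirroring the file template quoted above:
//   logs\${logger}-${shortdate}.log
// The logger name keeps its "modules." / "processors." prefix in the file name.
function loggerName(kind, id) {
  return `${kind}.${id}`; // kind is "modules" or "processors"
}

function logFilePath(logger, shortdate) {
  return `logs/${logger}-${shortdate}.log`;
}
```

For a logger named modules.http-server, the file therefore carries the full modules.http-server prefix, not just the module id.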

F-076 — add a Common-operational-patterns bullet showing the
per-module and per-processor tail commands so operators do not have
to reconstruct the on-disk path from the logger table.
F-077 — the three configure/ pages (run, consumer, operate) used the
spaced em-dash ' — ' while the rest of the docs tree (notably the
newer docs-site.md page) uses the closed CMOS form 'word—word'.
CONVENTIONS §1.0d-decies makes the closed form canonical, so retrofit
the three pages to match. Verified no em-dashes inside fenced code
blocks or inline-code spans before the sweep.
…ample

F-078 — the .NET HTTP and MQTT consumer snippets called
observation.GetValue("Result") with a bare string literal where the
shipped library defines ValueKeys.Result = "Result" in
libraries/MTConnect.NET-Common/Observations/ValueKeys.cs:15. Switch
both call sites to observation.GetValue(ValueKeys.Result) and add
'using MTConnect.Observations;' to each snippet so the constant
resolves. Reads as idiomatic .NET and signposts the ValueKeys.*
typed-constant home (Result, Level, NativeCode, …) for readers who
need to access other value keys.
…avior

F-079 — actions/upload-artifact@v4 strips the longest common prefix
from a multi-path list, so uploading 'docs/api' + 'docs/reference'
produces an artifact rooted at 'api/' + 'reference/' rather than the
literal upload paths. The downstream download-artifact steps in the
build and check-links jobs rely on 'path: docs' to restore the
'docs/' prefix. The behaviour differs from v4's v3 predecessor and is
non-obvious on a quick read. Add an inline comment above the
upload-artifact step so a future editor of either end of the
upload / download pair does not break the path contract.
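
The path contract from F-079 can be modelled in a few lines. This sketch illustrates the behaviour as the commit message describes it for a multi-path upload; it is not the action's implementation, and both function names are made up for the example.

```javascript
// Hypothetical model of the multi-path case only: strip the directory prefix
// shared by every upload path, keeping at least one segment per entry.
function artifactLayout(uploadPaths) {
  const split = uploadPaths.map((p) => p.split("/").filter(Boolean));
  if (split.length === 0) return { strippedPrefix: "", entries: [] };
  let shared = 0;
  // Advance while every path has the same segment here and another one after it.
  while (split.every((segs) => shared < segs.length - 1 && segs[shared] === split[0][shared])) {
    shared += 1;
  }
  return {
    strippedPrefix: split[0].slice(0, shared).join("/"),
    entries: split.map((segs) => segs.slice(shared).join("/")),
  };
}

// The download side puts the prefix back by extracting under a fixed path.
function restoreUnder(downloadPath, entries) {
  return entries.map((entry) => `${downloadPath}/${entry}`);
}
```

Changing the upload list so that the shared prefix is no longer docs, or changing path: docs on the download side, breaks the round trip.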
…ecker

F-080 — runWithConcurrency synthesises a recordBrokenLink entry on
worker failure with url set to '<worker failure: ${message}>'. The
literal '<' / '>' characters propagate to the problem-matcher line as
'<file>:0:0: broken link <worker failure: ENOENT…>'. Editor
problem-matchers (VS Code, vim :cfile) parse the line correctly, but
the angle brackets look like template placeholders in the UI and may
confuse a quick visual read of the CI log. Use parentheses instead.
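
A bounded-concurrency runner with the failure handling described in F-080 might be shaped like this. Only the runWithConcurrency and recordBrokenLink names come from the commit message; the signature, the lane structure, and the record fields are assumptions for the sketch.

```javascript
// Hypothetical sketch: run `worker` over `items` with at most `limit` in flight.
// A rejected worker becomes one recorded broken link, not an unhandled rejection.
async function runWithConcurrency(items, limit, worker, recordBrokenLink) {
  let next = 0;
  async function lane() {
    // Each lane pulls the next unclaimed item until the list is exhausted.
    while (next < items.length) {
      const item = items[next++];
      try {
        await worker(item);
      } catch (err) {
        recordBrokenLink({
          file: item,
          line: 0,
          col: 0,
          url: `(worker failure: ${err.message})`, // parentheses, not angle brackets
        });
      }
    }
  }
  const lanes = Array.from({ length: Math.min(limit, items.length) }, () => lane());
  await Promise.all(lanes);
}
```

The limit is a plain parameter here; the PR description says the script derives it from os.cpus().length.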
F-081 — CRLF -> LF normalisation was repeated in both computeAnchorSet
(line 107-108) and processFile (line 223-224). Extract a single
readMarkdownNormalized(file) helper and call it from both sites. Tiny
duplication, but it was introduced in one commit and the intent reads
clearer with a named helper.
@ottobolyos ottobolyos marked this pull request as ready for review June 1, 2026 20:26
ottobolyos added a commit to ottobolyos/mtconnect.net that referenced this pull request Jun 1, 2026
ottobolyos added a commit to ottobolyos/mtconnect.net that referenced this pull request Jun 1, 2026
ottobolyos added a commit to ottobolyos/mtconnect.net that referenced this pull request Jun 1, 2026
@PatrickRitchie PatrickRitchie moved this from In Progress to Reviewing in MTConnect.NET-Development Jun 1, 2026
@PatrickRitchie PatrickRitchie merged commit bf3e755 into TrakHound:master Jun 1, 2026
7 checks passed
@github-project-automation github-project-automation Bot moved this from Reviewing to Done in MTConnect.NET-Development Jun 1, 2026