Commit 4aee7aa

fix(docs): add language tags to 22 bare code fences across 13 pages
1 parent 80faf8f commit 4aee7aa

File tree

13 files changed: +22 −22 lines

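Every one of the 22 changes below follows the same mechanical pattern: a fence opener with no info string (a bare ` ``` `) gains a language tag. As an illustrative aside (this tooling is not part of the commit, and the file name `barefences.go` is hypothetical), such openers can be located with a short Go program:

```go
// barefences reports Markdown fence openers that carry no info string
// (language tag) -- the kind of fence this commit tags.
package main

import (
	"fmt"
	"os"
	"strings"
)

// bareFences returns the 1-based line numbers of opening code fences
// that have no language tag such as "go" or "text".
func bareFences(src string) []int {
	var hits []int
	inFence := false
	for i, raw := range strings.Split(src, "\n") {
		line := strings.TrimSpace(raw)
		if !strings.HasPrefix(line, "```") {
			continue
		}
		if inFence {
			inFence = false // this fence closes the open block
			continue
		}
		inFence = true // this fence opens a block
		if line == "```" {
			hits = append(hits, i+1) // bare opener: no info string
		}
	}
	return hits
}

func main() {
	for _, path := range os.Args[1:] {
		data, err := os.ReadFile(path)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		for _, n := range bareFences(string(data)) {
			fmt.Printf("%s:%d: bare code fence\n", path, n)
		}
	}
}
```

Running it over the docs tree, e.g. `go run barefences.go $(find content/docs -name '*.md')`, would print one `file:line` entry per untagged opener; applied to the pre-commit tree it should flag exactly the 22 lines changed here.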

content/docs/api/serve.md

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ bookToc: true
 
 # Package serve
 
-```
+```go
 import "github.com/zerfoo/zerfoo/serve"
 ```
 

content/docs/architecture/gpu-setup.md

Lines changed: 3 additions & 3 deletions
@@ -32,7 +32,7 @@ This means:
 
 The detection flow:
 
-```
+```text
 1. dlopen("libcudart.so.12") or dlopen("libcudart.so")
 2. dlsym each required symbol (cudaMalloc, cudaFree, cudaMemcpy, ...)
 3. Optionally resolve CUDA graph symbols (cudaStreamBeginCapture, ...)
@@ -55,7 +55,7 @@ nvidia-smi
 
 Expected output shows your GPU model, driver version, and CUDA version:
 
-```
+```text
 +-----------------------------------------------------------------------------------------+
 | NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
 |-----------------------------------------+------------------------+----------------------+
@@ -182,7 +182,7 @@ rocm-smi
 
 Expected output:
 
-```
+```text
 ========================= ROCm System Management Interface =========================
 ================================ Concise Info =======================================
 GPU  Temp   AvgPwr  SCLK     MCLK     Fan   Perf  PwrCap  VRAM%  GPU%

content/docs/architecture/overview.md

Lines changed: 2 additions & 2 deletions
@@ -168,7 +168,7 @@ statement (with special-case detection -- e.g., Mistral models report
 Most decoder-only architectures share the same transformer body. The shared
 logic lives in `buildTransformerGraph()`, which constructs:
 
-```
+```text
 Embed -> [RMSNorm -> GQA -> Add -> RMSNorm -> FFN(SiLU-gate) -> Add] x N -> RMSNorm -> LMHead
 ```
 
@@ -433,7 +433,7 @@ flowchart TD
 
 A quick reference for where to find things:
 
-```
+```text
 zerfoo/
   cmd/         CLI entry points (run, serve, pull, predict, tokenize)
   inference/

content/docs/blog/03-architecture-deep-dive.md

Lines changed: 2 additions & 2 deletions
@@ -12,7 +12,7 @@ Zerfoo runs LLM inference in Go at 245 tokens/second — 20% faster than Ollama.
 
 When you call `zerfoo.Load("google/gemma-3-4b")` followed by `m.Chat("Hello")`, the following pipeline executes:
 
-```
+```text
 GGUF file on disk
   -> Parse GGUF header + tensors
   -> Map tensor names to canonical form
@@ -86,7 +86,7 @@ func init() {
 
 The `general.architecture` field in the GGUF metadata determines which builder is invoked. Most decoder-only architectures share the same transformer body through `buildTransformerGraph()`, which constructs:
 
-```
+```text
 Embed -> [RMSNorm -> GQA -> Add -> RMSNorm -> FFN(SiLU-gate) -> Add] x N -> RMSNorm -> LMHead
 ```
 

content/docs/blog/how-we-beat-ollama-cuda-graph-capture.md

Lines changed: 1 addition & 1 deletion
@@ -38,7 +38,7 @@ This is a hard problem for a real inference pipeline. Token embeddings require l
 
 Zerfoo solves this by splitting the execution plan into three regions:
 
-```
+```text
 [Pre-capture: CPU-touching ops] [Capture region: GPU-only ops] [Post-capture: CPU-touching ops]
 ```
 

content/docs/contributing/overview.md

Lines changed: 3 additions & 3 deletions
@@ -27,7 +27,7 @@ Zerfoo is an ecosystem of six independent repositories (each with its own `go.mo
 
 **Dependency graph:**
 
-```
+```text
 float16 --+
 float8  --+--> ztensor --> zerfoo
 ztoken  --+
@@ -96,7 +96,7 @@ go tool cover -html=coverage.out -o coverage.html
 
 We use [Conventional Commits](https://www.conventionalcommits.org/) for automated versioning with release-please.
 
-```
+```text
 <type>(<scope>): <description>
 ```
 
@@ -112,7 +112,7 @@ We use [Conventional Commits](https://www.conventionalcommits.org/) for automate
 
 Examples:
 
-```
+```text
 feat(inference): add Qwen 2.5 architecture support
 fix(generate): correct KV cache eviction for sliding window attention
 perf(layers): fuse SiLU and gate projection into single kernel

content/docs/deployment/enterprise.md

Lines changed: 3 additions & 3 deletions
@@ -1027,7 +1027,7 @@ readinessProbe:
 
 Zerfoo logs every request with structured fields:
 
-```
+```text
 method=POST path=/v1/chat/completions model=gemma-3-1b prompt_tokens=0 completion_tokens=0 latency_ms=142 status_code=200
 ```
 
@@ -1093,7 +1093,7 @@ sidecar. SHA-256 is computed and stored on upload.
 
 #### Directory Layout
 
-```
+```text
 /models/
   llama-3-7b-q4_k_m/
     model.gguf
@@ -1176,7 +1176,7 @@ a new load would exceed the budget.
 
 ### Architecture
 
-```
+```text
 Request -> ModelManager.Get("model-id")
              |
              +-- Already loaded? -> promote to MRU, return handle

content/docs/ecosystem/_index.md

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ Zerfoo is a family of Go modules that together form a complete ML inference and
 
 ## Dependency Graph
 
-```
+```text
 float16 ──┐
           ├──► ztensor ──► zerfoo
 float8  ──┘      ▲

content/docs/getting-started/first-inference.md

Lines changed: 1 addition & 1 deletion
@@ -80,7 +80,7 @@ Start a chat session with `zerfoo run`:
 zerfoo run gemma-3-1b-q4
 ```
 
-```
+```text
 Model loaded. Type your message (Ctrl-D to quit).
 
 > What is the capital of France?

content/docs/getting-started/quickstart.md

Lines changed: 2 additions & 2 deletions
@@ -55,7 +55,7 @@ go run main.go
 
 To request a specific quantization, append it to the ID:
 
-```
+```text
 google/gemma-3-4b/Q8_0
 ```
 
@@ -107,7 +107,7 @@ zerfoo run gemma-3-1b-q4
 
 This starts an interactive chat session:
 
-```
+```text
 Model loaded. Type your message (Ctrl-D to quit).
 
 > What is the capital of France?
