Commit 80faf8f

docs: add Granite Guardian page, update benchmarks and ztoken, fix broken link

- Add comprehensive Granite Guardian reference page covering Go API, CLI, REST API, guardrails middleware, and all 13 risk categories
- Add SentencePiece unigram support note to ztoken ecosystem page
- Add Mistral output quality note to benchmarks (pending tokenizer fix)
- Fix broken /docs/getting-started/gpu-setup link to /docs/architecture/gpu-setup

1 parent e1a539a commit 80faf8f

File tree

4 files changed: +390 additions, -3 deletions

content/docs/ecosystem/ztoken.md

Lines changed: 4 additions & 0 deletions

````diff
@@ -90,6 +90,10 @@ tok := ztoken.NewBPETokenizer(vocab, merges, special, false)
 tok.SetSentencePiece(true)
 ```
 
+## SentencePiece Unigram Support
+
+As of v0.3.0, ztoken supports SentencePiece unigram tokenization in addition to BPE. Unigram models (used by T5, mBART, and some multilingual models) are detected automatically when loading from HuggingFace JSON or GGUF files with `tokenizer.ggml.model = "llama"` and a unigram vocabulary.
+
 ## Supported Models
 
 ztoken is compatible with tokenizers from:
````
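The added docs say unigram models are detected from GGUF metadata (`tokenizer.ggml.model = "llama"` plus a unigram vocabulary). As a rough sketch of that detection rule, not ztoken's actual API, one could branch on the metadata key and on whether the vocabulary carries per-token scores (a unigram model stores a log-probability per piece); the function name `detectTokenizer` and the `hasScores` flag here are hypothetical:

```go
package main

import "fmt"

// detectTokenizer is a hypothetical illustration of the detection rule the
// docs describe, not ztoken's real API. GGUF files mark SentencePiece-family
// tokenizers with tokenizer.ggml.model = "llama"; a unigram vocabulary
// additionally carries a score (log-probability) per token.
func detectTokenizer(ggmlModel string, hasScores bool) string {
	if ggmlModel == "llama" && hasScores {
		return "sentencepiece-unigram"
	}
	if ggmlModel == "gpt2" {
		return "bpe" // GPT-2-style byte-level BPE
	}
	return "unknown"
}

func main() {
	fmt.Println(detectTokenizer("llama", true))
	fmt.Println(detectTokenizer("gpt2", false))
}
```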

content/docs/getting-started/quickstart.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -301,7 +301,7 @@ fmt.Printf("Tokens: %d, Duration: %s\n", result.TokenCount, result.Duration)
 ## Next Steps
 
 - [Installation](/docs/getting-started/installation) -- detailed installation and platform support
-- [GPU Setup](/docs/getting-started/gpu-setup) -- configure CUDA, ROCm, or OpenCL for hardware-accelerated inference
+- [GPU Setup](/docs/architecture/gpu-setup) -- configure CUDA, ROCm, or OpenCL for hardware-accelerated inference
 - [API Server](/docs/deployment) -- serve models behind an OpenAI-compatible HTTP API
 - [API Reference](/docs/api) -- full API documentation
 - [Tutorials](/docs/tutorials) -- step-by-step guides for common tasks
```

content/docs/reference/benchmarks.md

Lines changed: 8 additions & 2 deletions

```diff
@@ -48,8 +48,14 @@ Ollama v0.17.7.
 | Mistral 7B Q4_K_M | mistral | 7B | **44** | 46.77 | **0.94x** | ~Even |
 
 Zerfoo wins on small models (1B-1.5B). Llama 3.2 3B is at parity. Mistral 7B
-was previously at 11 tok/s due to a performance regression; after the fix it
-runs at 44 tok/s (0.94x Ollama -- near parity).
+was previously at 11 tok/s due to a performance regression; after the shared
+memory fix it runs at 44 tok/s (0.94x Ollama -- near parity).
+
+> **Note on Mistral output quality:** Mistral 7B throughput is correct at 44
+> tok/s, but output quality is pending a tokenizer fix. The Mistral tokenizer
+> requires SentencePiece byte-fallback handling that is not yet fully
+> implemented. Throughput numbers are valid; text coherence will improve once
+> the tokenizer fix lands.
 
 Additional architectures (Qwen, Phi, Mixtral, Command-R, Falcon, Mamba, RWKV)
 will be added as GGUF files are acquired and parser compatibility is resolved.
```
