abetlen · Jun 7, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -7,6 +7,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+- feat(example): align server MTP support with llama.cpp by @abetlen in #2283
 - feat: update llama.cpp to ggml-org/llama.cpp@9e3b928fd
 - feat(example): add OpenAI-compatible embeddings endpoint by @abetlen in #2281
 

diff --git a/examples/server/README.md b/examples/server/README.md
@@ -434,6 +434,22 @@ Use MTP when the loaded model and llama.cpp build expose the required draft stat
 }
 ```
 
+By default, `draft-mtp` creates the MTP context from the target model.
+Set `draft_model_path` or `draft_model_from_pretrained` when the model uses a separate assistant GGUF.
+
+```json
+{
+  "model": {
+    "draft_model": "draft-mtp",
+    "draft_model_num_pred_tokens": 2,
+    "draft_model_from_pretrained": {
+      "repo_id": "example/gemma-assistant-GGUF",
+      "filename": "assistant.gguf"
+    }
+  }
+}
+```
+
 MTP currently applies to text-only requests.
 
 ## Disk Sequence Cache