Merged
65 commits
63d342d
Add batch processing server
abetlen Apr 5, 2026
6084d8c
Improve response parser streaming performance
abetlen Apr 5, 2026
c3ae090
Add /v1/models endpoint
abetlen Apr 5, 2026
03805b3
Support custom tools in responses API
abetlen Apr 5, 2026
b61cadf
Improve responses api compatibility for codex
abetlen Apr 8, 2026
8cb7616
Improve batch server prompt and context config
abetlen Apr 9, 2026
bdfe69f
Improve batch server scheduling and prompt handling
abetlen Apr 11, 2026
60daf05
fix apply_patch tool
abetlen Apr 14, 2026
5abc061
Improve batch server schemas and metrics
abetlen Apr 26, 2026
9230862
Refactor sequence cache helpers
abetlen May 15, 2026
de99e97
Fix server type diagnostics
abetlen May 16, 2026
4cac5cd
feat: add llama.cpp extension bindings
abetlen May 22, 2026
ade3936
feat: add MTP support to batch server
abetlen May 22, 2026
e63d87d
feat: improve draft-mtp handling in batch server
abetlen May 31, 2026
5d626f4
Merge remote-tracking branch 'origin/main' into abetlen/batch-process…
abetlen Jun 2, 2026
d470b26
feat: cap MTP draft context outputs
abetlen Jun 2, 2026
5e82af5
fix: preserve held streaming tokens
abetlen Jun 3, 2026
c776e6a
Merge remote-tracking branch 'origin/main' into abetlen/batch-process…
abetlen Jun 3, 2026
4cb7a2c
feat: add load-time LoRA support to batch server
abetlen Jun 3, 2026
26a4d79
Merge remote-tracking branch 'origin/main' into abetlen/batch-process…
abetlen Jun 4, 2026
b988613
Merge remote-tracking branch 'origin/main' into abetlen/batch-process…
abetlen Jun 5, 2026
3df792d
feat: add multimodal support to batch server
abetlen Jun 5, 2026
0d4b0be
Merge remote-tracking branch 'origin/main' into abetlen/batch-process…
abetlen Jun 6, 2026
bd702c8
refactor: rename batch item kinds
abetlen Jun 6, 2026
d9bcf9f
refactor: type sampled mtp updates
abetlen Jun 6, 2026
b2cd053
refactor: structure sampled mtp batch processing
abetlen Jun 6, 2026
ac0ba61
refactor: clarify batch item construction
abetlen Jun 6, 2026
01d8c24
refactor: type batch item kind
abetlen Jun 6, 2026
409b10e
refactor: clarify sampled pending index
abetlen Jun 6, 2026
6bf52c6
refactor: clarify output index naming
abetlen Jun 6, 2026
8272f4e
refactor: rename logits index resolver
abetlen Jun 6, 2026
0209e00
refactor: colocate sampled mtp state
abetlen Jun 6, 2026
283ac9c
refactor: inline sampled mtp helpers
abetlen Jun 6, 2026
9e3a655
refactor: use row-expanded multimodal prompt identity
abetlen Jun 6, 2026
d979c3c
test: remove multimodal prompt plan tests
abetlen Jun 7, 2026
5dd6bc5
refactor: narrow mtmd processor dependencies
abetlen Jun 7, 2026
6bc9fef
refactor: group prompt segment media fields
abetlen Jun 7, 2026
d062a93
refactor: centralize sequence state copy
abetlen Jun 7, 2026
b33bed3
refactor: keep disk cache storage only
abetlen Jun 7, 2026
d0c6709
refactor: split batch item payloads
abetlen Jun 7, 2026
53d553b
refactor: centralize pending request failure cleanup
abetlen Jun 7, 2026
5f4c8fd
refactor: centralize sequence claiming
abetlen Jun 7, 2026
23a4915
refactor: key sequence disk cache compatibility
abetlen Jun 7, 2026
d117b69
refactor: decouple completion request preparation
abetlen Jun 7, 2026
79d9f71
refactor: name prepared completion parts
abetlen Jun 7, 2026
7b1782a
refactor: return prepared completion parts
abetlen Jun 7, 2026
9e7f2b9
refactor: localize media cache key building
abetlen Jun 7, 2026
d784dac
refactor: remove unused request id override
abetlen Jun 7, 2026
db118ef
refactor: simplify prompt segment row capacity
abetlen Jun 7, 2026
9744bcb
refactor: inline prompt row clamp
abetlen Jun 7, 2026
57685c2
refactor: inline disconnect cancellation response
abetlen Jun 7, 2026
8d96870
refactor: simplify recurrent draft capacity
abetlen Jun 7, 2026
54ae278
refactor: define builtin grammar rule as dataclass
abetlen Jun 7, 2026
ec014f3
refactor: type chat template conversions
abetlen Jun 7, 2026
8a002b7
docs: mark llama_cpp_ext experimental
abetlen Jun 7, 2026
99056b1
feat: restrict multimodal media sources
abetlen Jun 7, 2026
b05b08d
docs: add server example README and config
abetlen Jun 7, 2026
4714849
docs: document server example configuration
abetlen Jun 7, 2026
fc7326c
docs: update server README
abetlen Jun 7, 2026
10219a1
docs: document server wheel setup and clients
abetlen Jun 7, 2026
e2f079e
docs: add server model configs
abetlen Jun 7, 2026
cbc0fa0
docs: add server chat templates and response schemas
abetlen Jun 7, 2026
ddd762f
docs: keep batch processing server example
abetlen Jun 7, 2026
779e107
docs: add server example changelog entry
abetlen Jun 7, 2026
c3a3352
docs: mention multi-token prediction in changelog
abetlen Jun 7, 2026
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -8,6 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]

- feat: update llama.cpp to ggml-org/llama.cpp@5a69c9743
+- feat(example): Updated server example (batch processing, multi-token prediction, `/v1/responses` api, response parsing) by @abetlen in #2174

## [0.3.26]
