Commit 5e424bf
Brendan Gray
fix: remove CPU context cap - CPU fallback now gets same RAM-based context as GPU
CPU mode artificially capped context at 8192, causing models that fell back to
CPU (e.g. 4B Q8 exceeding 4GB VRAM) to get crushed to ~1792 context via
autoContextSizeShrink. With 32GB RAM, a 4B model should get far more.
Changes:
- Remove 'if (mode === false) maxCtx = Math.min(maxCtx, 8192)' cap
- Equalize CPU contextMin to MIN_USABLE_GPU_CONTEXT (8192)
- Add diagnostic logging to _computeMaxContext()
1 parent eae9034 commit 5e424bf
1 file changed
Lines changed: 5 additions & 4 deletions
| Hunk | Original file lines | New file lines | Change |
|---|---|---|---|
| 1 | 322-324 | 322-323 | 3 lines removed, 2 lines added |
| 2 | 502 | 501-503 | 1 line removed, 3 lines added |
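The effect of removing the cap, as described in the commit message, can be illustrated with a minimal sketch. This is not the project's actual `_computeMaxContext()`: the function name `computeMaxContextSketch`, the `legacyCpuCap` option, and the fixed per-token memory cost are all assumptions made for illustration. Only the removed condition (`mode === false` capping `maxCtx` at 8192) and the `MIN_USABLE_GPU_CONTEXT` value of 8192 come from the commit message.

```javascript
// Hypothetical sketch only; the real _computeMaxContext() is not shown in this commit page.
const MIN_USABLE_GPU_CONTEXT = 8192; // value stated in the commit message

// Assumption for illustration: every context token costs a fixed amount of RAM.
const ASSUMED_BYTES_PER_TOKEN = 256 * 1024;

// mode === false stands for CPU fallback, matching the removed condition.
// legacyCpuCap is a hypothetical switch that re-enables the old behaviour for comparison.
function computeMaxContextSketch(freeRamBytes, mode, { legacyCpuCap = false } = {}) {
  let maxCtx = Math.floor(freeRamBytes / ASSUMED_BYTES_PER_TOKEN);
  if (legacyCpuCap && mode === false) {
    maxCtx = Math.min(maxCtx, 8192); // the cap this commit removes
  }
  // After the commit, CPU and GPU share the same minimum context.
  const contextMin = MIN_USABLE_GPU_CONTEXT;
  return { contextMin, maxCtx };
}

const freeRam = 16 * 1024 ** 3; // 16 GiB, an arbitrary example value
const before = computeMaxContextSketch(freeRam, false, { legacyCpuCap: true });
const after = computeMaxContextSketch(freeRam, false);
console.log(before.maxCtx, after.maxCtx); // prints "8192 65536"
```

Under these assumed numbers, the old cap leaves the CPU path with no headroom above the minimum, while the uncapped path scales with available RAM in the same way the GPU path does.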