fix(providers): provider-aware token counting #255

Merged: mpekatsoula merged 1 commit into main from fix_token_count on Jun 22, 2026
mk-arm commented Jun 22, 2026

Contributor

Token counting was hardcoded to tiktoken/gpt-4 regardless of the configured provider, so Anthropic (direct, Bedrock, Bedrock Mantle), Gemini, and local-model providers all chunked input and accounted for usage with the OpenAI tokenizer.

ChatProvider now exposes count_tokens() with a default heuristic of 4 chars/token. OpenAI and Azure override it with tiktoken. Anthropic and Bedrock Mantle use a 3.5 chars/token heuristic. Bedrock, Ollama, vLLM and llama.cpp dispatch on the configured model id against a per-family ratio table (Llama 2/3, Mistral/Mixtral, Qwen, DeepSeek, Gemma/Gemini, Phi, Cohere, Titan) sourced from the Llama 3 paper, Mistral docs, Qwen2 tech report, Google token docs and HF tokenizer configs.
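
For readers skimming the description, the provider hierarchy could look roughly like the sketch below. It is not the code in this PR: the class names other than ChatProvider, the `chars_per_token` attribute, the rounding with `math.ceil`, and every ratio in `FAMILY_RATIOS` are assumptions chosen for illustration (the real table values are not reproduced here). The tiktoken override for OpenAI/Azure is left out so the sketch stays dependency-free.

```python
import math


class ChatProvider:
    """Base provider: default heuristic of roughly 4 characters per token."""

    chars_per_token = 4.0

    def count_tokens(self, text: str) -> int:
        if not text:
            return 0
        # Rounding up is an assumption; the PR does not state the rounding rule.
        return math.ceil(len(text) / self.chars_per_token)


class AnthropicProvider(ChatProvider):
    """Hypothetical subclass using the 3.5 chars/token heuristic."""

    chars_per_token = 3.5


# Placeholder ratios for illustration only; NOT the values used in the PR.
FAMILY_RATIOS = {
    "llama": 3.8,
    "mistral": 3.6,
    "qwen": 3.4,
}


class LocalModelProvider(ChatProvider):
    """Hypothetical provider that picks a ratio from the configured model id."""

    def __init__(self, model_id: str):
        self.model_id = model_id
        self.chars_per_token = self._ratio_for(model_id)

    @staticmethod
    def _ratio_for(model_id: str) -> float:
        lowered = model_id.lower()
        for family, ratio in FAMILY_RATIOS.items():
            if family in lowered:
                return ratio
        # Unknown family: fall back to the base 4 chars/token default.
        return ChatProvider.chars_per_token
```

With this shape, callers only ever call `provider.count_tokens(text)` and never need to know which tokenizer or ratio sits behind it.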

split_snippet and process_diff_file now accept a token_counter callable; ReviewGraph and ReviewService pass llm_provider.count_tokens. The embedding-usage callback keeps the model-name dispatch in utils.count_tokens, since it only sees a model id string.
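
The injection pattern can be illustrated with a minimal sketch. The signature and the greedy line-packing strategy below are assumptions, not the actual split_snippet in the repository; `chars_over_four` is a hypothetical stand-in for llm_provider.count_tokens.

```python
from typing import Callable, List


def split_snippet(
    text: str,
    max_tokens: int,
    token_counter: Callable[[str], int],
) -> List[str]:
    """Greedily pack whole lines into chunks of at most max_tokens each.

    A single line that exceeds max_tokens on its own is emitted as an
    oversized chunk rather than being split mid-line.
    """
    chunks: List[str] = []
    current: List[str] = []
    for line in text.splitlines(keepends=True):
        candidate = "".join(current) + line
        if current and token_counter(candidate) > max_tokens:
            # Adding this line would overflow the budget: close the chunk.
            chunks.append("".join(current))
            current = [line]
        else:
            current.append(line)
    if current:
        chunks.append("".join(current))
    return chunks


def chars_over_four(text: str) -> int:
    """Stand-in counter: 4 chars/token, rounded up."""
    return (len(text) + 3) // 4


chunks = split_snippet("aaaa\nbbbb\ncccc\n", max_tokens=3, token_counter=chars_over_four)
```

Because the counter is a plain callable, the same splitter works unchanged whether the caller passes a tiktoken-backed method or a ratio heuristic.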

mpekatsoula merged commit 9d7926a into main on Jun 22, 2026
7 checks passed
mk-arm deleted the fix_token_count branch on June 23, 2026 08:14