fix: decode str_tokens per token for transformers >= 5 by robbiebusinessacc · Pull Request #177 · EleutherAI/delphi

robbiebusinessacc · 2026-06-10T22:06:47Z

Fixes #176

On transformers >= 5, batch_decode treats a 1-D tensor as a single
sequence and returns one joined string instead of per-token strings.
Delphi calls it on 1-D (ctx_len,) token tensors when building
str_tokens, so on a fresh install (transformers is unpinned) token
highlighting silently disappears from explainer and fuzz/detection
prompts, the simulator's ActivationRecord gets a length-1 token list,
and the intruder scorer raises IndexError.
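
To make the failure mode concrete, here is a minimal, dependency-free sketch. It does not call transformers: `VOCAB`, `decode`, `batch_decode_4x_style`, `batch_decode_5x_style`, and `highlight` are all hypothetical stand-ins that only mimic the two behaviors described above, and the `zip`-based highlighter is an assumption about how per-token highlighting might be consumed, not Delphi's actual code.

```python
# Toy stand-ins only; none of this is the transformers or Delphi API.
VOCAB = {0: "Hello", 1: " world", 2: "!"}


def decode(ids):
    """Decode one sequence of token ids into a single string."""
    return "".join(VOCAB[i] for i in ids)


def batch_decode_4x_style(ids):
    # Behavior described for 4.x: each element of a flat (1-D) input
    # is treated as its own one-token sequence.
    return [decode([i]) for i in ids]


def batch_decode_5x_style(ids):
    # Behavior described for >= 5: a flat (1-D) input is one sequence,
    # so the result is a single joined string.
    return [decode(ids)]


def highlight(str_tokens, activations, threshold=0.5):
    # Assumed consumer: wrap tokens whose activation exceeds the threshold.
    # zip() stops at the shorter input, so a length-1 str_tokens list
    # loses its highlights without raising.
    return "".join(
        f"<<{tok}>>" if act > threshold else tok
        for tok, act in zip(str_tokens, activations)
    )


tokens = [0, 1, 2]
activations = [0.0, 0.9, 0.1]

old = batch_decode_4x_style(tokens)  # ["Hello", " world", "!"]
new = batch_decode_5x_style(tokens)  # ["Hello world!"]

print(highlight(old, activations))   # Hello<< world>>!
print(highlight(new, activations))   # Hello world!  (highlight is gone)

try:
    new[1]                           # per-position indexing now fails
except IndexError as exc:
    print("IndexError:", exc)
```

In this toy setup the same one-element list produces both symptoms: code that zips tokens with activations degrades silently, while code that indexes a token position raises.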

Changes

- Add decode_per_token to delphi.utils: it decodes a 1-D tensor of
  token ids into one string per token via
  batch_decode(tokens.unsqueeze(-1)). This restores the transformers 4.x
  behavior exactly and works on both 4.x and 5.x. As noted in #176,
  convert_ids_to_tokens is not a substitute since it leaks BPE artifacts
  like Ġworld.
- Use it at the four affected sites: latents/samplers.py (train and
  test examples), latents/constructors.py
  (prepare_non_activating_examples), and
  scorers/simulator/simulation/oai_simulator.py. The two sites in
  constructors.py that immediately "".join(...) the result still
  produce the intended text, so they are left unchanged.
- Add tests/test_utils.py asserting per-token output and that the
  decoded strings round-trip to the original text. The per-token assertion
  fails on transformers 5.x without the fix and passes with it. Behavior
  on 4.x is unchanged: this was verified element-wise against the old
  batch_decode output on both BPE (pythia-70m) and WordPiece
  (bert-base-uncased) tokenizers, including special tokens.
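
For readers who want the shape of the fix without opening the diff, here is a rough sketch under stated assumptions. `ToyTokenizer` is a hypothetical stand-in that imitates the >= 5 behavior described above, the `decode_per_token(tokenizer, token_ids)` signature is assumed rather than copied from the PR, and plain nested lists are used in place of `tokens.unsqueeze(-1)` so the example runs without torch or transformers.

```python
class ToyTokenizer:
    """Hypothetical stand-in for a tokenizer; not the transformers API."""

    def __init__(self):
        self.vocab = {0: "Hello", 1: " world", 2: "!"}

    def decode(self, ids):
        # Decode one sequence of ids into a single string.
        return "".join(self.vocab[i] for i in ids)

    def batch_decode(self, ids):
        # Mimics the >= 5 behavior described in the PR: a flat list of
        # ints is one sequence; a list of lists is a batch of sequences.
        if ids and isinstance(ids[0], int):
            return [self.decode(ids)]
        return [self.decode(seq) for seq in ids]


def decode_per_token(tokenizer, token_ids):
    """Return one decoded string per token id.

    Each id is wrapped in its own length-1 sequence, the list analogue
    of tokens.unsqueeze(-1) turning shape (ctx_len,) into (ctx_len, 1).
    Assumed signature; the helper in delphi.utils may differ.
    """
    return tokenizer.batch_decode([[int(t)] for t in token_ids])


tok = ToyTokenizer()
ids = [0, 1, 2]

joined = tok.batch_decode(ids)          # ["Hello world!"]
per_token = decode_per_token(tok, ids)  # ["Hello", " world", "!"]

# The two properties the PR's tests are described as checking:
assert len(per_token) == len(ids)            # one string per token
assert "".join(per_token) == tok.decode(ids)  # round-trips to the text
```

Because every input is explicitly a batch of length-1 sequences, the result does not depend on how a 1-D input is interpreted, which is why the same call can behave identically on both major versions.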

One possible follow-up I left out of scope: tests/client_test.py
builds str_tokens with the same 1-D batch_decode pattern, but it is
a manual vLLM script rather than a collected test.

Credit to @LakeSJS for the diagnosis in #176.

On transformers >= 5, batch_decode treats a 1-D tensor as a single sequence and returns one joined string instead of per-token strings. This silently broke token highlighting in the explainer and classifier scorers and raised IndexError in the intruder scorer. Add decode_per_token to delphi.utils and use it at the four sites that need per-token strings (samplers, non-activating constructor, OpenAI simulator). Fixes EleutherAI#176

robbiebusinessacc wants to merge 1 commit into EleutherAI:main from robbiebusinessacc:contrib/per-token-decode
