fix: decode str_tokens per token for transformers >= 5 #177

Open
robbiebusinessacc wants to merge 1 commit into EleutherAI:main from robbiebusinessacc:contrib/per-token-decode

@robbiebusinessacc

Fixes #176

On transformers >= 5, batch_decode treats a 1-D tensor as a single
sequence and returns one joined string instead of per-token strings.
Delphi calls it on 1-D (ctx_len,) token tensors when building
str_tokens, so on a fresh install (transformers is unpinned) token
highlighting silently disappears from explainer and fuzz/detection
prompts, the simulator's ActivationRecord gets a length-1 token list,
and the intruder scorer raises IndexError.
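
The failure mode described above can be illustrated without any tokenizer. The following is a minimal sketch, not Delphi's actual code: `highlight`, the `<<...>>` delimiters, the threshold, and the sample values are all hypothetical, and the length-1 list simply stands in for the joined output the PR describes on transformers 5.x.

```python
from typing import List

def highlight(str_tokens: List[str], activations: List[float],
              threshold: float = 0.5) -> str:
    """Hypothetical highlighter: wrap tokens whose activation exceeds threshold."""
    # zip stops at the shorter input, so a too-short str_tokens list
    # is truncated silently instead of raising an error.
    return "".join(
        f"<<{tok}>>" if act > threshold else tok
        for tok, act in zip(str_tokens, activations)
    )

activations = [0.0, 0.9, 0.1]           # one activation per token position
per_token = ["Hello", ",", " world"]    # expected: one string per token
joined = ["Hello, world"]               # assumed stand-in for the 5.x result

highlighted_ok = highlight(per_token, activations)
highlighted_broken = highlight(joined, activations)

# Code that indexes by token position fails loudly instead.
try:
    joined[2]
    index_failed = False
except IndexError:
    index_failed = True
```

With per-token strings the second token is marked; with the joined string nothing is marked and no error is raised, which is consistent with the silent loss of highlighting reported here, while position-based indexing produces the `IndexError`.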

Changes

  • Add decode_per_token to delphi.utils: decodes a 1-D tensor of
    token ids into one string per token via
    batch_decode(tokens.unsqueeze(-1)). This restores the transformers 4.x
    behavior exactly and works on both 4.x and 5.x. As noted in #176,
    convert_ids_to_tokens is not a substitute since it leaks BPE artifacts
    like Ġworld.
  • Use it at the four affected sites: latents/samplers.py (train and
    test examples), latents/constructors.py
    (prepare_non_activating_examples), and
    scorers/simulator/simulation/oai_simulator.py. The two sites in
    constructors.py that immediately "".join(...) the result still
    produce the intended text, so they are left unchanged.
  • Add tests/test_utils.py asserting per-token output and that the
    decoded strings round-trip to the original text. The per-token assertion
    fails on transformers 5.x without the fix and passes with it. Behavior
    on 4.x is unchanged — verified element-wise against the old
    batch_decode output on both BPE (pythia-70m) and WordPiece
    (bert-base-uncased) tokenizers, including special tokens.
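
As a rough illustration of the helper and the two test assertions listed above, here is a dependency-free sketch. It is not the code in this PR: the real helper calls `batch_decode(tokens.unsqueeze(-1))` on a 1-D tensor, whereas this version builds the equivalent nested list by hand, and `ToyTokenizer` is a hypothetical stand-in rather than the transformers API.

```python
from typing import List, Sequence

class ToyTokenizer:
    """Hypothetical stand-in for a tokenizer; not the transformers API."""

    def __init__(self, vocab: Sequence[str]) -> None:
        self.vocab = list(vocab)

    def decode(self, ids: Sequence[int]) -> str:
        # Decode one sequence of ids into a single string.
        return "".join(self.vocab[i] for i in ids)

    def batch_decode(self, sequences: Sequence[Sequence[int]]) -> List[str]:
        # One decoded string per inner sequence.
        return [self.decode(seq) for seq in sequences]

def decode_per_token(tokenizer, token_ids: Sequence[int]) -> List[str]:
    """Sketch of the helper: return one decoded string per token id."""
    # Give every id its own length-1 row, the nested-list analogue of
    # tokens.unsqueeze(-1), so the input is unambiguously a batch.
    rows = [[int(t)] for t in token_ids]
    return tokenizer.batch_decode(rows)

tok = ToyTokenizer(["Hello", ",", " world", "!"])
ids = [0, 1, 2, 3]
str_tokens = decode_per_token(tok, ids)

# The two properties the PR's test is described as asserting:
same_length = len(str_tokens) == len(ids)              # per-token output
round_trips = "".join(str_tokens) == tok.decode(ids)   # joins back to the text
```

Passing an explicit batch of length-1 sequences removes the ambiguity about whether a flat input is one sequence or many, which is why the same call behaves identically across versions.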

One possible follow-up I left out of scope: tests/client_test.py
builds str_tokens with the same 1-D batch_decode pattern, but it is
a manual vLLM script rather than a collected test.

Credit to @LakeSJS for the diagnosis in #176.

On transformers >= 5, batch_decode treats a 1-D tensor as a single
sequence and returns one joined string instead of per-token strings.
This silently broke token highlighting in the explainer and classifier
scorers and raised IndexError in the intruder scorer.

Add decode_per_token to delphi.utils and use it at the four sites that
need per-token strings (samplers, non-activating constructor, OpenAI
simulator). Fixes EleutherAI#176
Development

Successfully merging this pull request may close these issues.

str_tokens is a single joined string (not per-token) on transformers 5.x — token highlighting silently broken in explainer and classifier scorers
