fix: raise clear ValueError when add_batch receives None inputs #757

Open
YousefZahran1 wants to merge 1 commit into huggingface:main from YousefZahran1:youssef/fix-add-batch-none-input

@YousefZahran1

What

add_batch() crashed with a cryptic TypeError: object of type 'NoneType' has no len() when predictions or references was None. The error originated at line 518 (if len(column) > 0) and was caught by the except (pa.ArrowInvalid, TypeError) handler. The handler then crashed at line 523 (len(batch[c])), which calls len() on the same None value, producing a second, equally confusing traceback.
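
The double failure can be illustrated with a minimal sketch. This is not the library's code: add_batch_sketch is a hypothetical stand-in for the control flow described above, and the "Mismatch in column lengths" message is made up for illustration. It only assumes that the handler calls len() on each column while building its own error.

```python
def add_batch_sketch(batch):
    """Hypothetical stand-in for the pre-fix control flow; not the library's code."""
    try:
        for column in batch.values():
            if len(column) > 0:  # first TypeError when column is None
                break
    except TypeError:
        # The handler inspects column lengths to build its message,
        # so len(None) raises a second TypeError from inside the handler.
        lengths = {name: len(batch[name]) for name in batch}
        raise ValueError(f"Mismatch in column lengths: {lengths}")  # never reached


try:
    add_batch_sketch({"predictions": None, "references": [0, 1]})
except TypeError as err:
    second_error = err  # the error the caller actually sees

# The first TypeError is only visible as the implicit context of the second.
print(type(second_error).__name__, "<-", type(second_error.__context__).__name__)
```

Under these assumptions the caller sees two chained TypeError tracebacks and never reaches the message the handler was trying to build.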

Why

This affects anyone following the Hugging Face Trainer tutorial whose compute_metrics function conditionally returns None (e.g. when evaluation data is partially unavailable). The original error message gives no indication of which field was None or why.

Reproduce

import evaluate
acc = evaluate.load("accuracy")
acc.add_batch(predictions=None, references=[0, 1])
# Before fix → TypeError: object of type 'NoneType' has no len()
# After fix  → ValueError: Batch inputs contain None for the following fields: ['predictions']. ...

Fix

Validate the batch dict immediately after construction and raise a descriptive ValueError that names the offending fields and tells the user to check their compute_metrics return value. The change is 6 lines.
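
As a rough sketch of the kind of check described here (not the actual diff): check_batch_not_none is a hypothetical helper name, and the batch is assumed to be a plain dict mapping field names to inputs. The message wording follows the output shown under Testing.

```python
def check_batch_not_none(batch: dict) -> None:
    """Hypothetical helper: raise ValueError naming every field whose value is None."""
    none_fields = [name for name, column in batch.items() if column is None]
    if none_fields:
        raise ValueError(
            f"Batch inputs contain None for the following fields: {none_fields}. "
            "All inputs must be non-None sequences (lists, arrays, or tensors). "
            "Check that your compute_metrics function passes non-None "
            "predictions and references."
        )


# A valid batch passes silently; a batch with a None field fails early.
check_batch_not_none({"predictions": [1, 0], "references": [0, 1]})
try:
    check_batch_not_none({"predictions": None, "references": [0, 1]})
except ValueError as e:
    message = str(e)
    print(message)
```

Running the check before any len() call means the failure happens at the point where the field name is still known, which is what lets the message name the offending fields.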

Testing

import evaluate
acc = evaluate.load("accuracy")
try:
    acc.add_batch(predictions=None, references=[0, 1])
except ValueError as e:
    print(e)
# Batch inputs contain None for the following fields: ['predictions'].
# All inputs must be non-None sequences (lists, arrays, or tensors).
# Check that your compute_metrics function passes non-None predictions and references.

Fixes #668

Previously, passing None as predictions or references to add_batch()
caused a cryptic "TypeError: object of type 'NoneType' has no len()" deep
inside the Arrow serialisation path (module.py:518), making it hard to
diagnose. The except handler caught that TypeError, then called len() on
the same None value (line 523) and raised a second TypeError.

Fix: validate the batch dict immediately after construction and raise a
descriptive ValueError that names the offending fields and directs the
user to check their compute_metrics function.

Fixes huggingface#668

Development

Successfully merging this pull request may close these issues.

TypeError: object of type 'NoneType' has no len()