feat: add zero_division parameter to F1 metric #753

Open
YousefZahran1 wants to merge 2 commits into huggingface:main from YousefZahran1:youssef/fix-f1-zero-division

Conversation

@YousefZahran1

What

precision and recall both accept a zero_division parameter that controls what value is returned when a label has no predicted or true samples. f1 does not, which causes:

  1. A misleading user experience: sklearn's UndefinedMetricWarning for F1 explicitly tells users to pass zero_division, yet the evaluate wrapper rejects it with a TypeError.
  2. An inconsistency within the same library (precision and recall have it; f1 doesn't).

Reproduce the bug

import evaluate
f1 = evaluate.load("f1")
# sklearn's warning on this call says: "Use zero_division parameter to control this behavior."
f1.compute(predictions=[0,0,0,0,0], references=[0,1,0,1,2],
           average=None, labels=[0,1,2,3], zero_division=0)
# TypeError: F1._compute() got an unexpected keyword argument 'zero_division'

Fix

Add zero_division="warn" to F1._compute() (matching the default in precision and recall) and pass it through to sklearn.metrics.f1_score.
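
For context, this forwards to the underlying sklearn call, which already supports the parameter; a direct call showing the behavior the wrapper will expose (values rounded):

from sklearn.metrics import f1_score
# Label 3 never appears in y_true or y_pred, so its F1 is undefined;
# zero_division chooses the value reported for it instead of warning.
f1_score([0, 1, 0, 1, 2], [0, 0, 0, 0, 0],
         labels=[0, 1, 2, 3], average=None, zero_division=0)
# array([0.57, 0.  , 0.  , 0.  ])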

Changes

  • metrics/f1/f1.py: add zero_division argument to _compute(), document it in _KWARGS_DESCRIPTION, and add Example 6 showing the parameter in action.
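
A plausible shape for Example 6 (illustrative; the exact docstring wording in the PR may differ), following the doctest style of the metric's existing examples:

Example 6 -- controlling undefined labels with zero_division:
    >>> f1_metric = evaluate.load("f1")
    >>> results = f1_metric.compute(predictions=[0, 0, 0, 0, 0],
    ...                             references=[0, 1, 0, 1, 2],
    ...                             labels=[0, 1, 2, 3], average=None,
    ...                             zero_division=0)
    >>> print(results)
    {'f1': [0.57, 0.0, 0.0, 0.0]}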

Testing

f1 = evaluate.load("f1")

# zero_division=0: undefined labels get 0
result = f1.compute(predictions=[0,0,0,0,0], references=[0,1,0,1,2],
                    average=None, labels=[0,1,2,3], zero_division=0)
# {'f1': [0.57, 0.0, 0.0, 0.0]}

# zero_division=1: undefined labels get 1
result = f1.compute(predictions=[0,0,0,0,0], references=[0,1,0,1,2],
                    average=None, labels=[0,1,2,3], zero_division=1)
# {'f1': [0.57, 0.0, 0.0, 1.0]}

# default (no zero_division): backward-compatible — still warns and returns 0
result = f1.compute(predictions=[0,0,0,0,0], references=[0,1,0,1,2],
                    average=None, labels=[0,1,2,3])
# UndefinedMetricWarning raised, {'f1': [0.57, 0.0, 0.0, 0.0]}
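
As a parity check against the inconsistency described above, the equivalent call already works with the precision metric, which exposes zero_division today:

precision = evaluate.load("precision")
result = precision.compute(predictions=[0,0,0,0,0], references=[0,1,0,1,2],
                           average=None, labels=[0,1,2,3], zero_division=0)
# no TypeError: precision already accepts the parameter that f1 was missing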

Fixes #699

sklearn.metrics.f1_score supports zero_division to control the value
returned when a label has no predicted or true samples (UndefinedMetricWarning
case). The evaluate F1 metric did not expose this argument, causing a
TypeError for callers who tried to pass it — even though sklearn's own
warning message tells them to do exactly that.

precision and recall already accept zero_division; this brings F1 into
parity. Default value is 'warn' to preserve backward compatibility.

Adds Example 6 to _KWARGS_DESCRIPTION demonstrating the parameter.

Fixes huggingface#699

@TimRepke left a comment


This is awesome, thanks for addressing this old issue! Given the long revision cycles and the backlog, it might be better to use **kwargs for all metrics that pass straight through to sklearn: it reduces redundancy and lets users pass anything they like, even when sklearn updates and the evaluate package hasn't caught up with the parameter changes. Totally optional.

Comment thread on metrics/f1/f1.py, lines +138 to 156
  def _compute(
      self,
      predictions,
      references,
      labels=None,
      pos_label=1,
      average="binary",
      sample_weight=None,
+     zero_division="warn",
  ):
      score = f1_score(
-         references, predictions, labels=labels, pos_label=pos_label, average=average, sample_weight=sample_weight
+         references,
+         predictions,
+         labels=labels,
+         pos_label=pos_label,
+         average=average,
+         sample_weight=sample_weight,
+         zero_division=zero_division,
      )


Given there's no typing anyway, specifying defaults and argument keywords is redundant. Unless the evaluate package has other defaults, it might be even more future-proof to just pass kwargs:

Suggested change

- def _compute(
-     self,
-     predictions,
-     references,
-     labels=None,
-     pos_label=1,
-     average="binary",
-     sample_weight=None,
-     zero_division="warn",
- ):
-     score = f1_score(
-         references,
-         predictions,
-         labels=labels,
-         pos_label=pos_label,
-         average=average,
-         sample_weight=sample_weight,
-         zero_division=zero_division,
-     )
+ def _compute(
+     self,
+     predictions,
+     references,
+     labels=None,
+     **kwargs,
+ ):
+     score = f1_score(
+         references,
+         predictions,
+         labels=labels,
+         **kwargs,
+     )
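
One trade-off of the **kwargs variant worth noting (an illustration, not part of the suggestion): misspelled or unsupported arguments are no longer caught at the evaluate layer and instead fail inside sklearn, roughly like this:

f1.compute(predictions=[0, 1], references=[0, 1], zero_divison=0)  # note the typo
# TypeError: f1_score() got an unexpected keyword argument 'zero_divison'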

@YousefZahran1 (Author)

Thanks for the review, Tim! The **kwargs approach makes sense — cleaner and won't need updating if sklearn adds new parameters. I've pushed the change. Also kept the zero_division example in the docstring so users know the parameter exists without having to read sklearn docs.

