feat: add zero_division parameter to F1 metric #753

Open
YousefZahran1 wants to merge 2 commits into huggingface:main from YousefZahran1:youssef/fix-f1-zero-division

Conversation

@YousefZahran1

What

precision and recall both accept a zero_division parameter that controls what value is returned when a label has no predicted or true samples. f1 does not, which causes:

  1. A misleading user experience: sklearn's UndefinedMetricWarning for F1 explicitly tells users to pass zero_division, yet the evaluate wrapper rejects it with a TypeError.
  2. An inconsistency within the same library (precision and recall have it; f1 doesn't).

Reproduce the bug

import evaluate
f1 = evaluate.load("f1")
# sklearn's warning on this call says: "Use zero_division parameter to control this behavior."
f1.compute(predictions=[0,0,0,0,0], references=[0,1,0,1,2],
           average=None, labels=[0,1,2,3], zero_division=0)
# TypeError: F1._compute() got an unexpected keyword argument 'zero_division'

Fix

Add zero_division="warn" to F1._compute() (matching the default in precision and recall) and pass it through to sklearn.metrics.f1_score.
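
For context, this forwards to the underlying sklearn call, which already supports the parameter; a direct call showing the behavior the wrapper will expose (values rounded):

from sklearn.metrics import f1_score
# Label 3 never appears in y_true or y_pred, so its F1 is undefined;
# zero_division chooses the value reported for it instead of warning.
f1_score([0, 1, 0, 1, 2], [0, 0, 0, 0, 0],
         labels=[0, 1, 2, 3], average=None, zero_division=0)
# array([0.57, 0.  , 0.  , 0.  ])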

Changes

  • metrics/f1/f1.py: add zero_division argument to _compute(), document it in _KWARGS_DESCRIPTION, and add Example 6 showing the parameter in action.
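
A plausible shape for Example 6 (illustrative; the exact docstring wording in the PR may differ), following the doctest style of the metric's existing examples:

Example 6 -- controlling undefined labels with zero_division:
    >>> f1_metric = evaluate.load("f1")
    >>> results = f1_metric.compute(predictions=[0, 0, 0, 0, 0],
    ...                             references=[0, 1, 0, 1, 2],
    ...                             labels=[0, 1, 2, 3], average=None,
    ...                             zero_division=0)
    >>> print(results)
    {'f1': [0.57, 0.0, 0.0, 0.0]}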

Testing

f1 = evaluate.load("f1")

# zero_division=0: undefined labels get 0
result = f1.compute(predictions=[0,0,0,0,0], references=[0,1,0,1,2],
                    average=None, labels=[0,1,2,3], zero_division=0)
# {'f1': [0.57, 0.0, 0.0, 0.0]}

# zero_division=1: undefined labels get 1
result = f1.compute(predictions=[0,0,0,0,0], references=[0,1,0,1,2],
                    average=None, labels=[0,1,2,3], zero_division=1)
# {'f1': [0.57, 0.0, 0.0, 1.0]}

# default (no zero_division): backward-compatible — still warns and returns 0
result = f1.compute(predictions=[0,0,0,0,0], references=[0,1,0,1,2],
                    average=None, labels=[0,1,2,3])
# UndefinedMetricWarning raised, {'f1': [0.57, 0.0, 0.0, 0.0]}
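
As a parity check against the inconsistency described above, the equivalent call already works with the precision metric, which exposes zero_division today:

precision = evaluate.load("precision")
result = precision.compute(predictions=[0,0,0,0,0], references=[0,1,0,1,2],
                           average=None, labels=[0,1,2,3], zero_division=0)
# no TypeError: precision already accepts the parameter that f1 was missing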

Fixes #699

sklearn.metrics.f1_score supports zero_division to control the value
returned when a label has no predicted or true samples (UndefinedMetricWarning
case). The evaluate F1 metric did not expose this argument, causing a
TypeError for callers who tried to pass it — even though sklearn's own
warning message tells them to do exactly that.

precision and recall already accept zero_division; this brings F1 into
parity. Default value is 'warn' to preserve backward compatibility.

Adds Example 6 to _KWARGS_DESCRIPTION demonstrating the parameter.

Fixes huggingface#699

@TimRepke left a comment


This is awesome, thanks for addressing this old issue! Given the long revision cycles and the backlog, it might be better to use **kwargs for all metrics that pass straight through to sklearn: it reduces redundancy and lets users pass anything they like, even when sklearn updates and the evaluate package hasn't caught up with the parameter changes. Totally optional.

Comment thread on metrics/f1/f1.py, lines +138 to 156
  def _compute(
      self,
      predictions,
      references,
      labels=None,
      pos_label=1,
      average="binary",
      sample_weight=None,
+     zero_division="warn",
  ):
      score = f1_score(
-         references, predictions, labels=labels, pos_label=pos_label, average=average, sample_weight=sample_weight
+         references,
+         predictions,
+         labels=labels,
+         pos_label=pos_label,
+         average=average,
+         sample_weight=sample_weight,
+         zero_division=zero_division,
      )


Given there's no typing anyway, specifying defaults and argument keywords is redundant. Unless the evaluate package has other defaults, it might be even more future-proof to just pass kwargs:

Suggested change

- def _compute(
-     self,
-     predictions,
-     references,
-     labels=None,
-     pos_label=1,
-     average="binary",
-     sample_weight=None,
-     zero_division="warn",
- ):
-     score = f1_score(
-         references,
-         predictions,
-         labels=labels,
-         pos_label=pos_label,
-         average=average,
-         sample_weight=sample_weight,
-         zero_division=zero_division,
-     )
+ def _compute(
+     self,
+     predictions,
+     references,
+     labels=None,
+     **kwargs,
+ ):
+     score = f1_score(
+         references,
+         predictions,
+         labels=labels,
+         **kwargs,
+     )
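
One trade-off of the **kwargs variant worth noting (an illustration, not part of the suggestion): misspelled or unsupported arguments are no longer caught at the evaluate layer and instead fail inside sklearn, roughly like this:

f1.compute(predictions=[0, 1], references=[0, 1], zero_divison=0)  # note the typo
# TypeError: f1_score() got an unexpected keyword argument 'zero_divison'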

@YousefZahran1 (Author)

Thanks for the review, Tim! The **kwargs approach makes sense — cleaner and won't need updating if sklearn adds new parameters. I've pushed the change. Also kept the zero_division example in the docstring so users know the parameter exists without having to read sklearn docs.

