feat: add zero_division parameter to F1 metric #753
Conversation
`sklearn.metrics.f1_score` supports `zero_division` to control the value returned when a label has no predicted or true samples (the `UndefinedMetricWarning` case). The evaluate F1 metric did not expose this argument, causing a `TypeError` for callers who tried to pass it — even though sklearn's own warning message tells them to do exactly that. `precision` and `recall` already accept `zero_division`; this brings F1 into parity. The default value is `"warn"` to preserve backward compatibility. Adds Example 6 to `_KWARGS_DESCRIPTION` demonstrating the parameter. Fixes huggingface#699
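For context, a sketch of what the new Example 6 might look like, in the doctest style used elsewhere in the metric's docstring — the data and output here are illustrative, not quoted from the diff:

```python
>>> f1_metric = evaluate.load("f1")
>>> # Label 2 appears in neither predictions nor references, so its F1 is
>>> # undefined; zero_division=1 resolves it to 1.0 instead of warning.
>>> results = f1_metric.compute(
...     predictions=[0, 1, 1, 0],
...     references=[0, 1, 1, 0],
...     labels=[0, 1, 2],
...     average="macro",
...     zero_division=1,
... )
>>> print(results)
{'f1': 1.0}
```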
TimRepke left a comment:

This is awesome, thanks for addressing this old issue! Given long revision cycles and the backlog, it might be better to use `**kwargs` for all metrics that pass straight through to sklearn, to reduce redundancy and let users pass anything they like, even when sklearn updates and the evaluate package misses changes to the parameters. Totally optional.
```diff
 def _compute(
     self,
     predictions,
     references,
     labels=None,
     pos_label=1,
     average="binary",
     sample_weight=None,
+    zero_division="warn",
 ):
     score = f1_score(
-        references, predictions, labels=labels, pos_label=pos_label, average=average, sample_weight=sample_weight
+        references,
+        predictions,
+        labels=labels,
+        pos_label=pos_label,
+        average=average,
+        sample_weight=sample_weight,
+        zero_division=zero_division,
     )
```
Given there's no typing anyway, specifying defaults and argument keywords is redundant. Unless the evaluate package has other defaults, it might be even more future-proof to just pass `**kwargs`:
Suggested change:

```diff
 def _compute(
     self,
     predictions,
     references,
     labels=None,
-    pos_label=1,
-    average="binary",
-    sample_weight=None,
-    zero_division="warn",
+    **kwargs,
 ):
     score = f1_score(
         references,
         predictions,
         labels=labels,
-        pos_label=pos_label,
-        average=average,
-        sample_weight=sample_weight,
-        zero_division=zero_division,
+        **kwargs,
     )
```
Thanks for the review, Tim!
What
`precision` and `recall` both accept a `zero_division` parameter that controls what value is returned when a label has no predicted or true samples. `f1` does not, which causes:

- The `UndefinedMetricWarning` for F1 explicitly tells users to pass `zero_division`, but the evaluate wrapper rejects it with a `TypeError`.
- An API inconsistency (`precision` and `recall` have it; `f1` doesn't).

Reproduce the bug
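A minimal repro sketch, assuming the standard `evaluate.load(...)` / `compute(...)` API; the data values are illustrative and the exact error text may differ:

```python
import evaluate

f1_metric = evaluate.load("f1")

# pos_label=1 never occurs in predictions or references, so sklearn's F1 is
# undefined here and its warning suggests passing zero_division. Before this
# PR, doing so failed because _compute() did not accept the argument:
f1_metric.compute(
    predictions=[0, 0],
    references=[0, 0],
    zero_division=0,
)
# TypeError: _compute() got an unexpected keyword argument 'zero_division'
```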
Fix
Add `zero_division="warn"` to `F1._compute()` (matching the default in `precision` and `recall`) and pass it through to `sklearn.metrics.f1_score`.

Changes
- `metrics/f1/f1.py`: add a `zero_division` argument to `_compute()`, document it in `_KWARGS_DESCRIPTION`, and add Example 6 showing the parameter in action.

Testing
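The thread doesn't show the verification steps, but a quick local check consistent with sklearn's `zero_division` semantics might look like this (assumed, not taken from the PR):

```python
import evaluate

f1_metric = evaluate.load("f1")

# No true or predicted samples for pos_label=1, so F1 is undefined and
# zero_division decides the returned value instead of triggering a warning.
res0 = f1_metric.compute(predictions=[0, 0], references=[0, 0], zero_division=0)
assert res0["f1"] == 0.0

res1 = f1_metric.compute(predictions=[0, 0], references=[0, 0], zero_division=1)
assert res1["f1"] == 1.0

# Omitting the argument keeps the default zero_division="warn", preserving
# the old behavior (UndefinedMetricWarning is emitted and 0.0 is returned).
```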
Fixes #699