[Bug]: TestOfNoEffectAnalysis crashes with KeyError 'n' on standard Client data, and silently reports "no effect" for sem=0 or single-arm trials #5248

Description

@ricardoofnl

What happened?

TestOfNoEffectAnalysis (and the underlying functions in ax/utils/stats/no_effects.py) mishandles three inputs that are easy to produce through the public API:

  1. KeyError: 'n' on standard Client data. check_experiment_effects_per_metric reads dfm["n"], but n is not in Data.REQUIRED_COLUMNS and is never produced by the standard Client.complete_trial(...) flow. Computing the analysis on any such experiment crashes with a bare KeyError: 'n' instead of an informative error.

  2. sem == 0 silently flips the conclusion. With deterministic data (sem=0, common in BO settings), no_effect_test_welch divides by zero (ws = n / variances), yielding a NaN p-value. has_effect = bool(nan < alpha) is False, so the healthcheck warns "no effects have been detected" even when the exactly-known means clearly differ — the opposite of the correct conclusion. The only symptom is a numpy RuntimeWarning.

  3. Single-arm trials (K == 1) silently produce NaN. The Welch statistic divides by K - 1 == 0, again giving NaN → has_effect=False with no error, even though a one-arm group can't be tested at all.

A minor related issue: the check_experiment_effects docstring describes ineffective_on_objectives with the same sentence it uses for the effective case ("can be rejected"), although it means the opposite.
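The silent failures in cases 2 and 3 come down to how NaN behaves in comparisons. The following is a minimal, stdlib-only sketch that does not call Ax. `has_effect_checked` is a hypothetical helper used for illustration, not an Ax function. It shows why `bool(nan < alpha)` reads as "no effect" and how an explicit NaN check would surface the problem instead.

```python
import math

alpha = 0.05
# Stand-in for the p-value that results once a division by zero
# has propagated through the test statistic.
p_value = math.nan

# Every ordered comparison with NaN is False, so both branches "fail".
has_effect = bool(p_value < alpha)    # False, reported as "no effect"
no_effect = bool(p_value >= alpha)    # also False: NaN is not evidence either way


def has_effect_checked(p_value: float, alpha: float = 0.05) -> bool:
    """Hypothetical guard: refuse to draw a conclusion from a NaN p-value."""
    if math.isnan(p_value):
        raise ValueError("p-value is NaN; the test is undefined for this input")
    return p_value < alpha
```

With a guard like this, an undefined test raises instead of being folded into the `has_effect=False` result.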

Please provide a minimal, reproducible example of the unexpected behavior.

from ax import Client, RangeParameterConfig
from ax.analysis.healthcheck.no_effects_analysis import TestOfNoEffectAnalysis

client = Client()
client.configure_experiment(
    parameters=[RangeParameterConfig(name="x", parameter_type="float", bounds=(0, 1))]
)
client.configure_optimization(objective="y")
for x, y in [(0.1, 1.0), (0.5, 2.0), (0.9, 1.5)]:
    idx = client.attach_trial(parameters={"x": x})
    client.complete_trial(trial_index=idx, raw_data={"y": y})

TestOfNoEffectAnalysis().compute(experiment=client._experiment)  # KeyError: 'n'

For case 2 (silent wrong result):

import pandas as pd
from ax.core.data import Data
from ax.utils.stats.no_effects import check_experiment_effects_per_metric

data = Data(df=pd.DataFrame([
    {"trial_index": 0, "arm_name": "0_0", "metric_name": "m1",
     "metric_signature": "m1", "mean": 1.0, "sem": 0.0, "n": 100},
    {"trial_index": 0, "arm_name": "0_1", "metric_name": "m1",
     "metric_signature": "m1", "mean": 2.0, "sem": 0.0, "n": 100},
]))
out = check_experiment_effects_per_metric(data=data, objective_names={"m1"})
print(out)  # p_value=NaN, has_effect=False — but means 1.0 vs 2.0 with sem=0
            # is a maximally significant effect

Please paste any relevant traceback/logs produced by the example provided.

Traceback (most recent call last):
  ...
  File "ax/utils/stats/no_effects.py", line 62, in check_experiment_effects_per_metric
    ns=list(dfm["n"]),
        ...
KeyError: 'n'

# and for sem=0:
ax/utils/stats/no_effects.py:239: RuntimeWarning: divide by zero encountered in divide
ax/utils/stats/no_effects.py:246: RuntimeWarning: invalid value encountered in scalar divide

Ax Version

main (reproduced at d8eeb97)

Python Version

3.14

Operating System

Linux

(Optional) Describe any potential fixes you've considered to the issue outlined above.

  • Validate the n column up front in check_experiment_effects_per_metric / check_experiment_effects and raise a UserInputError with an actionable message.
  • Handle zero-variance arms explicitly in no_effect_test_welch: differing exactly-known means are an exact effect (p=0); identical all-deterministic means are no effect (p=1); a mixed zero/positive-sem group raises UserInputError since Welch's test is undefined there.
  • Skip single-arm (metric, trial) groups in check_experiment_effects_per_metric, validate K >= 2 and n > 1 in no_effect_test_welch, and raise a UserInputError from TestOfNoEffectAnalysis when no trial has two or more arms.
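
To make the proposed zero-variance and arm-count rules concrete, here is a small stdlib-only sketch of the decision logic. It is not the patch itself: the function name `exact_p_value_for_deterministic_arms` is hypothetical, and `ValueError` stands in for Ax's `UserInputError` so that the snippet runs without Ax installed.

```python
from typing import Optional, Sequence


def exact_p_value_for_deterministic_arms(
    means: Sequence[float], sems: Sequence[float]
) -> Optional[float]:
    """Return an exact p-value when every arm has sem == 0.

    Returns None when all sems are positive, meaning the caller should
    fall through to the regular Welch test. Raises ValueError (standing
    in for UserInputError) when the input cannot be tested at all.
    """
    if len(means) != len(sems):
        raise ValueError("means and sems must have the same length")
    if len(means) < 2:
        # A single arm has nothing to be compared against (K - 1 == 0).
        raise ValueError("need at least two arms to test for an effect")

    is_zero = [s == 0 for s in sems]
    if not any(is_zero):
        return None  # ordinary case: defer to Welch's test
    if not all(is_zero):
        # The weights n / variance are infinite for some arms only.
        raise ValueError("mixed zero and positive sem: Welch's test is undefined")

    # All arms are exactly known: the means either differ or they do not.
    return 1.0 if len(set(means)) == 1 else 0.0
```

For the reproduction above (means 1.0 and 2.0, both with sem=0), this sketch returns 0.0, so `p < alpha` holds and the effect is reported rather than hidden behind a NaN.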

I have a patch with these changes plus tests ready and will open a PR.

Pull Request

Yes, opening one

Code of Conduct

  • I agree to follow Ax's Code of Conduct
