Skip to content

Add eval for tail stream equivalence checks #39

Description

@martinfrancois

Problem

Add a Java Streams skill eval for a real refactoring case where a stream was present but did not provide the short-circuiting benefit it appeared to provide.

The important lesson is not "always use streams". The eval should teach that a stream terminal operation such as allMatch(...) only short-circuits work that is inside the lazy stream pipeline. If expensive or meaningful work has already happened eagerly before the stream starts, the stream can make the code look more efficient than it is.

This is also a missed-trigger case. The Streams skill should have activated during the original implementation prompt because that prompt asked Codex to collect declarations, compare all remaining declarations, and fail closed on conflicts. The later maintainer correction should not be the first moment where the stream-short-circuiting guidance is applied.

Code before the prompt was executed

Before the change, the method selected only the first labeled declaration. It searched the combined card text, parsed the first match, and ignored any later declaration:

private RepositorySourceSelection explicitSource(Card card) {
    Matcher labeled = LABELED_SOURCE.matcher(cardText(card));
    if (!labeled.find()) {
        return RepositorySourceSelection.none();
    }
    return parse(labeled.group(2), RepositorySource.Origin.CARD, labelMode(labeled.group(1)));
}

That code was intentionally being changed because the product rule became stricter: all explicit declarations needed to be collected and conflicts had to fail closed.

Prompt that caused the eager-list implementation

The prompt asked the agent to repair the method so it would:

  • examine title, description, and comments as separate logical lines;
  • recognize explicit repository-source declarations;
  • collect every explicit declaration instead of stopping at the first;
  • accept multiple declarations only when every declaration is nonblank, valid, and normalizes to the same source kind, normalized source value, identity, and path;
  • return an invalid selected-source result for blank, malformed, or conflicting declarations;
  • suppress workflow defaults whenever an explicit declaration is blank, malformed, or conflicting.

The agent correctly moved from "first declaration wins" to "all declarations must agree", but produced an eager parsed-list implementation.

Later prompt that exposed the issue

The maintainer later provided the preferred implementation and specifically explained that the original code was weak because the stream appeared to benefit from short-circuiting after the expensive parsing work had already happened.

That later prompt was useful, but the eval should verify that the Streams skill activates before this correction is necessary.

Prompt-produced code before maintainer correction

private RepositorySourceSelection explicitSource(Card card) {
    List<Declaration> declarations = declarations(card);
    if (declarations.isEmpty()) {
        return RepositorySourceSelection.none();
    }
    if (declarations.size() == 1) {
        Declaration declaration = declarations.getFirst();
        return parse(declaration.value(), RepositorySource.Origin.CARD, declaration.mode());
    }
    List<RepositorySourceSelection> parsed = new ArrayList<>(declarations.size());
    for (Declaration declaration : declarations) {
        parsed.add(parse(declaration.value(), RepositorySource.Origin.CARD, declaration.mode()));
    }
    RepositorySourceSelection first = parsed.getFirst();
    if (first.status() == RepositorySourceSelection.Status.SELECTED
            && parsed.stream().allMatch(selection -> equivalent(first, selection))) {
        return first;
    }
    return invalid(
            "repository_source_conflict",
            "Multiple explicit repository source declarations conflict. Keep one source declaration or make every declaration identical.");
}

Why the prompt-produced code is bad

The original code is not bad because it is incorrect. It is bad because it looks like it benefits from stream short-circuiting, but it actually does not.

Main issues:

  1. It always parses every declaration.

    Even if the first declaration is invalid, or the second declaration already conflicts, the original code still parses all declarations before the stream starts.

  2. The allMatch(...) short-circuit comes too late.

    allMatch(...) can stop evaluating as soon as it knows the answer, and the Java Stream API documents it as a short-circuiting terminal operation. In the original code, the expensive parse(...) work has already happened before allMatch(...) starts. The short-circuit only saves calls to equivalent(...), not calls to parse(...).

  3. It allocates an unnecessary intermediate list.

    The method only needs the first parsed selection, then needs to know whether all remaining declarations parse to equivalent selections. Keeping every parsed value in a List<RepositorySourceSelection> is unnecessary.

  4. It compares the first item with itself.

    The stream includes first, so the first equivalent(...) call compares the first parsed result with itself. That is harmless but redundant.

  5. It mixes imperative collection building with a stream terminal operation in a misleading way.

    The stream appears to be the core of the logic, but the actual parsing work has already been performed by the loop.

Maintainer-preferred code

private RepositorySourceSelection explicitSource(Card card) {
    List<Declaration> declarations = declarations(card);

    return switch (declarations.size()) {
        case 0 -> RepositorySourceSelection.none();
        case 1 -> parseExplicitSource(declarations.getFirst());
        default -> explicitSourceFromMultipleDeclarations(declarations);
    };
}

private static RepositorySourceSelection explicitSourceFromMultipleDeclarations(List<Declaration> declarations) {
    RepositorySourceSelection first = parseExplicitSource(declarations.getFirst());

    boolean allEquivalent = first.status() == RepositorySourceSelection.Status.SELECTED
            && declarations.stream()
                    .skip(1)
                    .map(RepositorySourceResolver::parseExplicitSource)
                    .allMatch(selection -> equivalent(first, selection));

    return allEquivalent ? first : repositorySourceConflict();
}

private static RepositorySourceSelection parseExplicitSource(Declaration declaration) {
    return parse(
            declaration.value(),
            RepositorySource.Origin.CARD,
            declaration.mode());
}

private static RepositorySourceSelection repositorySourceConflict() {
    return invalid(REPOSITORY_SOURCE_CONFLICT_CODE, REPOSITORY_SOURCE_CONFLICT_GUIDANCE);
}

Why the replacement is better

The better implementation moves parsing into the stream pipeline:

declarations.stream()
        .skip(1)
        .map(RepositorySourceResolver::parseExplicitSource)
        .allMatch(selection -> equivalent(first, selection));

That is better because Java streams are lazy: source elements are consumed only as needed when the terminal operation runs. Combined with short-circuiting allMatch(...), parsing can stop as soon as a conflicting or invalid declaration is encountered.

Additional improvements:

  • The first declaration is parsed once and used as the comparison target.
  • skip(1) avoids the redundant self-comparison.
  • No intermediate parsed-selection list is allocated.
  • The switch expression makes the explicit 0/1/many branching obvious.
  • Small helpers keep the conflict branch readable without over-abstracting the rule.
  • The code keeps the readability of streams while making the stream do the useful short-circuiting work.

Desired eval behavior

Add an eval that rewards an answer that:

  • triggers the Java Streams skill during the original implementation prompt because it asks for stream-relevant declaration collection, comparison, and conflict detection logic;
  • preserves the stricter conflict rule instead of reverting to first-declaration-wins behavior;
  • keeps the explicit 0/1/many declaration branching readable;
  • avoids an eager intermediate List<RepositorySourceSelection> when only the first parsed value and all remaining equivalent checks are needed;
  • parses the first declaration once, then streams the remaining declarations with skip(1), map(...), and allMatch(...);
  • recognizes that the prompt-produced stream shape is misleading because short-circuiting happens after parsing has already completed;
  • explains that moving parseExplicitSource(...) into the stream pipeline lets lazy stream traversal and allMatch(...) avoid parsing later declarations after the result is known;
  • extracts small helpers such as parseExplicitSource(...) and repositorySourceConflict() when they clarify the branch logic;
  • preserves conflict behavior: multiple declarations are accepted only when the first parse is selected and every remaining parse is equivalent;
  • preserves fail-closed behavior for blank, malformed, or non-equivalent declarations;
  • does not replace this with a noisy manual loop just because a stream is involved;
  • does not over-abstract the pipeline or hide the conflict rule.

Anti-patterns the eval should reject

  • Waiting until a later prompt explicitly says to refactor stream usage before applying stream guidance.
  • Eagerly parsing all declarations into a temporary list solely to stream over it.
  • Claiming the eager-list implementation benefits meaningfully from allMatch(...) short-circuiting without noticing that parsing has already happened.
  • A manual loop that is more verbose than the tail-stream check without adding clarity.
  • Re-parsing the first declaration during comparison.
  • Accepting equivalent invalid declarations.
  • Letting allMatch(...) run over the first parsed item and comparing it to itself as part of the rule.
  • Collapsing the 0/1/many cases into nested if blocks when a switch expression would be clearer.

Suggested eval name

tail-stream-equivalence-check

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions