[FEATURE] Evals: thin scorer / experiment-item result surface for external evidence consumers #15206

@Rul1an

We just shipped a small sample on the Assay side that consumes a frozen Mastra scorer / experiment-item result as external evidence:

https://github.com/Rul1an/assay/tree/main/examples/mastra-scorer-evidence

The reason for the sample is pretty narrow: we wanted to test the smallest honest Mastra reliability surface an external evidence consumer could ingest without collapsing back into traces, Studio metrics, or dashboard semantics.

So we kept the shape intentionally small:

  • scorer name
  • score
  • outcome
  • dataset version ref
  • item ref
  • target type
  • timestamp

We are not treating that artifact as Mastra truth, and we are not assuming the checked-in fixture shape is a stable wire contract.
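For concreteness, here is a minimal sketch of how a consumer might type and check that shape. Every identifier in it (`ScorerEvidence`, `parseScorerEvidence`, the camelCase field names, the sample values) is an illustrative assumption, not the fixture's actual keys and not a Mastra API. Rejecting unknown keys is one possible way to keep the artifact bounded so that trace or dashboard fields cannot ride along.

```typescript
// Hypothetical names throughout: the checked-in fixture may use different keys.
interface ScorerEvidence {
  scorerName: string;
  score: number;
  outcome: string;
  datasetVersionRef: string;
  itemRef: string;
  targetType: string;
  timestamp: string; // assumed to be an ISO 8601 string
}

const STRING_FIELDS = [
  "scorerName",
  "outcome",
  "datasetVersionRef",
  "itemRef",
  "targetType",
  "timestamp",
] as const;

// Accepts only the seven listed fields; anything extra is rejected.
function parseScorerEvidence(input: unknown): ScorerEvidence {
  if (typeof input !== "object" || input === null || Array.isArray(input)) {
    throw new Error("evidence must be a JSON object");
  }
  const record = input as Record<string, unknown>;

  const allowed = new Set<string>([...STRING_FIELDS, "score"]);
  for (const key of Object.keys(record)) {
    if (!allowed.has(key)) {
      throw new Error(`unexpected field: ${key}`);
    }
  }

  for (const field of STRING_FIELDS) {
    const value = record[field];
    if (typeof value !== "string" || value === "") {
      throw new Error(`missing or non-string field: ${field}`);
    }
  }

  const score = record.score;
  if (typeof score !== "number" || !Number.isFinite(score)) {
    throw new Error("score must be a finite number");
  }

  if (Number.isNaN(Date.parse(record.timestamp as string))) {
    throw new Error("timestamp is not a parseable date");
  }

  return {
    scorerName: record.scorerName as string,
    score,
    outcome: record.outcome as string,
    datasetVersionRef: record.datasetVersionRef as string,
    itemRef: record.itemRef as string,
    targetType: record.targetType as string,
    timestamp: record.timestamp as string,
  };
}

// Usage with made-up values:
const sampleEvidence = parseScorerEvidence({
  scorerName: "example-scorer",
  score: 0.82,
  outcome: "pass",
  datasetVersionRef: "dataset@v3",
  itemRef: "item-17",
  targetType: "agent",
  timestamp: "2025-01-01T00:00:00Z",
});
```

The sketch deliberately leaves `outcome` and `targetType` as free strings, since the set of valid values is exactly the kind of thing we do not want to assume on Mastra's behalf.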

The question is mainly about seam choice:

If an external evidence consumer wants the smallest honest Mastra reliability surface, is a bounded scorer / experiment-item result roughly the right place to start, or is there a thinner scorer result surface you would rather point them at?

If we are aiming at the wrong layer, happy to adjust the sample.

Labels: Evals, Question, effort:medium, enhancement, impact:medium, trio-tracery
