Fix eval set for GPT 53 upgrade by harsheetjain · Pull Request #507 · microsoft/CopilotStudioSamples

harsheetjain · 2026-05-08T06:11:16Z

Why prompt changes are needed between model upgrades

Eval prompts are not model-agnostic. Each model generation interprets intent,
phrasing, and ambiguity differently — a query that resolves cleanly on one model can
become ambiguous on the next.

In this case, GPT-5.3 was reading the yes/no framing ("Can I…?") as a
permissions question rather than the intended info + update-process intent.
The reworded query makes both sub-intents explicit.

🔑 When we upgrade the underlying model, eval sets must be re-tuned so the
scorecard measures agent quality, not phrasing artifacts the new model handles
differently. Otherwise regressions reflect prompt drift, not real behavior changes.
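
One way to tell prompt drift from a real behavior change is to score the original and reworded phrasings side by side. The sketch below is illustrative only: the `QueryScores` fields, the label names, and the 0-to-1 score scale are assumptions, not part of this PR or of any eval tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryScores:
    """Hypothetical scores in [0, 1] for a single eval case."""
    old_model_original: float  # prior model, original phrasing
    new_model_original: float  # upgraded model, original phrasing
    new_model_reworded: float  # upgraded model, reworded phrasing


def classify_drop(scores: QueryScores, threshold: float) -> str:
    """Label what happened to one eval case across a model upgrade."""
    if scores.new_model_original >= threshold:
        return "no_drop"  # original phrasing still passes on the new model
    if scores.old_model_original < threshold:
        return "already_failing"  # the case failed before the upgrade too
    if scores.new_model_reworded >= threshold:
        return "phrasing_artifact"  # rewording alone restores the score
    return "possible_regression"  # rewording does not help; investigate the agent
```

A case labeled `phrasing_artifact` is a candidate for re-tuning the eval query; a case labeled `possible_regression` should be investigated as an agent issue rather than reworded away.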

Test plan

Re-run ESS eval suite against GPT-5.3
Confirm the reworded query scores ≥ threshold on CompareMeaning
Spot-check no regression on prior model baseline
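
The last two checks in the test plan could be expressed as a small gate, roughly as follows. This is a minimal sketch under stated assumptions: the function name, the score scale, the threshold, and the tolerance are all hypothetical, and the scores are assumed to be exported from the eval run rather than fetched through any specific API.

```python
def check_test_plan(
    new_model_score: float,
    baseline_before: float,
    baseline_after: float,
    threshold: float,
    tolerance: float = 0.0,
) -> dict:
    """Evaluate the threshold and baseline checks for one reworded query.

    new_model_score: score of the reworded query on the upgraded model.
    baseline_before: prior-model score with the original query.
    baseline_after:  prior-model score with the reworded query.
    tolerance:       allowed drop on the prior model (assumed, not from the PR).
    """
    meets_threshold = new_model_score >= threshold
    no_baseline_regression = baseline_after >= baseline_before - tolerance
    return {
        "meets_threshold": meets_threshold,
        "no_baseline_regression": no_baseline_regression,
        "passed": meets_threshold and no_baseline_regression,
    }
```

For example, `check_test_plan(0.9, 0.82, 0.80, threshold=0.7, tolerance=0.05)` reports a pass, while the same call with `tolerance=0.0` flags a baseline regression.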

harsheetjain · 2026-05-08T06:14:37Z

@microsoft-github-policy-service agree company="Microsoft"

nkemms

👍

Fix Workday-related ambiguous queries

772c30a

harsheetjain requested a review from a team as a code owner May 8, 2026 06:11

nkemms approved these changes May 8, 2026

harsheetjain closed this May 8, 2026

harsheetjain reopened this May 8, 2026

harsheetjain wants to merge 1 commit into microsoft:main from harsheetjain:u/harsheetjain/fix-evalset-for-gpt53
