Fix eval set for GPT-5.3 upgrade #509

Merged
jainharsheet77 merged 1 commit into microsoft:main from
jainharsheet77:u/jainharsheet77/fix-evalset-for-gpt53
May 11, 2026
Conversation

@jainharsheet77
Contributor

Why prompt changes are needed between model upgrades

Eval prompts are not model-agnostic. Each model generation interprets intent,
phrasing, and ambiguity differently — a query that resolves cleanly on one model can
become ambiguous on the next.

In this case, GPT-5.3 was reading the yes/no framing ("Can I…?") as a
permissions question rather than the intended info + update-process intent.
The reworded query makes both sub-intents explicit.
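For illustration only (the actual eval entries are not shown in this PR), the kind of rewording described above might look like:

```python
# Hypothetical example; the real eval-set queries in this PR are not shown here.

# A yes/no "Can I…?" framing that GPT-5.3 reads as a permissions question:
old_query = "Can I change my direct deposit account?"

# Reworded so both sub-intents (the info question and the update process)
# are stated explicitly and cannot be collapsed into a permissions check:
new_query = (
    "What are the rules for changing my direct deposit account, "
    "and what are the steps to update it?"
)
```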

🔑 When we upgrade the underlying model, eval sets must be re-tuned so the
scorecard measures agent quality, not phrasing artifacts the new model handles
differently. Otherwise regressions reflect prompt drift, not real behavior changes.


Test plan

  • Re-run ESS eval suite against GPT-5.3
  • Confirm the reworded query scores ≥ threshold on CompareMeaning
  • Spot-check no regression on prior model baseline
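As a sketch of the gate described in the test plan, a threshold check over scorer outputs could look like the following (the scorer names, scores, and the 0.8 threshold are placeholders, not values from this PR or the ESS suite):

```python
# Hypothetical sketch of the test-plan gate. Scorer names, example scores,
# and THRESHOLD are illustrative placeholders, not real ESS values.
THRESHOLD = 0.8

def passes_eval(scores: dict[str, float], threshold: float = THRESHOLD) -> bool:
    """Return True only if every scorer (e.g. CompareMeaning) meets the threshold."""
    return all(score >= threshold for score in scores.values())

# Example: reworded query scored against GPT-5.3
new_scores = {"CompareMeaning": 0.91}
assert passes_eval(new_scores)
```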

@jainharsheet77 jainharsheet77 requested a review from a team as a code owner May 8, 2026 21:49
@jainharsheet77
Contributor Author

@microsoft-github-policy-service agree company="Microsoft"

@jainharsheet77 jainharsheet77 merged commit 9a3cd8f into microsoft:main May 11, 2026
1 check passed
@jainharsheet77 jainharsheet77 deleted the u/jainharsheet77/fix-evalset-for-gpt53 branch May 11, 2026 07:17
