Fix eval set for GPT-5.3 upgrade #509

Merged
jainharsheet77 merged 1 commit into microsoft:main from
jainharsheet77:u/jainharsheet77/fix-evalset-for-gpt53
May 11, 2026
Conversation

@jainharsheet77
Contributor

Why prompt changes are needed between model upgrades

Eval prompts are not model-agnostic. Each model generation interprets intent,
phrasing, and ambiguity differently — a query that resolves cleanly on one model can
become ambiguous on the next.

In this case, GPT-5.3 was reading the yes/no framing ("Can I…?") as a
permissions question rather than the intended info + update-process intent.
The reworded query makes both sub-intents explicit.
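For illustration only (the actual eval entries are not shown in this PR), the kind of rewording described above might look like:

```python
# Hypothetical example; the real eval-set queries in this PR are not shown here.

# A yes/no "Can I…?" framing that GPT-5.3 reads as a permissions question:
old_query = "Can I change my direct deposit account?"

# Reworded so both sub-intents (the info question and the update process)
# are stated explicitly and cannot be collapsed into a permissions check:
new_query = (
    "What are the rules for changing my direct deposit account, "
    "and what are the steps to update it?"
)
```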

🔑 When we upgrade the underlying model, eval sets must be re-tuned so the
scorecard measures agent quality, not phrasing artifacts the new model handles
differently. Otherwise regressions reflect prompt drift, not real behavior changes.


Test plan

  • Re-run ESS eval suite against GPT-5.3
  • Confirm the reworded query scores ≥ threshold on CompareMeaning
  • Spot-check no regression on prior model baseline
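As a sketch of the gate described in the test plan, a threshold check over scorer outputs could look like the following (the scorer names, scores, and the 0.8 threshold are placeholders, not values from this PR or the ESS suite):

```python
# Hypothetical sketch of the test-plan gate. Scorer names, example scores,
# and THRESHOLD are illustrative placeholders, not real ESS values.
THRESHOLD = 0.8

def passes_eval(scores: dict[str, float], threshold: float = THRESHOLD) -> bool:
    """Return True only if every scorer (e.g. CompareMeaning) meets the threshold."""
    return all(score >= threshold for score in scores.values())

# Example: reworded query scored against GPT-5.3
new_scores = {"CompareMeaning": 0.91}
assert passes_eval(new_scores)
```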

@jainharsheet77 jainharsheet77 requested a review from a team as a code owner May 8, 2026 21:49
@jainharsheet77
Contributor Author

@microsoft-github-policy-service agree company="Microsoft"

@jainharsheet77 jainharsheet77 merged commit 9a3cd8f into microsoft:main May 11, 2026
1 check passed
@jainharsheet77 jainharsheet77 deleted the u/jainharsheet77/fix-evalset-for-gpt53 branch May 11, 2026 07:17
