Skip to content

Upsert keys in .env for benchmarks auth/init instead of appending#1037

Merged
dolaameng merged 6 commits into
mainfrom
agent/limagoog-20260529183140-af82f3a6
May 29, 2026
Merged

Upsert keys in .env for benchmarks auth/init instead of appending#1037
dolaameng merged 6 commits into
mainfrom
agent/limagoog-20260529183140-af82f3a6

Conversation

@kaggle-agent
Copy link
Copy Markdown
Collaborator

@kaggle-agent kaggle-agent commented May 29, 2026

kaggle b auth and kaggle b init opened .env in append mode, so
every rerun stacked a fresh copy of MODEL_PROXY_* (and LLM_*)
onto the bottom of the file. Two problems: the user's .env grew
without bound, and any value the user had hand-edited got silently
overridden by the appended copy below it (python-dotenv reads the
last occurrence). Switch to dotenv.set_key(..., quote_mode="never")
so each managed key is refreshed in place, preserving comments, blank
lines, and unrelated keys.

Migration note: users with stacked duplicates from prior runs keep
those lines, but set_key rewrites every occurrence to the fresh
value, so no stale token lingers in the file after the first rerun
on this code.


Task: limagoog-20260529183140-af82f3a6
Context: https://chat.kaggle.net/kaggle/pl/46qfted3gbyqdkcpt447cusich

`kaggle b auth` and `kaggle b init` opened `.env` in append mode, so
each rerun stacked duplicate MODEL_PROXY_* (and LLM_*) lines. Since
python-dotenv reads the first occurrence, the oldest expired token
won — and any hand-edited values were silently shadowed. Replace the
append with an in-place upsert that preserves comments, blank lines,
and unrelated keys, so reruns are idempotent.

Co-authored-by: kaggle-agent <kaggle-agent@users.noreply.github.com>
Comment thread src/kaggle/api/kaggle_api_extended.py
`hatch run lint:fmt` collapsed a couple of adjacent-string blocks in the
new upsert tests onto single lines. Formatting-only — no behavior
change.

Co-authored-by: kaggle-agent <kaggle-agent@users.noreply.github.com>
Comment thread src/kaggle/api/kaggle_api_extended.py Outdated
_ENV_LINE_RE = re.compile(r"^[ \t]*(?:export[ \t]+)?([A-Za-z_][A-Za-z0-9_]*)[ \t]*=")

@classmethod
def _upsert_env_file(cls, env_file_abs, env_vars):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we use existing library like python-dotenv instead of doing it from scratch?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to python-dotenv. Added python-dotenv to dependencies, removed the bespoke _upsert_env_file helper + _ENV_LINE_RE, and replaced the write block with a small loop calling dotenv.set_key(env_file_abs, key, value, quote_mode="never"). quote_mode="never" preserves the existing unquoted output format (so the existing auth/init assertions like MODEL_PROXY_URL=https://... still hold).

assert "LLM_DEFAULT=google/gemini-3-flash-preview\n" in content


# ============================================================
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are too much new test code. We want good coverage but less redundancy. e.g., test_writes_to_empty_file, test_rerunning_auth_replaces_existing_keys``test_preserves_user_added_keys_and_comments are all redundant. Specially if we decide to adopt python-dotenv, which already provides tested expected behavior. Can simplify these?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trimmed the new tests significantly now that python-dotenv owns the line-by-line semantics. Deleted the whole TestUpsertEnvFile class (8 tests) and the redundant test_rerunning_auth_replaces_existing_keys integration test. Kept only 3 integration tests under TestBenchmarksUpsert that verify the user-facing contract through the actual CLI: stale token gets replaced on rerun, user-added keys + comments survive, and init reruns are idempotent across all six managed keys. Net diff is now ~55 lines of test code instead of ~210.

Per review, replace the bespoke `_upsert_env_file` helper with
`dotenv.set_key(..., quote_mode="never")`, which already implements the
exact upsert semantics we want. Drop the standalone helper tests now
that line-by-line behavior is owned upstream, keeping just three
integration tests through the CLI: refresh replaces stale token, user
edits survive, and `init` reruns are idempotent.

Co-authored-by: kaggle-agent <kaggle-agent@users.noreply.github.com>
# ============================================================


class TestBenchmarksUpsert:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we add a test for the migration-from-duplicates scenario that's the main user-facing promise.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added test_first_rerun_after_upgrade_refreshes_stale_duplicates. It seeds a .env with three stacked MODEL_PROXY_API_KEY= lines (mimicking what the old append-based CLI would have left behind), runs kaggle b auth once, and asserts (a) no stale value survives, (b) dotenv_values() — which is what consumers use — resolves to the fresh token, and (c) unrelated keys are untouched. Worth noting: dotenv.set_key refreshes every duplicate occurrence, not just the first, so the migration story is stronger than I'd written in the PR description — the live value is correct after the first rerun even though the duplicate lines themselves remain.

Per review, cover the main user-facing promise: users who ran the old
append-based CLI have stacked `MODEL_PROXY_API_KEY=` lines in their
`.env`. New test seeds that state, runs auth once, and verifies the
fresh token is what `dotenv_values()` resolves to — so the live value
is correct on first rerun even though duplicate lines remain.

Co-authored-by: kaggle-agent <kaggle-agent@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

@dolaameng dolaameng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Contributor

@nl917 nl917 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks

Comment thread src/kaggle/api/kaggle_api_extended.py Outdated
f.write("\n")
for key, value in env_vars.items():
f.write(f"{key}={value}\n")
# Upsert via python-dotenv so reruns refresh in place rather than
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there comment still accurate? Try to keep it concise

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trimmed the comment from 3 lines to 2. Kept only the non-obvious bit (why upsert vs append: python-dotenv picks the first occurrence, so duplicates would let an expired token shadow the fresh one) and dropped the redundant phrasing.

Per review, drop the redundant phrasing and keep only the non-obvious
bit: why upsert vs append.

Co-authored-by: kaggle-agent <kaggle-agent@users.noreply.github.com>
Comment thread src/kaggle/api/kaggle_api_extended.py Outdated
f.write("\n")
for key, value in env_vars.items():
f.write(f"{key}={value}\n")
# Upsert (not append): python-dotenv reads the first occurrence, so
Copy link
Copy Markdown
Contributor

@dolaameng dolaameng May 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the comment is correct. The purpose of this PR is to keep the .env clean on user side, and also avoid conflicts with user manual editing. not because the new keys shadowing the old ones, because python-dotenv actually reads the last one for duplicates.

Fix this comment (or just say what it does briefly without these details), and also update your PR summary.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right — python-dotenv reads the last occurrence, not the first (I verified: dotenv_values on FOO=first\nFOO=middle\nFOO=last\n returns last). So the actual motivation is file hygiene (no stacked duplicates on the user side) and not silently shadowing any value the user has hand-edited. Updated the inline comment to say exactly that, and corrected the test docstring on test_first_rerun_after_upgrade_refreshes_stale_duplicates (its assertions were already correct — set_key refreshes every occurrence, so no stale value lingers — but the prose had the same wrong framing). I'll update the PR summary in the next regeneration step.

I had python-dotenv's duplicate-key semantics backwards. It reads the
last occurrence, so the real motivation is keeping the user's .env
clean and not silently shadowing hand-edited values — not "the oldest
expired token wins". Rewrote the inline comment and the migration-test
docstring to match.

Co-authored-by: kaggle-agent <kaggle-agent@users.noreply.github.com>
@dolaameng dolaameng merged commit 4fc2429 into main May 29, 2026
14 checks passed
@dolaameng dolaameng deleted the agent/limagoog-20260529183140-af82f3a6 branch May 29, 2026 21:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants