feat(query): opt-in --recency flag to down-weight stale facts #1665
Open
TPAteeq wants to merge 2 commits into
…fy-Labs#1650, partial)
Add an opt-in `--recency` flag to `graphify query` (and `recency` / `half_life_days` fields on the MCP `query_graph` tool) that multiplies each matched node's search score by a time-decay factor, so newer facts rank ahead of otherwise-equal stale ones. Default output is byte-for-byte unchanged when the flag is off.
Recency signal precedence: `captured_at` (ingested docs) -> else `source_file` mtime -> else 1.0 (neutral, so code/AST nodes are unaffected). Decay reuses the reflect sidecar's pure half-life math (`_decay`, 30-day default) without coupling to the learning sidecar.
`_score_nodes` / `_query_graph_text` accept an optional explicit `now` anchor so tests inject ages instead of depending on the wall clock; the decay is applied only inside `if recency:` so the flag-off path is identical to the pre-change scorer.
This is the small slice of Graphify-Labs#1650; fact supersession / temporal-validity invalidation is deferred to a follow-up.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ry (Graphify-Labs#1650)
Follow-up to review of the opt-in `--recency` query weighting:
- CHANGELOG: add the required `## Unreleased` Feat bullet for the flag.
- serve.py: guard the MCP `half_life_days` parse. A non-numeric MCP argument previously hit an unguarded `float(...)` and crashed the `query_graph` handler with `ValueError`, whereas the CLI degrades gracefully. Extracted `_recency_args` coerces the payload and falls back to the 30-day default on a bad value, so the two entry points are consistent (and the parse is unit-testable without the `mcp` package installed).
- serve.py: normalize a trailing `Z` on `captured_at` to `+00:00` in the recency path before decay. `datetime.fromisoformat` only accepts bare `Z` on Python >= 3.11, so external frontmatter written as `...Z` silently degraded to neutral weight on 3.10. `reflect._parse_dt` is left untouched (the reflect Q&A path keeps its semantics).
- tests: lock malformed/null/non-string `captured_at` -> neutral 1.0; `half_life_days <= 0` -> recency disabled with no div-by-zero; `_recency_args` threading + bad-value fallback (no crash); and the `Z` normalization.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
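The `_recency_args` fallback described in the follow-up commit can be sketched roughly as below. This is a minimal illustration, not the code in `serve.py`: the function name `recency_args`, the plain-dict payload, and the tolerance for string-typed `recency` values are all assumptions. The point it shows is that a bad `half_life_days` degrades to the 30-day default instead of raising `ValueError` inside the MCP handler.

```python
from __future__ import annotations

DEFAULT_HALF_LIFE_DAYS = 30.0


def recency_args(arguments: dict) -> tuple[bool, float]:
    """Coerce MCP tool arguments into (recency, half_life_days) without raising.

    Hypothetical stand-in for the PR's `_recency_args`.
    """
    flag = arguments.get("recency", False)
    if isinstance(flag, str):
        # Assumption: tolerate stringly-typed booleans from loose clients.
        flag = flag.strip().lower() in {"1", "true", "yes"}
    recency = bool(flag)

    raw = arguments.get("half_life_days", DEFAULT_HALF_LIFE_DAYS)
    try:
        half_life_days = float(raw)
    except (TypeError, ValueError):
        # Non-numeric or null input falls back to the default instead of crashing.
        half_life_days = DEFAULT_HALF_LIFE_DAYS
    return recency, half_life_days
```

Keeping the coercion in a plain function (rather than inline in the tool handler) is what lets it be tested without the `mcp` package installed.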
What
Adds an opt-in `--recency` flag to `graphify query` (and the equivalent `recency` / `half_life_days` fields on the MCP `query_graph` tool) that multiplies each matched node's search score by a time-decay factor, so newer facts rank ahead of otherwise-equal stale ones. Default query output is byte-for-byte unchanged when the flag is off.
This is the small, low-risk slice of #1650. The larger, riskier piece is deferred to a follow-up and intentionally not implemented here: fact supersession / temporal-validity invalidation (`valid_from` / `valid_to`, marking an old fact as superseded by a newer one).
How
`graphify reflect` already has a pure `_decay()` (halving every `_DEFAULT_HALF_LIFE_DAYS = 30` days); `serve.py` now imports those two pure names and decays query scores on the same curve, with no coupling to the reflect learning sidecar.
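A minimal sketch of the decay curve, assuming `_decay` is a plain exponential half-life (the description only states that it halves every 30 days; the real signature in `graphify reflect` may differ). `decay` and `DEFAULT_HALF_LIFE_DAYS` here are hypothetical stand-ins.

```python
# Hypothetical stand-in for reflect's `_decay`; assumes an exponential
# half-life curve, which is what "halves every 30 days" describes.
DEFAULT_HALF_LIFE_DAYS = 30.0


def decay(age_days: float, half_life_days: float = DEFAULT_HALF_LIFE_DAYS) -> float:
    """Multiplier that is 1.0 at age 0 and halves every `half_life_days`."""
    return 0.5 ** (age_days / half_life_days)


for age in (0, 30, 60, 90):
    print(age, decay(age))  # 0 1.0 / 30 0.5 / 60 0.25 / 90 0.125
```

With the 30-day default, a 90-day-old fact keeps one eighth of its base score, so it is pushed down rather than dropped and still surfaces when nothing newer matches.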
Recency signal precedence (`_node_recency_weight`): `captured_at` (ISO datetime, present only on ingested docs) → else the `source_file`'s on-disk mtime resolved under the repo root → else `1.0` (neutral). Code/AST nodes carry neither, so recency is a no-op for them.
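The precedence chain above could look roughly like this. It is a sketch under stated assumptions, not the PR's `_node_recency_weight`: `node_recency_weight`, `_parse_captured_at`, the dict-shaped node, and treating a naive timestamp as UTC are all hypothetical. It also folds in two behaviours the PR mentions elsewhere: the trailing-`Z` normalization and the future-date clamp. One more assumption: an unparsable `captured_at` is treated as absent and falls through to the mtime check; with no `source_file`, that yields the neutral 1.0 the tests lock in.

```python
from __future__ import annotations

import os
from datetime import datetime, timezone
from pathlib import Path

SECONDS_PER_DAY = 86400.0


def _decay(age_days: float, half_life_days: float) -> float:
    # Assumed exponential half-life curve (see reflect's `_decay`).
    return 0.5 ** (age_days / half_life_days)


def _parse_captured_at(value: object) -> datetime | None:
    """Parse an ISO-8601 string; anything malformed yields None (no signal)."""
    if not isinstance(value, str):
        return None  # null or non-string frontmatter
    text = value.strip()
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"  # bare "Z" is rejected before Python 3.11
    try:
        parsed = datetime.fromisoformat(text)
    except ValueError:
        return None
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive == UTC
    return parsed


def node_recency_weight(
    node: dict,
    now: datetime,
    half_life_days: float = 30.0,
    source_root: Path | None = None,
) -> float:
    # 1) captured_at, present only on ingested docs.
    stamp = _parse_captured_at(node.get("captured_at"))
    # 2) Fall back to the source file's mtime under the repo root.
    if stamp is None and source_root is not None and node.get("source_file"):
        try:
            mtime = os.path.getmtime(source_root / node["source_file"])
            stamp = datetime.fromtimestamp(mtime, tz=timezone.utc)
        except OSError:
            stamp = None  # missing or unreadable file: no signal
    # 3) Neither signal: neutral weight, so code/AST nodes are unaffected.
    if stamp is None:
        return 1.0
    # Clamp future dates to age 0 so they never weigh more than 1.0.
    age_days = max(0.0, (now - stamp).total_seconds() / SECONDS_PER_DAY)
    return _decay(age_days, half_life_days)
```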
`_query_graph_text(..., recency, half_life_days, now, source_root)` → `_score_nodes(...)`. The decay multiplier is applied only inside `if recency:`; when the flag is off, every code path is identical to the pre-#1650 scorer (verified by byte-identical tests).
exactly why this stays behind an opt-in flag.
`_score_nodes` / `_query_graph_text` accept an optional explicit `now` anchor so tests inject ages instead of depending on the wall clock.
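A rough sketch of how the flag gate and the injectable `now` anchor fit together. The name `score_nodes`, the `(node_id, base_score, captured_at)` tuple shape, and the pre-parsed `captured_at` are assumptions for illustration, not the PR's `_score_nodes` signature. Treating `half_life_days <= 0` as "recency disabled" follows the follow-up commit's description.

```python
from __future__ import annotations

from datetime import datetime, timezone

SECONDS_PER_DAY = 86400.0


def _decay(age_days: float, half_life_days: float) -> float:
    return 0.5 ** (age_days / half_life_days)


def score_nodes(
    matches: list[tuple[str, float, datetime | None]],
    *,
    recency: bool = False,
    half_life_days: float = 30.0,
    now: datetime | None = None,
) -> list[str]:
    """Rank (node_id, base_score, captured_at) triples, best first."""
    # A non-positive half-life disables recency instead of dividing by zero.
    use_recency = recency and half_life_days > 0
    ranked = []
    for node_id, score, captured_at in matches:
        if use_recency:
            if now is None:
                now = datetime.now(timezone.utc)  # wall clock only as a fallback
            if captured_at is not None:
                age_days = max(0.0, (now - captured_at).total_seconds() / SECONDS_PER_DAY)
                score *= _decay(age_days, half_life_days)
            # captured_at is None: weight stays 1.0 (neutral)
        ranked.append((score, node_id))
    # Sort on score only; Python's sort is stable, so ties keep input order.
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [node_id for _, node_id in ranked]
```

Called without `recency=True`, the loop never reads `captured_at` or the clock, so the result is exactly the base-score order; a test can pass a fixed `now` to make the recency-on ordering deterministic.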
`_pick_seeds`' per-term coverage guarantee is left age-neutral on purpose: a single old match for a query term shouldn't be starved out just for being old.
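One way an age-neutral coverage guarantee can work is sketched below; it is a hypothetical illustration, not `_pick_seeds` itself. The name `pick_seeds`, the inputs (a recency-weighted ranking, a term-to-hits map, and raw undecayed scores), and the two-pass structure are assumptions. The coverage pass chooses each term's best hit by raw score, so decay can only reorder the fill pass.

```python
from __future__ import annotations


def pick_seeds(
    ranked: list[str],
    term_hits: dict[str, set[str]],
    raw_scores: dict[str, float],
    limit: int,
) -> list[str]:
    """Pick query seeds: per-term coverage first, then fill from `ranked`."""
    seeds: list[str] = []
    # Coverage pass: every term with a hit gets its best match by *raw*
    # (undecayed) score, so an old-only term is never starved by recency.
    for hits in term_hits.values():
        if not hits or any(node in seeds for node in hits):
            continue  # no match, or the term is already covered
        seeds.append(max(hits, key=lambda node: (raw_scores[node], node)))
    # Fill pass: remaining slots follow the (possibly recency-weighted) order.
    for node in ranked:
        if len(seeds) >= limit:
            break
        if node not in seeds:
            seeds.append(node)
    return seeds
```

In this sketch, a term whose only match is old still contributes a seed even when newer matches for other terms would otherwise fill every slot.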
CLI / MCP surface
MCP
`query_graph` gains `recency` (boolean, default false) and `half_life_days` (number, default 30, only used when `recency=true`).
Tests
`tests/test_serve.py` and `tests/test_query_cli.py`:
- flag off: `_score_nodes` / `_query_graph_text` output is byte-identical to today, even when `captured_at` is present;
- … and the `Start:` seed order shift toward newer);
- `captured_at` precedence, on-disk mtime fallback, neutral weight when neither is present, future-date clamp, and `_source_root_for` path derivation;
- `--recency` flip and `--half-life-days` parsing/validation.
Ages are injected via `captured_at` plus an explicit `now` anchor (unit tests) or decades-apart dates (CLI tests, which have no `now` injection), so nothing is wall-clock dependent.
Run:
Result: 165 passed, 1 skipped.
Deferred (follow-up, not in this PR)
Fact supersession / temporal-validity invalidation (`valid_from` / `valid_to`, superseded-by links). Only the query-time recency weighting is implemented here.
Refs #1650