Graphify-Labs · TPAteeq · Jul 4, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -4,6 +4,7 @@ Full release notes with details on each version: [GitHub Releases](https://githu
 
 ## Unreleased
 
+- Feat: opt-in `--recency` query flag down-weights stale facts so newer answers rank ahead of otherwise-equal old ones (#1650, thanks @Ns2384-star). `graphify query --recency` (and the matching `recency` / `half_life_days` fields on the MCP `query_graph` tool) multiplies each matched node's search score by a half-life time decay keyed on the node's `captured_at`, falling back to its source file's mtime. Code/AST nodes have no `captured_at`, so they decay by source-file mtime when the file resolves under the repo root; a node with neither signal stays neutral (weight 1.0). The half-life defaults to 30 days and is tunable via `--half-life-days`; the decay reuses `graphify reflect`'s pure half-life math. Recency is strictly opt-in: default output is byte-for-byte unchanged when the flag is off. This is the low-effort slice of #1650; full temporal-validity / fact supersession is a deferred follow-up.
 - Fix: a malformed semantic chunk no longer crashes `extract` and discards every successful chunk (#1631, thanks @ssazy). When an LLM returned a well-formed object whose `edges` (or `nodes`/`hyperedges`) array carried a stray non-dict entry — a nested list where an edge object belongs — the AST+semantic merge and the semantic-cache write both called `.get()` per entry and raised `AttributeError: 'list' object has no attribute 'get'`. On a 34-chunk run where 33 succeeded, that meant no `graph.json` was written and the cache write failed too, so a re-run re-extracted everything. `_parse_llm_json` now sanitizes each fragment at the single parse chokepoint (keeping only dict entries and coercing a non-list value to `[]`), so the cache writer, the adaptive-retry merge, and the CLI merge are all protected in one place.
 - Fix: an unresolved bare npm import no longer aliases onto an unrelated same-named local file (#1638, thanks @EveX1). `import colors from "tailwindcss/colors"` in a `.tsx` file emitted an `imports_from` edge to the bare id `colors`, and build.py's pre-migration alias index (which registers every local file's bare stem) then remapped it onto an unrelated `backend/utils/colors.py` — a confident (`EXTRACTED`) cross-language phantom edge, and one per `.tsx` file sharing the import. In a real monorepo eight unrelated `.tsx` files all landed on a single Python module. Common package subpaths (`colors`, `utils`, `types`, `config`, `client`) collide this way constantly. The external-import fallback now namespaces its target with the `ref` prefix (the same J-4 convention used for tsconfig `extends`/`$ref` externals), so it can never collapse to a local file/symbol id; the ref-namespaced target has no node, so build drops it as an external reference — the correct outcome for a third-party import.
 - Fix: `graph.json` node/edge ordering is now stable run-to-run for document/semantic corpora (#1632, thanks @umeshpsatwe). With a parallel LLM backend, `extract_corpus_parallel` merged chunk results in completion order, so which network call happened to return first reordered the nodes and edges even when the model returned identical content — churning `graph.json` between otherwise-identical runs. Chunks are now merged in deterministic submission order after the pool drains (matching the serial path); the progress callback still fires in completion order so long local runs aren't silent. Note: the semantic content the LLM extracts is itself nondeterministic run-to-run — this fix removes the pipeline's own ordering churn, not the model's variance.
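The #1631 fix sanitizes each fragment once, at the parse chokepoint, instead of guarding every consumer. A minimal sketch of that sanitization, assuming a fragment is a dict decoded from the model's JSON; `sanitize_fragment` and `LIST_KEYS` are hypothetical names, not graphify's `_parse_llm_json`:

```python
from typing import Any

# Hypothetical: the list-valued keys a semantic fragment may carry.
LIST_KEYS = ("nodes", "edges", "hyperedges")


def sanitize_fragment(fragment: dict[str, Any]) -> dict[str, Any]:
    """Return a copy where every list key holds only dict entries.

    A non-list value (a string, None, ...) is coerced to [], and stray
    non-dict entries such as a nested list are dropped, so downstream code
    can call .get() on every entry safely.
    """
    clean = dict(fragment)
    for key in LIST_KEYS:
        value = clean.get(key, [])
        if not isinstance(value, list):
            value = []
        clean[key] = [entry for entry in value if isinstance(entry, dict)]
    return clean


raw = {
    "nodes": [{"id": "a"}],
    "edges": [{"source": "a", "target": "b"}, ["a", "b"]],  # stray nested list
    "hyperedges": "oops",  # non-list value
}
clean = sanitize_fragment(raw)
print(clean["edges"])       # [{'source': 'a', 'target': 'b'}]
print(clean["hyperedges"])  # []
```

Doing this once at parse time means the cache writer and both merge paths only ever see lists of dicts.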

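The #1632 change separates progress reporting from merging. A minimal sketch of that pattern using the standard library's `concurrent.futures`; `extract_chunk` and `extract_parallel` are hypothetical names standing in for `extract_corpus_parallel` and its LLM call, not graphify's code:

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Optional


def extract_chunk(chunk: str) -> list[str]:
    # Hypothetical stand-in for one LLM call; the random sleep makes the
    # completion order differ from the submission order.
    time.sleep(random.uniform(0, 0.02))
    return [f"{chunk}:node"]


def extract_parallel(
    chunks: list[str],
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> list[str]:
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(extract_chunk, c) for c in chunks]
        # Progress fires in completion order, so long runs aren't silent.
        for done, _ in enumerate(as_completed(futures), start=1):
            if on_progress is not None:
                on_progress(done, len(futures))
    # The pool has drained: merge in submission order for stable output.
    merged: list[str] = []
    for fut in futures:
        merged.extend(fut.result())
    return merged


print(extract_parallel(["c0", "c1", "c2", "c3"]))
# ['c0:node', 'c1:node', 'c2:node', 'c3:node']
```

Keeping the `futures` list in submission order is what makes the merge deterministic; `as_completed` is used only for the progress signal.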
diff --git a/graphify/__main__.py b/graphify/__main__.py
@@ -2842,22 +2842,38 @@ def main() -> None:
             sys.exit(1)
     elif cmd == "query":
         if len(sys.argv) < 3:
-            print("Usage: graphify query \"<question>\" [--dfs] [--context C] [--budget N] [--graph path]", file=sys.stderr)
+            print("Usage: graphify query \"<question>\" [--dfs] [--context C] [--budget N] [--recency] [--half-life-days N] [--graph path]", file=sys.stderr)
             sys.exit(1)
-        from graphify.serve import _query_graph_text
+        from graphify.serve import _query_graph_text, _source_root_for
         from graphify.security import sanitize_label
         from networkx.readwrite import json_graph
         from graphify import querylog
 
         question = sys.argv[2]
         use_dfs = "--dfs" in sys.argv
+        use_recency = "--recency" in sys.argv
+        half_life_days = 30.0
         budget = 2000
         graph_path = _default_graph_path()
         context_filters: list[str] = []
         args = sys.argv[3:]
         i = 0
         while i < len(args):
-            if args[i] == "--budget" and i + 1 < len(args):
+            if args[i] == "--half-life-days" and i + 1 < len(args):
+                try:
+                    half_life_days = float(args[i + 1])
+                except ValueError:
+                    print("error: --half-life-days must be a number", file=sys.stderr)
+                    sys.exit(1)
+                i += 2
+            elif args[i].startswith("--half-life-days="):
+                try:
+                    half_life_days = float(args[i].split("=", 1)[1])
+                except ValueError:
+                    print("error: --half-life-days must be a number", file=sys.stderr)
+                    sys.exit(1)
+                i += 1
+            elif args[i] == "--budget" and i + 1 < len(args):
                 try:
                     budget = int(args[i + 1])
                 except ValueError:
@@ -2925,6 +2941,9 @@ def main() -> None:
             depth=2,
             token_budget=budget,
             context_filters=context_filters,
+            recency=use_recency,
+            half_life_days=half_life_days,
+            source_root=_source_root_for(gp),
         )
         querylog.log_query(
             kind="query",

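The hand-rolled loop above accepts both `--half-life-days 7` and `--half-life-days=7`. For comparison, the standard library's `argparse` handles both spellings natively. This is a hypothetical sketch of the same flags, not how graphify's `main()` is structured; the positivity check is an added assumption (the diff only checks that the value is numeric):

```python
import argparse


def positive_float(text: str) -> float:
    # Extra guard not present in the diff: a zero, negative, or NaN half-life
    # would make the decay curve meaningless, so reject it up front.
    value = float(text)  # argparse reports a ValueError here as a usage error
    if not value > 0:
        raise argparse.ArgumentTypeError("must be a positive number")
    return value


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="graphify query")
    parser.add_argument("question")
    parser.add_argument("--recency", action="store_true")
    parser.add_argument("--half-life-days", type=positive_float, default=30.0)
    parser.add_argument("--budget", type=int, default=2000)
    return parser


args = build_parser().parse_args(["widget", "--recency", "--half-life-days=7"])
print(args.recency, args.half_life_days)  # True 7.0
```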
diff --git a/graphify/serve.py b/graphify/serve.py
@@ -5,12 +5,17 @@
 import re
 import sys
 from array import array
+from datetime import datetime, timezone
 from pathlib import Path
 import networkx as nx
 from networkx.readwrite import json_graph
 from graphify.security import sanitize_label, check_graph_file_size_cap
 from graphify.build import edge_data
 from graphify.paths import default_graph_json as _default_graph_json
+from graphify.paths import GRAPHIFY_OUT_NAME as _GRAPHIFY_OUT_NAME
+# Reuse the reflect sidecar's pure half-life math (no sidecar coupling) so the
+# opt-in --recency query weighting decays on the same curve as `graphify reflect`.
+from graphify.reflect import _DEFAULT_HALF_LIFE_DAYS, _decay
 
 try:
     import jieba as _jieba  # type: ignore[import-untyped]
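`_decay` itself lives in `graphify.reflect` and isn't shown in this diff. Assuming it implements the standard half-life curve, weight = 0.5 ** (age_days / half_life_days), a minimal stand-in looks like this. It takes a `datetime` rather than the ISO string the real `_decay` receives, and the clamp for future timestamps is also an assumption:

```python
from datetime import datetime, timedelta, timezone


def half_life_weight(captured: datetime, now: datetime, half_life_days: float) -> float:
    """Hypothetical stand-in for reflect._decay: 0.5 ** (age / half-life).

    Ages are clamped at zero so a timestamp in the future weighs 1.0
    instead of exceeding it.
    """
    age_days = max((now - captured).total_seconds() / 86400.0, 0.0)
    return 0.5 ** (age_days / half_life_days)


now = datetime(2026, 7, 4, tzinfo=timezone.utc)
for days in (0, 30, 60, 90):
    w = half_life_weight(now - timedelta(days=days), now, 30.0)
    print(days, round(w, 3))
# 0 1.0
# 30 0.5
# 60 0.25
# 90 0.125
```

With the default 30-day half-life, a fact loses half its weight each month, so a three-month-old match needs roughly 8x the raw score to tie a fresh one.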
@@ -283,8 +288,84 @@ def _trigram_candidates(G: nx.Graph, needles: list[str], *, guard_frac: float =
     return [ids[i] for i in sorted(cand)]
 
 
-def _score_nodes(G: nx.Graph, terms: list[str]) -> list[tuple[float, str]]:
+def _source_root_for(graph_path: "str | Path | None") -> "Path | None":
+    """Repo root that node ``source_file`` paths are relative to, for mtime lookup.
+
+    Graphs live at ``<root>/graphify-out/graph.json``, so the root is two levels
+    up when the parent dir is the output dir, else the graph's own directory.
+    Returns None when no graph path is given (recency then uses captured_at only).
+    """
+    if not graph_path:
+        return None
+    p = Path(graph_path)
+    if p.parent.name == _GRAPHIFY_OUT_NAME:
+        return p.parent.parent
+    return p.parent
+
+
+def _node_recency_weight(
+    data: dict,
+    now: datetime,
+    half_life_days: float,
+    source_root: "Path | None",
+) -> float:
+    """Time-decay multiplier in (0, 1] for a node; the newest nodes weigh ~1.0.
+
+    Signal precedence: ``captured_at`` (ISO datetime, present only on ingested
+    docs) first; else the ``source_file``'s mtime resolved under ``source_root``;
+    else 1.0 (neutral). Code/AST nodes lack ``captured_at``, so they fall back to
+    source-file mtime. Decay uses the same half-life curve as ``graphify reflect`` (_decay).
+    """
+    captured = data.get("captured_at")
+    if captured:
+        s = str(captured)
+        # datetime.fromisoformat (via reflect._parse_dt) only learned to accept a
+        # trailing 'Z' on Python >= 3.11; normalize it here so external frontmatter
+        # written as '...Z' still decays on 3.10. reflect._parse_dt itself is left
+        # untouched, so the reflect Q&A path keeps its existing semantics.
+        if s.endswith("Z"):
+            s = s[:-1] + "+00:00"
+        return _decay(s, now, half_life_days)
+    if source_root is not None:
+        sf = data.get("source_file")
+        if sf:
+            try:
+                mtime = (source_root / str(sf)).stat().st_mtime
+            except (OSError, ValueError):
+                return 1.0
+            dt = datetime.fromtimestamp(mtime, tz=timezone.utc)
+            return _decay(dt.isoformat(), now, half_life_days)
+    return 1.0
+
+
+def _recency_args(arguments: dict) -> "tuple[bool, float]":
+    """Coerce the opt-in recency knobs from an MCP ``query_graph`` payload.
+
+    Unlike the CLI, which reports a non-numeric ``--half-life-days`` and exits,
+    this falls back to the default for a missing or non-numeric ``half_life_days``
+    instead of raising, so a malformed MCP argument can't crash the tool handler.
+    """
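+    # Parse leniently: a string "false" must not read as truthy the way bool("false") does.
+    recency = str(arguments.get("recency", False)).strip().lower() in ("true", "1")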
+    try:
+        half_life_days = float(arguments.get("half_life_days", _DEFAULT_HALF_LIFE_DAYS))
+    except (TypeError, ValueError):
+        half_life_days = _DEFAULT_HALF_LIFE_DAYS
+    return recency, half_life_days
+
+
+def _score_nodes(
+    G: nx.Graph,
+    terms: list[str],
+    *,
+    recency: bool = False,
+    half_life_days: float = _DEFAULT_HALF_LIFE_DAYS,
+    now: "datetime | None" = None,
+    source_root: "Path | None" = None,
+) -> list[tuple[float, str]]:
     scored = []
+    # Recency is strictly opt-in: when off, the scorer skips the age lookup and
+    # decay entirely, so results match the pre-#1650 scorer exactly.
+    recency_now = (now or datetime.now(timezone.utc)) if recency else None
     norm_terms = [tok for t in terms for tok in _search_tokens(t)]
     idf = _compute_idf(G, norm_terms)
     # Whole-query string for full-label matching (mirrors _find_node's `term`).
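The trailing-`Z` workaround in `_node_recency_weight` matters because `datetime.fromisoformat` rejects `'...Z'` before Python 3.11. A small, version-independent sketch of the same normalization; `parse_captured_at` is a hypothetical helper, and treating a naive timestamp as UTC is an added assumption the diff doesn't make:

```python
from datetime import datetime, timezone


def parse_captured_at(value: str) -> datetime:
    """Parse an ISO-8601 timestamp, accepting a trailing 'Z' on any Python 3."""
    s = value.strip()
    if s.endswith("Z"):
        # fromisoformat only accepts 'Z' from Python 3.11; spell out the offset.
        s = s[:-1] + "+00:00"
    dt = datetime.fromisoformat(s)
    if dt.tzinfo is None:
        # Assumption: naive timestamps are UTC, so they compare with an aware `now`.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt


print(parse_captured_at("2026-07-04T12:00:00Z"))
# 2026-07-04 12:00:00+00:00
```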
@@ -341,6 +422,8 @@ def _score_nodes(G: nx.Graph, terms: list[str]) -> list[tuple[float, str]]:
             if t in source:
                 score += _SOURCE_MATCH_BONUS * w
         if score > 0:
+            if recency_now is not None:
+                score *= _node_recency_weight(data, recency_now, half_life_days, source_root)
             scored.append((score, nid))
     # Sort by score desc; break ties toward the shorter label so a concise exact
     # match beats a longer superset that happens to share the same score.
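Because the weight multiplies the score only after a node has matched (`score > 0`), recency reorders matches but never admits a non-match. A toy illustration with two equally scored nodes and a 30-day half-life; `rank`, the scores, the ages, and the plain `0.5 ** (age / half_life)` weight are illustrative assumptions, and the tie-break only approximates the scorer's shorter-label-then-node-id order:

```python
from typing import Optional


def rank(
    matches: list[tuple[float, str, str, float]],
    half_life_days: Optional[float] = None,
) -> list[str]:
    """Rank (score, node_id, label, age_days) tuples, best first.

    With half_life_days=None recency is off and raw scores are used as-is.
    """
    ranked = []
    for score, nid, label, age_days in matches:
        if half_life_days is not None:
            score *= 0.5 ** (age_days / half_life_days)
        ranked.append((score, nid, label))
    # Score descending, then shorter label, then node id (assumed tie-breaks).
    ranked.sort(key=lambda t: (-t[0], len(t[2]), t[1]))
    return [nid for _, nid, _ in ranked]


matches = [
    (4.0, "a_old", "widget aaa", 365.0),
    (4.0, "z_new", "widget bbb", 1.0),
]
print(rank(matches))                       # ['a_old', 'z_new']
print(rank(matches, half_life_days=30.0))  # ['z_new', 'a_old']
```

This mirrors the CLI tests further down: with equal scores and equal-length labels, the recency-off order falls back to node id, and `--recency` flips it.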
@@ -634,9 +717,22 @@ def _query_graph_text(
     depth: int = 3,
     token_budget: int = 2000,
     context_filters: list[str] | None = None,
+    recency: bool = False,
+    half_life_days: float = _DEFAULT_HALF_LIFE_DAYS,
+    now: "datetime | None" = None,
+    source_root: "Path | None" = None,
 ) -> str:
     terms = _query_terms(question)
-    scored = _score_nodes(G, terms)
+    scored = _score_nodes(
+        G,
+        terms,
+        recency=recency,
+        half_life_days=half_life_days,
+        now=now,
+        source_root=source_root,
+    )
+    # _pick_seeds' per-term coverage guarantee stays age-neutral on purpose: an
+    # old but sole match for a query term shouldn't be starved out just for age.
     start_nodes = _pick_seeds(scored, G=G, terms=terms)
     if not start_nodes:
         return "No matching nodes found."
@@ -863,6 +959,16 @@ async def list_tools() -> list[types.Tool]:
                             "items": {"type": "string"},
                             "description": "Optional explicit edge-context filter, e.g. ['call', 'field']",
                         },
+                        "recency": {
+                            "type": "boolean",
+                            "default": False,
+                            "description": "Opt-in: down-weight stale facts by age (captured_at, else source-file mtime). Off by default; leaves ranking unchanged when false.",
+                        },
+                        "half_life_days": {
+                            "type": "number",
+                            "default": 30,
+                            "description": "Recency half-life in days (a fact's weight halves every N days). Only used when recency=true.",
+                        },
                     },
                     "required": ["question"],
                 },
@@ -990,6 +1096,7 @@ def _tool_query_graph(arguments: dict) -> str:
         depth = min(int(arguments.get("depth", 3)), 6)
         budget = int(arguments.get("token_budget", 2000))
         context_filter = arguments.get("context_filter")
+        recency, half_life_days = _recency_args(arguments)
         _t0 = _time.perf_counter()
         result = _query_graph_text(
             G,
@@ -998,6 +1105,9 @@ def _tool_query_graph(arguments: dict) -> str:
             depth=depth,
             token_budget=budget,
             context_filters=context_filter,
+            recency=recency,
+            half_life_days=half_life_days,
+            source_root=_source_root_for(active_graph_path),
         )
         querylog.log_query(
             kind="mcp_query",
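Passing `source_root` is what enables the mtime fallback for nodes without `captured_at`, whenever their `source_file` resolves under that root. A self-contained sketch that ages a temporary file by 60 days with `os.utime` and reproduces the fallback's shape; `mtime_weight` is a hypothetical helper, and the plain `0.5 ** (age / half_life)` curve is an assumption about `reflect._decay`:

```python
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def mtime_weight(source_root: Path, source_file: str, now: datetime, half_life_days: float) -> float:
    """Hypothetical mirror of the mtime fallback: decay by the file's age, else 1.0."""
    try:
        mtime = (source_root / source_file).stat().st_mtime
    except (OSError, ValueError):
        return 1.0  # a missing or unreadable file stays neutral
    age_days = max((now.timestamp() - mtime) / 86400.0, 0.0)
    return 0.5 ** (age_days / half_life_days)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "notes.md").write_text("old fact\n")
    now = datetime.now(timezone.utc)
    sixty_days_ago = now.timestamp() - 60 * 86400
    os.utime(root / "notes.md", (sixty_days_ago, sixty_days_ago))
    print(round(mtime_weight(root, "notes.md", now, 30.0), 3))  # 0.25
    print(mtime_weight(root, "missing.md", now, 30.0))          # 1.0
```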

diff --git a/tests/test_query_cli.py b/tests/test_query_cli.py
@@ -51,6 +51,79 @@ def test_query_cli_heuristic_context_filter(monkeypatch, tmp_path, capsys):
     assert "build" not in out
 
 
+def _write_recency_graph(tmp_path):
+    """Two equal-length 'widget' matches differing only in age.
+
+    captured_at values are 999 years apart (2000 vs 2999), so recency ranking is
+    stable for any real wall-clock `now`; the CLI has no now-injection, so the
+    test must not depend on the exact current date. The far-past node keeps the
+    alphabetically-smaller id ('a_old') so the recency-off node-id tie-break puts
+    it first, making the recency-on flip to 'z_new' unambiguous.
+    """
+    G = nx.Graph()
+    G.add_node("a_old", label="widget aaa", source_file="a.py", source_location="L1",
+               community=0, captured_at="2000-01-01T00:00:00+00:00")
+    G.add_node("z_new", label="widget bbb", source_file="b.py", source_location="L1",
+               community=0, captured_at="2999-01-01T00:00:00+00:00")
+    G.add_edge("a_old", "z_new", relation="calls", confidence="EXTRACTED", context="call")
+    graph_path = tmp_path / "graph.json"
+    graph_path.write_text(json.dumps(json_graph.node_link_data(G, edges="links")))
+    return graph_path
+
+
+def _run_query(monkeypatch, capsys, argv):
+    monkeypatch.setattr(mainmod, "_check_skill_version", lambda _: None)
+    monkeypatch.setattr(mainmod.sys, "argv", argv)
+    mainmod.main()
+    return capsys.readouterr().out
+
+
+def test_query_cli_recency_off_by_default(monkeypatch, tmp_path, capsys):
+    """Without --recency, age is ignored: the older node seeds first via the node-id tie-break."""
+    graph_path = _write_recency_graph(tmp_path)
+    out = _run_query(
+        monkeypatch, capsys,
+        ["graphify", "query", "widget", "--graph", str(graph_path)],
+    )
+    header = out.splitlines()[0]
+    assert header.index("widget aaa") < header.index("widget bbb")
+
+
+def test_query_cli_recency_flag_shifts_to_newer(monkeypatch, tmp_path, capsys):
+    """--recency promotes the newer node ahead of an equally-matching older one."""
+    graph_path = _write_recency_graph(tmp_path)
+    out = _run_query(
+        monkeypatch, capsys,
+        ["graphify", "query", "widget", "--recency", "--graph", str(graph_path)],
+    )
+    header = out.splitlines()[0]
+    assert header.index("widget bbb") < header.index("widget aaa")
+
+
+def test_query_cli_half_life_days_parsed(monkeypatch, tmp_path, capsys):
+    """--half-life-days is accepted alongside --recency (and doesn't crash)."""
+    graph_path = _write_recency_graph(tmp_path)
+    out = _run_query(
+        monkeypatch, capsys,
+        ["graphify", "query", "widget", "--recency", "--half-life-days", "7", "--graph", str(graph_path)],
+    )
+    header = out.splitlines()[0]
+    assert header.index("widget bbb") < header.index("widget aaa")
+
+
+def test_query_cli_half_life_days_rejects_non_number(monkeypatch, tmp_path, capsys):
+    import pytest
+    graph_path = _write_recency_graph(tmp_path)
+    monkeypatch.setattr(mainmod, "_check_skill_version", lambda _: None)
+    monkeypatch.setattr(
+        mainmod.sys, "argv",
+        ["graphify", "query", "widget", "--half-life-days", "soon", "--graph", str(graph_path)],
+    )
+    with pytest.raises(SystemExit):
+        mainmod.main()
+    assert "--half-life-days must be a number" in capsys.readouterr().err
+
+
 def test_query_cli_rejects_oversized_graph(monkeypatch, tmp_path, capsys):
     """#F4: query CLI must refuse to parse a graph.json that exceeds the cap."""
     import pytest