Commit 1be3fba

rdhyee and claude authored
Add cross-filtering to Explorer facet counts (#94)
* Add cross-filtering to Explorer facet counts

  When any filter is active, facet counts now reflect the intersection of all OTHER active filters. For example, selecting SESAR as source updates material/context/specimen counts to show only what exists in SESAR data.

  Uses parallel GROUP BY queries via DuckDB-WASM. Counts update via DOM manipulation to avoid resetting checkbox selections. Zero-count facet values are dimmed for visual clarity. When no filters are active, pre-computed summaries are used (instant).

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix cross-filtering: use pre-computed cache + correct column mapping

  - Add 6KB pre-computed cross-filter cache for instant single-filter lookups
  - Add 21MB sample_facets view with URI-string columns for on-the-fly fallback
  - Fix column name mismatch: wide parquet has p__* BIGINT[] columns, but facet values are URI strings — cross-filter now queries sample_facets
  - Main whereClause uses pid subquery against sample_facets for facet filters
  - Source filter still queries wide parquet directly (n column is correct)

  Supplementary files on data.isamples.org:
  - isamples_202601_facet_cross_filter.parquet (6 KB, 526 rows)
  - isamples_202601_sample_facets_v2.parquet (21 MB, 6M rows)

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix three cross-filter bugs

  1. Multi-value within single facet: fast path now requires exactly one value in the active facet, not just one active dimension. Multiple selections (e.g., SESAR+GEOME) correctly fall through to on-the-fly queries.
  2. Text search participates in cross-filtering: buildCrossFilterWhere now includes ILIKE conditions. sample_facets_v2 regenerated with label, description, place_name columns (63 MB on R2).
  3. Clearing filters restores baseline counts: the update cell now resets all facet-count labels to baseline values and removes zero-count dimming when crossFilteredFacets is null.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix count universe inconsistency and blank-value mismatch

  Codex review found two bugs:
  1. facet_summaries counted all 6.68M records but sample_facets only had the 5.98M with coordinates — counts jumped when toggling filters. Regenerated all three parquet files from the same base universe (lat IS NOT NULL). SESAR now consistently 4,389,231 across all files.
  2. Baseline summaries included blank-string facet values, but on-the-fly queries excluded them with != ''. Regenerated summaries now exclude blanks, matching the on-the-fly behavior.

  Also: removed dead getDisplayCounts(), fixed stale 0.3MB comment, added missing quote escaping on source cache lookup.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add cross-filtering interaction tests

  5 new tests in TestExplorerCrossFiltering:
  - Baseline SESAR count matches summaries (>4M)
  - Clicking source updates material counts (organicmaterial decreases)
  - Clearing filter restores baseline counts
  - Zero-count items get dimmed (opacity < 1)
  - New parquet endpoints (cross_filter, sample_facets_v2) reachable

  Cross-filter tests gracefully skip if data attributes not yet deployed.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Clean up blank-string facet values in sample_facets

  Convert blank strings to NULL with NULLIF in sample_facets_v2 generation (586 blank context rows → NULL). Remove redundant != '' guards from on-the-fly queries since IS NOT NULL now handles both.

  Addresses Codex finding #2: blank values in sample_facets caused state mismatch with baseline summaries (which correctly excluded blanks).

  Finding #1 (count universe mismatch) was a false positive — Codex cached stale files; live CDN has consistent counts across all three artifacts (SESAR=4,389,231, total=5,980,282).

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
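The counting rule from the first commit (each facet's counts reflect every OTHER active filter, never the facet's own selection) can be illustrated with a small in-memory sketch. This is not the Explorer's DuckDB-WASM code: the `cross_filter_counts` function, the record layout, and the sample rows are hypothetical, chosen only to show the semantics, including the later decision to skip blank values.

```python
from collections import Counter

# Hypothetical facet names for illustration; the real Explorer has its own set.
FACETS = ("source", "material", "context")


def cross_filter_counts(records, active):
    """Count values per facet, applying only the filters of the OTHER facets.

    records: list of dicts mapping facet name -> value.
    active:  dict mapping facet name -> set of selected values.
    """
    result = {}
    for facet in FACETS:
        # A facet never filters its own counts, so drop its own selection.
        others = {f: vals for f, vals in active.items() if f != facet and vals}
        counts = Counter()
        for rec in records:
            if all(rec.get(f) in vals for f, vals in others.items()):
                value = rec.get(facet)
                if value:  # skip missing and blank-string values
                    counts[value] += 1
        result[facet] = counts
    return result


# Made-up sample rows, not real iSamples data.
records = [
    {"source": "SESAR", "material": "rock", "context": "marine"},
    {"source": "SESAR", "material": "rock", "context": ""},
    {"source": "GEOME", "material": "organicmaterial", "context": "marine"},
    {"source": "OPENCONTEXT", "material": "organicmaterial", "context": "site"},
]

counts = cross_filter_counts(records, {"source": {"SESAR"}})
print(counts["source"])    # unaffected by its own selection
print(counts["material"])  # restricted to SESAR rows
```

With SESAR selected, the source facet still shows every source, while material and context shrink to what exists in SESAR rows. A value that drops out entirely reads as 0, which is the case the UI dims.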
1 parent 3300b44 commit 1be3fba

2 files changed

Lines changed: 323 additions & 18 deletions

tests/test_explorer.py

Lines changed: 103 additions & 0 deletions
@@ -71,6 +71,109 @@ def test_four_sources_present(self, explorer_page):
             assert explorer_page.get_by_text(source).count() > 0, f"Missing source: {source}"
 
 
+class TestExplorerCrossFiltering:
+    """Cross-filtering: clicking a facet should update counts in other facets."""
+
+    def _wait_for_facets(self, page):
+        """Wait for facet count labels to render (requires cross-filter PR)."""
+        facet = page.locator(".facet-count[data-facet='source']")
+        # These data attributes only exist after the cross-filtering code is deployed
+        try:
+            facet.first.wait_for(state="attached", timeout=30000)
+        except Exception:
+            pytest.skip("Cross-filter data attributes not yet deployed")
+
+    def _get_count(self, page, facet, value):
+        """Extract the numeric count from a facet-count label."""
+        el = page.locator(f".facet-count[data-facet='{facet}'][data-value='{value}']")
+        if el.count() == 0:
+            return None
+        text = el.first.text_content()  # e.g. "(4,389,231)"
+        return int(text.strip("() ").replace(",", ""))
+
+    def _click_checkbox(self, page, label):
+        """Click a checkbox by its visible label text."""
+        page.get_by_text(label, exact=True).first.click()
+
+    def test_baseline_sesar_count_matches_summaries(self, explorer_page):
+        """Before any interaction, SESAR count should match the facet summary."""
+        self._wait_for_facets(explorer_page)
+        count = self._get_count(explorer_page, "source", "SESAR")
+        assert count is not None, "SESAR facet-count element not found"
+        assert count > 4_000_000, f"SESAR baseline count too low: {count}"
+
+    def test_clicking_source_updates_material_counts(self, explorer_page):
+        """Checking SESAR should lower material counts (no archaeology materials)."""
+        self._wait_for_facets(explorer_page)
+        # Record a material count before filtering
+        before = self._get_count(explorer_page, "material",
+                                 "https://w3id.org/isample/vocabulary/material/1.0/organicmaterial")
+        assert before is not None, "organicmaterial facet-count not found"
+
+        # Click SESAR checkbox
+        self._click_checkbox(explorer_page, "SESAR")
+
+        # Wait for cross-filter update (labels update in-place via DOM mutation)
+        explorer_page.wait_for_timeout(5000)
+
+        after = self._get_count(explorer_page, "material",
+                                "https://w3id.org/isample/vocabulary/material/1.0/organicmaterial")
+        assert after is not None
+        assert after < before, (
+            f"organicmaterial count should decrease with SESAR filter: {before} -> {after}"
+        )
+
+    def test_clearing_filter_restores_baseline(self, explorer_page):
+        """Unchecking a source should restore baseline counts."""
+        self._wait_for_facets(explorer_page)
+        baseline = self._get_count(explorer_page, "material",
+                                   "https://w3id.org/isample/vocabulary/material/1.0/earthmaterial")
+
+        # Activate then deactivate SESAR
+        self._click_checkbox(explorer_page, "SESAR")
+        explorer_page.wait_for_timeout(5000)
+        filtered = self._get_count(explorer_page, "material",
+                                   "https://w3id.org/isample/vocabulary/material/1.0/earthmaterial")
+
+        self._click_checkbox(explorer_page, "SESAR")
+        explorer_page.wait_for_timeout(5000)
+        restored = self._get_count(explorer_page, "material",
+                                   "https://w3id.org/isample/vocabulary/material/1.0/earthmaterial")
+
+        assert filtered != baseline, "Filter should have changed the count"
+        assert restored == baseline, (
+            f"Count should restore to baseline after clearing: {baseline} -> {restored}"
+        )
+
+    def test_zero_count_items_are_dimmed(self, explorer_page):
+        """Facet values with 0 matches should have reduced opacity."""
+        self._wait_for_facets(explorer_page)
+
+        # SMITHSONIAN is smallest source — filtering to it should zero some facets
+        self._click_checkbox(explorer_page, "SMITHSONIAN")
+        explorer_page.wait_for_timeout(5000)
+
+        # Find any facet-count with "(0)" and check opacity
+        zero_counts = explorer_page.locator(".facet-count").filter(has_text="(0)")
+        if zero_counts.count() > 0:
+            opacity = zero_counts.first.evaluate("el => getComputedStyle(el).opacity")
+            assert float(opacity) < 1.0, "Zero-count items should be dimmed"
+
+    def test_new_parquet_endpoints_reachable(self, explorer_page):
+        """The cross-filter and sample_facets parquet files should be accessible."""
+        import subprocess
+        for url in [
+            "https://data.isamples.org/isamples_202601_facet_cross_filter.parquet",
+            "https://data.isamples.org/isamples_202601_sample_facets_v2.parquet",
+        ]:
+            result = subprocess.run(
+                ["curl", "-s", "-o", "/dev/null", "-w", "%{http_code}", "--head", url],
+                capture_output=True, text=True
+            )
+            code = result.stdout.strip()
+            assert code in ("200", "206"), f"{url} returned {code}"
+
+
 class TestExplorerSampleCard:
     """Sample Card section should exist."""
 
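The tests in this diff wait a fixed 5 seconds after each click before re-reading a count. One possible alternative, sketched here as an assumption and not as part of the commit, is to poll the label until its value changes. `parse_count` mirrors the string handling in `_get_count`; `wait_for_count_change` and its parameters are hypothetical names introduced for illustration. In the real tests the reader callable would presumably wrap `self._get_count(...)`, and the `sleep` argument could wrap `page.wait_for_timeout`.

```python
import time


def parse_count(text):
    """Parse a facet-count label such as "(4,389,231)" into an int."""
    return int(text.strip("() ").replace(",", ""))


def wait_for_count_change(read_count, before, timeout=5.0, interval=0.1,
                          sleep=time.sleep):
    """Poll read_count() until it returns a value different from `before`.

    read_count: zero-argument callable returning an int or None.
    Returns the new value, or raises TimeoutError if nothing changed in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        current = read_count()
        if current is not None and current != before:
            return current
        if time.monotonic() >= deadline:
            raise TimeoutError(f"count still {before} after {timeout}s")
        sleep(interval)


# Simulated label text: unchanged twice, then updated by the cross-filter.
labels = iter(["(1,200)", "(1,200)", "(875)"])
after = wait_for_count_change(lambda: parse_count(next(labels)),
                              before=1200, interval=0.01)
print(after)
```

Polling returns as soon as the label updates and fails with an explicit timeout otherwise. The trade-off is that it only suits tests that expect the count to change, so a check like "restored equals baseline" would need to poll for equality with the baseline value instead.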