Commit e2770a5

rdhyee and claude authored
Tutorials query /current/wide.parquet instead of dated file (#134)
Interactive Explorer, Search Explorer, and Deep-Dive Analysis now query through the stable /current/ alias so they pick up the current enriched wide parquet (with OpenContext thumbnails) without needing per-tutorial URL updates on rebuild. The alias 302-redirects to the latest dated file; DuckDB-WASM follows redirects transparently and range requests after the redirect go to the target directly.

- tutorials/progressive_globe.qmd: wide_url (lazy description fetch on sample click; v2 explorer description fetch)
- tutorials/isamples_explorer.qmd: wide_url (v1 primary source)
- tutorials/zenodo_isamples_analysis.qmd: primary data source

narrow_vs_wide_performance.qmd intentionally keeps dated URLs: benchmarks need reproducibility, not freshness.

Data catalog updates:

- how-to-use.qmd: document the /current/ alias pattern, explain the trade (stable alias for interactive work vs. dated URL for pinned reproducibility), preserve historical isamples_202601_wide.parquet pointer for anyone pinning.
- tutorials/index.qmd: primary Wide format row points at the alias and notes the rotation convention.

Closes the "#2" item from #131's status comment (migrate tutorials to /current/wide.parquet).

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
1 parent d382a20 commit e2770a5
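
The commit message says the alias 302-redirects to the latest dated file. For anyone who wants to record which dated build an interactive session actually hit, the following is a minimal Python sketch, not code from this repository. It assumes the server answers a HEAD request on the alias with a redirect status and a `Location` header; the helper names `resolve_alias` and `build_tag` are hypothetical.

```python
import re
import urllib.error
import urllib.request
from typing import Optional
from urllib.parse import urljoin

# Alias URL taken from the commit; everything else here is illustrative.
ALIAS_URL = "https://data.isamples.org/current/wide.parquet"

# Matches dated wide builds such as isamples_202604_wide.parquet
# (but not the *_wide_h3.parquet variant).
DATED_WIDE = re.compile(r"isamples_(\d{6})_wide\.parquet$")


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so the Location header is visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError for the 3xx response


def resolve_alias(alias_url: str = ALIAS_URL, timeout: float = 10.0) -> str:
    """Return the URL the alias redirects to (assumes a HEAD request is allowed)."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(alias_url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.geturl()  # no redirect happened
    except urllib.error.HTTPError as err:
        location = err.headers.get("Location")
        if err.code in (301, 302, 303, 307, 308) and location:
            return urljoin(alias_url, location)  # handles relative Location values
        raise


def build_tag(url: str) -> Optional[str]:
    """Extract the YYYYMM tag from a dated wide-parquet URL, or None."""
    match = DATED_WIDE.search(url)
    return match.group(1) if match else None
```

Calling `build_tag(resolve_alias())` at the start of an analysis and logging the result would give a pinned reference without giving up the convenience of the alias.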

5 files changed

Lines changed: 21 additions & 7 deletions

how-to-use.qmd

Lines changed: 12 additions & 2 deletions
@@ -60,10 +60,20 @@ The two main files carrying the sample records themselves:

 | File | Size | Shape | Rows | Use when you need… |
 |---|---:|---|---:|---|
-| [`isamples_202601_wide.parquet`](https://data.isamples.org/isamples_202601_wide.parquet) | 278 MB | Wide (one row per entity, nested relationships in `p__*` array columns) | 20 M | General entity queries, UI filtering, description text |
+| [`current/wide.parquet`](https://data.isamples.org/current/wide.parquet) | 292 MB | Wide (one row per entity, nested relationships in `p__*` array columns) | 20 M | General entity queries, UI filtering, description text |
 | [`isamples_202601_wide_h3.parquet`](https://data.isamples.org/isamples_202601_wide_h3.parquet) | 292 MB | Wide + H3 BIGINT indices (`h3_res4`, `h3_res6`, `h3_res8`) | 20 M | Geospatial queries with H3 clustering at arbitrary zoom |
 | [`isamples_202512_narrow.parquet`](https://data.isamples.org/isamples_202512_narrow.parquet) | 820 MB | Narrow (graph: nodes + explicit `_edge_` rows, s/p/o/n fields) | 106 M | Graph traversals, relationship-centric analysis, PQG work |

+`/current/wide.parquet` is a stable alias that HTTP 302-redirects to the
+latest dated file (currently
+[`isamples_202604_wide.parquet`](https://data.isamples.org/isamples_202604_wide.parquet),
+enriched with ~47 K OpenContext thumbnails). The dated filename is
+immutable; the alias rotates atomically when we rebuild. Use the alias for
+interactive work, the dated URL when you want a pinned, reproducible
+reference. The original
+[`isamples_202601_wide.parquet`](https://data.isamples.org/isamples_202601_wide.parquet)
+(278 MB, no thumbnails) is kept available for historical pinning.
+
 All three represent the same underlying data (SESAR + OpenContext + GEOME
 + Smithsonian) with identical semantics — they differ only in serialization
 strategy. See the
@@ -123,7 +133,7 @@ import duckdb
 con = duckdb.connect()
 con.sql("""
 SELECT source, COUNT(*) AS n
-FROM read_parquet('https://data.isamples.org/isamples_202601_wide.parquet')
+FROM read_parquet('https://data.isamples.org/current/wide.parquet')
 WHERE otype = 'MaterialSampleRecord'
 GROUP BY 1 ORDER BY 2 DESC
 """).df()

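The how-to-use.qmd text above describes a trade between the rotating alias (interactive work) and an immutable dated URL (pinned, reproducible reference). One way to make that choice explicit in a script is sketched below in Python. The helper names `wide_parquet_url` and `count_by_source` are hypothetical; the URL pattern `isamples_YYYYMM_wide.parquet` and the query come from the diff, and the sketch assumes the `duckdb` package is installed and can read remote Parquet over HTTPS.

```python
import re
from typing import Optional

BASE_URL = "https://data.isamples.org"  # host used throughout the diff


def wide_parquet_url(pin: Optional[str] = None) -> str:
    """Return the alias URL, or a dated URL when a YYYYMM pin is given."""
    if pin is None:
        # Rotating alias: always the latest build, not reproducible over time.
        return f"{BASE_URL}/current/wide.parquet"
    if not re.fullmatch(r"\d{6}", pin):
        raise ValueError(f"pin must look like YYYYMM, got {pin!r}")
    # Dated filename: immutable according to how-to-use.qmd.
    return f"{BASE_URL}/isamples_{pin}_wide.parquet"


def count_by_source(url: str):
    """Run the how-to-use.qmd example query against the given URL."""
    import duckdb  # imported lazily so the URL helper works without duckdb

    query = f"""
        SELECT source, COUNT(*) AS n
        FROM read_parquet('{url}')
        WHERE otype = 'MaterialSampleRecord'
        GROUP BY 1 ORDER BY 2 DESC
    """
    return duckdb.connect().sql(query).fetchall()
```

For exploratory work, `count_by_source(wide_parquet_url())` follows the alias; for a result that should stay stable across rebuilds, `count_by_source(wide_parquet_url("202601"))` pins the original build named in the text.
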
tutorials/index.qmd

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@ All data is hosted on [`data.isamples.org`](https://data.isamples.org) with HTTP

 | File | Size | Description |
 |------|------|-------------|
-| [Wide format](https://data.isamples.org/isamples_202601_wide.parquet) | 278 MB | One row per entity, all sources — primary file for tutorials |
+| [Wide format](https://data.isamples.org/current/wide.parquet) | 292 MB | One row per entity, all sources — primary file for tutorials. Stable alias redirects to the current dated build (`isamples_YYYYMM_wide.parquet`). |
 | [Wide + H3](https://data.isamples.org/isamples_202601_wide_h3.parquet) | 292 MB | Wide format with H3 spatial indices for globe visualizations |
 | [Facet summaries](https://data.isamples.org/isamples_202601_facet_summaries.parquet) | 2 KB | Pre-computed filter counts — loads instantly |
 | [H3 clusters (res4)](https://data.isamples.org/isamples_202601_h3_summary_res4.parquet) | 0.6 MB | Zoomed-out globe view |

tutorials/isamples_explorer.qmd

Lines changed: 4 additions & 2 deletions
@@ -82,8 +82,10 @@ duckdbModule = import("https://cdn.jsdelivr.net/npm/@duckdb/duckdb-wasm@1.28.0/+
 // description fetch on click, no ORDER BY RANDOM(), lazy Cesium mount).
 explorerVersion = new URLSearchParams(location.search).get('v') === '2' ? 'v2' : 'v1'

-// Data source configuration
-wide_url = "https://data.isamples.org/isamples_202601_wide.parquet"
+// Data source configuration.
+// wide_url uses the /current/ alias so we pick up the latest enriched build
+// (with OpenContext thumbnails); the alias 302-redirects to the dated file.
+wide_url = "https://data.isamples.org/current/wide.parquet"
 lite_url = "https://data.isamples.org/isamples_202601_samples_map_lite.parquet"
 parquet_url = explorerVersion === 'v2' ? lite_url : wide_url

tutorials/progressive_globe.qmd

Lines changed: 3 additions & 1 deletion
@@ -202,7 +202,9 @@ h3_res4_url = `${R2_BASE}/isamples_202601_h3_summary_res4.parquet`
 h3_res6_url = `${R2_BASE}/isamples_202601_h3_summary_res6.parquet`
 h3_res8_url = `${R2_BASE}/isamples_202601_h3_summary_res8.parquet`
 lite_url = `${R2_BASE}/isamples_202601_samples_map_lite.parquet`
-wide_url = `${R2_BASE}/isamples_202601_wide.parquet`
+// Stable alias that 302-redirects to the current enriched wide parquet
+// (isamples_YYYYMM_wide.parquet). Gets OpenContext thumbnails populated.
+wide_url = `${R2_BASE}/current/wide.parquet`
 facets_url = `${R2_BASE}/isamples_202601_sample_facets.parquet`
 facet_summaries_url = `${R2_BASE}/isamples_202601_facet_summaries.parquet`

tutorials/zenodo_isamples_analysis.qmd

Lines changed: 1 addition & 1 deletion
@@ -95,7 +95,7 @@ parquet_urls = [
 'https://data.isamples.org/isamples_202601_wide_h3.parquet',

 // Fallback: original wide format without H3
-'https://data.isamples.org/isamples_202601_wide.parquet',
+'https://data.isamples.org/current/wide.parquet',

 // Fallback: older versions
 'https://labs.dataunbound.com/docs/2025/07/isamples_export_2025_04_21_16_23_46_geo.parquet',
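
The hunk above only shows the ordered `parquet_urls` list; how the tutorial walks that list is not visible in this diff. As an illustration of the general pattern (try each candidate in order, use the first that responds), here is a hedged Python sketch. The names `http_ok` and `first_available` are hypothetical, and the reachability check is an assumption, not the tutorial's actual logic.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional


def http_ok(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers with a 2xx status (redirects are followed)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, TimeoutError):
        # Covers HTTP errors (HTTPError subclasses URLError) and network failures.
        return False


def first_available(
    urls: Iterable[str], probe: Callable[[str], bool] = http_ok
) -> Optional[str]:
    """Return the first URL for which probe(url) is true, or None if all fail."""
    for url in urls:
        if probe(url):
            return url
    return None
```

Passing the probe as a parameter keeps the ordering logic testable without network access, for example `first_available(urls, probe=lambda u: u.endswith("wide.parquet"))`.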
