Enhance OpenContext parquet tutorial with comprehensive lessons from notebook
Major additions:
- Critical discovery section highlighting correct vs incorrect query patterns
- Step-by-step debugging methodology showing how to find relationship paths
- Interactive validation queries demonstrating 0 → 1M+ sample recovery
- Archaeological insights with top sites and material distributions
- Comprehensive performance guidelines and debugging strategies
Key lessons emphasized:
- Multi-hop traversal required (Sample → Event → Location)
- Direct Sample→Location relationships don't exist (critical bug fix)
- Property graph debugging methodology for complex datasets
- Archaeological context with major sites (Çatalhöyük, Petra, etc.)
This tutorial now captures the essential insights from the enhanced notebook
for browser-based analysis and serves as a reference for querying
the OpenContext property graph structure.
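
A minimal sketch of the multi-hop lesson above, written as plain JavaScript over a toy in-memory table rather than the real parquet file. The `o` column, the `SamplingEvent` and `GeospatialCoordLocation` type names, and the `produced_by` and `sample_location` predicates are assumptions for illustration; only `row_id`, `otype`, `s`, `p`, `MaterialSampleRecord`, and `_edge_` appear in the tutorial query.

```javascript
// Toy property graph in a single-table layout: entity rows and edge rows
// (otype '_edge_') share one `nodes` array, mirroring the tutorial's table.
// ASSUMPTION: `o`, the event/location type names, and both predicate
// names are hypothetical stand-ins, not confirmed by the source.
const nodes = [
  { row_id: 1, otype: "MaterialSampleRecord" },
  { row_id: 2, otype: "MaterialSampleRecord" },
  { row_id: 3, otype: "SamplingEvent" },
  { row_id: 4, otype: "GeospatialCoordLocation" },
  { row_id: 5, otype: "_edge_", s: 1, p: "produced_by", o: 3 },
  { row_id: 6, otype: "_edge_", s: 3, p: "sample_location", o: 4 },
];

// Follow edges with predicate `p` out of the given source row_ids
// and return the row_ids they point to.
function hop(sourceIds, p) {
  const sources = new Set(sourceIds);
  return nodes
    .filter((n) => n.otype === "_edge_" && n.p === p && sources.has(n.s))
    .map((n) => n.o);
}

const sampleIds = nodes
  .filter((n) => n.otype === "MaterialSampleRecord")
  .map((n) => n.row_id);

// Incorrect pattern: look for a direct Sample -> Location edge.
const directLocations = hop(sampleIds, "sample_location");

// Correct pattern: Sample -> Event, then Event -> Location.
const eventIds = hop(sampleIds, "produced_by");
const locatedViaEvents = hop(eventIds, "sample_location");

console.log(directLocations.length, locatedViaEvents.length);
```

Run with Node, this prints `0 1`: the direct lookup finds nothing because the location edge starts at the event row, while the two-hop path reaches the location. Sample 2 has no event edge in this toy data, so it stays unlocated either way.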
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
 When visualizing archaeological data, always respect location sensitivity flags. Obfuscated coordinates are intentionally imprecise to protect archaeological sites from looting.
 :::

+## 🔍 Debugging Methodology: How We Found the Correct Paths
+
+### Step 1: Verify Relationship Existence
+```{ojs}
+// Debug: What relationships actually exist FROM MaterialSampleRecord?
+debugRelationships = {
+  const query = `
+    SELECT DISTINCT e.p as predicate, COUNT(*) as count
+    FROM nodes s
+    JOIN nodes e ON s.row_id = e.s
+    WHERE s.otype = 'MaterialSampleRecord'
+      AND e.otype = '_edge_'
+    GROUP BY e.p
+    ORDER BY count DESC
+  `;
+  const data = await loadData(query, [], "loading_debug_rels");
+- **Multi-hop traversal required** for meaningful queries
+- **No shortcuts exist** - respect the graph model
+
+### 🔧 Debugging Methodology
+
+1. **Verify relationships exist** before building complex queries
+2. **Trace step-by-step** from simple counts to complex joins
+3. **Test multiple paths** - graphs often have alternative routes
+4. **Validate results** against known entity counts
+
+### ⚡ Performance Guidelines
+
+1. **Filter by `otype` first** - reduces 11M rows to manageable subsets
+2. **Use CTEs** for complex multi-hop queries
+3. **Aggregate before filtering** when possible
+4. **Respect obfuscated coordinates** for site protection
+
+### 🏛️ Archaeological Context
+
+- **Major sites**: Çatalhöyük, Petra, Polis Chrysochous dominate sample counts
+- **Material types**: Biogenic non-organic materials most common
+- **Global reach**: Arctic to Antarctic coverage with sensitive location protection
+- **Research value**: 1M+ precisely located specimens for spatial analysis
+
+### 🚀 Advanced Applications
+
+This corrected understanding enables:
+- **Spatial clustering analysis** of archaeological finds
+- **Temporal pattern recognition** through sampling events
+- **Site similarity studies** via material type distributions
+- **Collection bias analysis** through agent and responsibility networks
+
+The key to success: **Understand the graph model first, query second.** This property graph structure reflects the real-world complexity of archaeological data collection and enables sophisticated analysis when queried correctly.

-This property graph structure enables:
+## Next Steps

-- **Flexible relationships** between archaeological entities
-- **Efficient queries** through DuckDB's columnar storage
-- **Complex traversals** to connect samples with locations, events, and metadata
-- **Scalable analysis** of 11.6M records with reasonable performance
+Ready to analyze this data? Remember:
+1. Start with entity relationship exploration
+2. Build queries incrementally
+3. Validate results at each step
+4. Respect archaeological site sensitivities

-The key to working with this data is understanding the graph structure and using appropriate JOIN patterns to traverse relationships between entities.