ReproNim/ABCD-visidata

VisiData Multi-Table Navigation

Navigate multi-table datasets and their metadata using VisiData. Originally developed for ABCD data, but adaptable to any dataset with a similar structure.

Quick Start

# Install VisiData
pip install visidata

# Open metadata file with config
vd --config abcd-visidatarc dd.parquet

# Navigate to a row and press:
#   zo  - Open data table
#   zO  - Open and focus on specific column
#   zm  - Merge selected columns into one table

Features

Custom Key Bindings

Key  Action         Description
zo   Open table     Opens data table from current metadata row
zO   Open + focus   Opens table and jumps to specific column
gzo  Batch open     Opens all selected rows' tables
zi   Show info      Display table info in status bar
zm   Merge columns  Merge selected columns into single table

Automatic Features

  • Auto key detection: Sets participant_id/subject_id and session_id as join keys
  • Format support: Works with both .tsv and .parquet files
  • Smart merging: Replicates static data (no session_id) across sessions when merging
  • ID mapping: Automatically maps between subject_id and participant_id
  • Column type detection: Automatically applies appropriate types based on column name patterns:
    • _age, _hrs, _height, _weight, _bmi, _pct, _percent, _score, _rate, _ratio → float
    • _year, _count, _num, grade → integer
    • _dt, _date → date (YYYY-MM-DD)
    • _dtt, _datetime → datetime (ISO 8601)
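
The auto key detection described above can be pictured with a small sketch. This is not the code from abcd-visidatarc; the helper name detect_join_keys and the preference for participant_id over subject_id when both are present are assumptions made for illustration.

```python
from typing import List

# Hypothetical constants; the real config may store these differently.
PARTICIPANT_KEYS = ("participant_id", "subject_id")
SESSION_KEY = "session_id"


def detect_join_keys(columns: List[str]) -> List[str]:
    """Return the identifier columns found in `columns`, participant key first."""
    # Take the first participant-style column that exists (assumed preference order).
    keys = [name for name in PARTICIPANT_KEYS if name in columns][:1]
    # session_id is only a key when the table actually has it.
    if SESSION_KEY in columns:
        keys.append(SESSION_KEY)
    return keys


print(detect_join_keys(["session_id", "demo_age", "participant_id"]))
print(detect_join_keys(["subject_id", "birth_year"]))
```

A table without session_id yields a single key, which is what marks it as static data for the merge step.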

Configuration

Customize via environment variables to work with any dataset structure:

# Data directory (default: 'data')
export ABCD_TABLES_DIR=my_data

# Metadata column names (defaults shown)
export ABCD_TABLE_COL=table_name      # Column containing table name
export ABCD_COLUMN_COL=name           # Column containing column/variable name
export ABCD_ID_COLS_COL=identifier_columns  # Column listing join keys

# Default identifier columns (default: 'participant_id|session_id')
# Quote the value: an unquoted | is interpreted by the shell as a pipe
export ABCD_DEFAULT_IDS='subject_id|visit_id'

Example: Custom dataset

export ABCD_TABLES_DIR=raw_data
export ABCD_TABLE_COL=source_table
export ABCD_COLUMN_COL=variable
vd --config abcd-visidatarc metadata.tsv
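
For readers who want to see how such settings are typically consumed, here is a minimal sketch of reading the variables above with their documented defaults. The function name load_config and the dictionary keys are hypothetical; only the environment variable names and default values come from this README.

```python
import os
from typing import Dict, Mapping


def load_config(env: Mapping[str, str] = os.environ) -> Dict[str, object]:
    """Read the ABCD_* settings, falling back to the documented defaults."""
    default_ids = env.get("ABCD_DEFAULT_IDS", "participant_id|session_id")
    return {
        "tables_dir": env.get("ABCD_TABLES_DIR", "data"),
        "table_col": env.get("ABCD_TABLE_COL", "table_name"),
        "column_col": env.get("ABCD_COLUMN_COL", "name"),
        "id_cols_col": env.get("ABCD_ID_COLS_COL", "identifier_columns"),
        # Identifier columns are pipe-separated; tolerate spaces around the pipe.
        "default_ids": [part.strip() for part in default_ids.split("|")],
    }


# Passing a plain dict instead of os.environ makes the behavior easy to inspect.
print(load_config({"ABCD_TABLES_DIR": "raw_data", "ABCD_TABLE_COL": "source_table"}))
```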

Common Workflows

1. Explore a specific table

vd --config abcd-visidatarc dd.parquet
/          # Search for table name
zo         # Open the table

2. Find and view a variable

vd --config abcd-visidatarc dd.parquet
g/         # Global search for variable
zO         # Open table at that column

3. Merge columns from multiple tables

vd --config abcd-visidatarc dd.parquet
s          # Select first row (column to merge)
s          # Select second row
s          # Select more rows...
zm         # Merge into single table

The merge automatically:

  • Joins on participant_id and session_id
  • Replicates static data (tables without session_id) across all sessions
  • Orders columns: participant_id first, session_id second, then data columns
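
The column ordering rule in the last bullet can be sketched as follows. The helper order_columns is hypothetical and assumes data columns keep their original relative order, which the README does not state.

```python
from typing import List

ID_ORDER = ["participant_id", "session_id"]


def order_columns(columns: List[str]) -> List[str]:
    """Put participant_id first, session_id second, then the remaining columns."""
    ids = [name for name in ID_ORDER if name in columns]
    data = [name for name in columns if name not in ID_ORDER]
    return ids + data


print(order_columns(["demo_age", "session_id", "birth_year", "participant_id"]))
```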

4. Join tables manually

vd --config abcd-visidatarc dd.parquet
s s s      # Select tables you want
gzo        # Open all selected tables
S          # View sheets list
s s        # Select data sheets to join
&          # Join (VisiData built-in)

Installation

Option 1: Use --config flag (Recommended)

vd --config abcd-visidatarc dd.parquet

Option 2: Make global (symlink to ~/.visidatarc)

ln -s "$(pwd)/abcd-visidatarc" ~/.visidatarc

Option 3: Create shell alias

# Add to ~/.bashrc or ~/.zshrc
alias vd-abcd='vd --config /path/to/abcd-visidatarc'

# Usage
vd-abcd dd.parquet

Metadata Structure

Your metadata file should have columns for:

Column           Purpose                 Example Value
Table name       Points to data file     demographics
Column name      Variable in table       demo_age
Identifier cols  Join keys (optional)    participant_id|session_id
Label            Description (optional)  Age at visit

Example metadata row:

table_name: demographics
name: demo_age
identifier_columns: participant_id|session_id
label: Age at visit in years

Corresponding data file: data/demographics.tsv or data/demographics.parquet
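
The mapping from a table name to its data file might look like the sketch below. The function find_table_file is hypothetical, and trying .parquet before .tsv is an assumption; the README only says both formats are supported.

```python
from pathlib import Path
from typing import Optional, Sequence


def find_table_file(
    table_name: str,
    tables_dir: str = "data",
    extensions: Sequence[str] = (".parquet", ".tsv"),  # assumed preference order
) -> Optional[Path]:
    """Return the first existing file for `table_name`, or None if none exists."""
    for ext in extensions:
        candidate = Path(tables_dir) / f"{table_name}{ext}"
        if candidate.is_file():
            return candidate
    return None


# Returns a Path such as data/demographics.tsv, or None if no file is found.
print(find_table_file("demographics"))
```

A None result corresponds to the "Table not opening?" case in Troubleshooting.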

Column Type Detection

Tables are automatically formatted based on column name patterns:

Pattern                                     Type      Example         VisiData Type
*_age, *_hrs, *_height, *_weight, *_bmi     Float     demo_age        FloatColumn
*_pct, *_percent, *_score, *_rate, *_ratio  Float     accuracy_pct    FloatColumn
*_year, *_count, *_num, grade               Integer   birth_year      IntColumn
*_dt, *_date                                Date      visit_dt        DateColumn (YYYY-MM-DD)
*_dtt, *_datetime                           DateTime  scan_dtt        DateColumn (ISO 8601)
Everything else                             String    participant_id  Default

This enables:

  • Proper numeric sorting and aggregation
  • Date arithmetic and formatting
  • Type-specific operations in VisiData

Example: Opening demographics.tsv automatically formats demo_age as float, allowing you to compute statistics (mean, median) with VisiData's aggregation features.
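
As an illustration of the pattern table, the following sketch maps a column name to a type label by suffix. It is not the implementation in abcd-visidatarc: the name infer_type and the string labels are hypothetical, and grade is matched as an exact column name because the table lists it without a wildcard.

```python
# Suffix groups taken from the pattern table; the type labels are illustrative.
SUFFIX_TYPES = [
    (("_age", "_hrs", "_height", "_weight", "_bmi",
      "_pct", "_percent", "_score", "_rate", "_ratio"), "float"),
    (("_year", "_count", "_num"), "int"),
    (("_dtt", "_datetime"), "datetime"),
    (("_dt", "_date"), "date"),
]


def infer_type(column_name: str) -> str:
    """Return a type label for `column_name`, defaulting to 'str'."""
    if column_name == "grade":  # assumed exact match
        return "int"
    for suffixes, type_name in SUFFIX_TYPES:
        # str.endswith accepts a tuple and checks each suffix in turn.
        if column_name.endswith(suffixes):
            return type_name
    return "str"


for name in ["demo_age", "birth_year", "visit_dt", "scan_dtt", "participant_id"]:
    print(name, infer_type(name))
```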

Session ID Replication

The merge function handles mixed time-varying and static data:

Example:

# demographics.tsv (has session_id)
participant_id  session_id  demo_age
NDAR001        baseline    10
NDAR001        year_1      11

# static_demo.tsv (no session_id)
participant_id  birth_year
NDAR001        2015

# After merge (zm):
participant_id  session_id  demo_age  birth_year
NDAR001        baseline    10        2015        # Replicated
NDAR001        year_1      11        2015        # Replicated
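
The replication shown above can be reproduced with plain dictionaries. This is a sketch under stated assumptions, not the merge code behind zm: the function merge_with_replication is hypothetical, and it keeps only participants that appear in the session-level table.

```python
from typing import Dict, List

Row = Dict[str, object]


def merge_with_replication(session_rows: List[Row], static_rows: List[Row]) -> List[Row]:
    """Copy each participant's static values onto every one of their session rows."""
    static_by_participant = {row["participant_id"]: row for row in static_rows}
    merged = []
    for row in session_rows:
        static = static_by_participant.get(row["participant_id"], {})
        # Drop the join key from the static side so it is not duplicated.
        extra = {key: value for key, value in static.items() if key != "participant_id"}
        merged.append({**row, **extra})
    return merged


demographics = [
    {"participant_id": "NDAR001", "session_id": "baseline", "demo_age": 10},
    {"participant_id": "NDAR001", "session_id": "year_1", "demo_age": 11},
]
static_demo = [{"participant_id": "NDAR001", "birth_year": 2015}]

for row in merge_with_replication(demographics, static_demo):
    print(row)
```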

Project Structure

.
├── abcd-visidatarc         # Main configuration file
├── dd.parquet              # Metadata file
├── data/                   # Data tables directory
│   ├── demographics.tsv
│   ├── cognitive.tsv
│   └── ...
└── tests/                  # Test suite
    ├── test_merge.py
    └── test_visidata_integration.py

Troubleshooting

Key binding not working?

  • Ensure you're using --config abcd-visidatarc or have symlinked to ~/.visidatarc
  • Check you're in the metadata sheet, not a data table

Table not opening?

  • Verify ABCD_TABLES_DIR points to correct directory
  • Check file exists: ls $ABCD_TABLES_DIR/{table_name}.*

Merge creates wrong structure?

  • Verify your metadata has correct identifier_columns values
  • Check the VisiData status bar on startup, which shows the active config

Wrong columns being used?

  • Set appropriate env vars: ABCD_TABLE_COL, ABCD_COLUMN_COL, etc.
  • Status bar shows current config when VisiData loads

Standard VisiData Keys

Useful keys for this workflow:

Key     Action
/ g/    Search in column / globally
n       Next search result
s u     Select / unselect row
!       Mark column as key
&       Join sheets
S       View all sheets
F       Frequency table
Ctrl+S  Save/export
q       Quit/close sheet

See QUICK_REFERENCE.md for complete key bindings reference.

Testing

Run the test suite:

python3 tests/test_merge.py
python3 tests/test_visidata_integration.py

Learn More

Requirements

  • VisiData >= 2.0 (pip install visidata)
  • Python >= 3.7
  • Optional: pyarrow for better Parquet support

License

This configuration is provided as-is for navigating multi-table datasets.
