|
10 | 10 | (ns ezmiller.relaunching-tablecloth-time |
11 | 11 | (:require [tablecloth.api :as tc] |
12 | 12 | [tablecloth.time.api :as tct] |
| 13 | + [tablecloth.time.column.api :as tctc] |
| 14 | + [tablecloth.time.parse :as tparse] |
13 | 15 | [scicloj.tableplot.v1.plotly :as plotly] |
14 | 16 | [tech.v3.datatype.functional :as dfn] |
15 | 17 | [scicloj.kindly.v4.kind :as kind])) |
|
26 | 28 | ;; had built this project around a dataset index mechanism that was |
27 | 29 | ;; built into tech.ml.dataset, but after that feature was removed in |
28 | 30 | ;; v7, the project required a rethink. This post walks through that |
29 | | -;; rethink and the projects core core primitives today using the |
| 31 | +;; rethink and the core primitives today, using the |
30 | 32 | ;; Victoria electricity demand dataset. |
31 | 33 |
|
32 | 34 | ;; ## Why No Index? |
|
63 | 65 | ;; on human codes.
64 | 66 |
|
65 | 67 | ;; Now let's dig into this library's primitives and basic functionality. |
| 68 | +;; Throughout these examples, `tc` refers to `tablecloth.api`, |
| 69 | +;; `tct` refers to `tablecloth.time.api`, and `tctc` refers to |
| 70 | +;; `tablecloth.time.column.api`. |
66 | 71 |
|
67 | 72 | ;; ## Loading the Data |
68 | 73 | ;; |
69 | 74 | ;; We'll use the `vic_elec` dataset: half-hourly electricity demand from Victoria, |
70 | | -;; Australia, spanning 2012-2014. Let's load it and take a look. Throughout these examples the tablecloth library is aliased as `tc`, following the common conventio, and tablecloth.time is aliased as `tct`. |
| 75 | +;; Australia, spanning 2012-2014. Strings are parsed to datetime types on load: |
71 | 76 |
|
72 | 77 | (def vic-elec |
73 | 78 | (-> (tc/dataset "https://gist.githubusercontent.com/ezmiller/6edf3e0f41848f532436c15bc94c2f4d/raw/vic_elec.csv" |
|
79 | 84 | ;; The dataset has half-hourly readings with `:Time`, `:Demand` (in MW), |
80 | 85 | ;; `:Temperature`, and other fields. |
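| | +
| | +;; A quick look at the column names and parsed datatypes, using plain
| | +;; tablecloth (`tc/info` with the `:columns` result type):
| | +(tc/info vic-elec :columns)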
81 | 86 |
|
82 | | -;; ## Extracting Time Components |
| 87 | +;; ## Time at the Column Level |
83 | 88 | ;; |
84 | | -;; The first primitive is `add-time-columns`. It extracts temporal fields from a |
85 | | -;; datetime column — day-of-week, month, hour, etc. — as new columns you can |
86 | | -;; group or filter on. Here's a quick look at what it produces: |
| 89 | +;; Before diving into the high-level API, it's worth understanding what's |
| 90 | +;; underneath. tablecloth.time mirrors tablecloth's two-level design: a |
| 91 | +;; dataset API and a column API. The column API is where the actual time |
| 92 | +;; manipulation happens, built on dtype-next's vectorized operations. |
| 93 | +;; |
| 94 | +;; Why does this matter? Because manipulating time data is notoriously fiddly. |
| 95 | +;; Java's `java.time` package is powerful but verbose. Working with columns |
| 96 | +;; of timestamps — converting, extracting, flooring — typically means writing |
| 97 | +;; loops or mapping functions over sequences. tablecloth.time's column API |
| 98 | +;; gives you operations that work on entire columns at once, using the same |
| 99 | +;; fast, primitive-backed machinery as the rest of tech.ml.dataset. |
| 100 | +;; |
| 101 | +;; The building blocks fall into three categories: |
| 102 | +;; |
| 103 | +;; **Parsing** — `tablecloth.time.parse/parse` handles ISO-8601 strings and |
| 104 | +;; custom formats with cached formatters for performance. For now this is |
| 105 | +;; scalar (single value), but bulk parsing happens automatically when loading |
| 106 | +;; datasets with `tc/convert-types`. |
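| | +
| | +;; For instance (hedged: this single-string arity is an assumption; `parse`
| | +;; may also accept a format pattern or options):
| | +(tparse/parse "2014-05-03T12:30:00")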
| 107 | +;; |
| 108 | +;; **Conversion** — `convert-time` moves between representations (Instants, |
| 109 | +;; LocalDateTimes, LocalDates, epoch milliseconds) with timezone awareness. |
| 110 | +;; This is the workhorse for preparing time columns for different operations. |
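| | +
| | +;; A purely illustrative call, assuming `convert-time` lives in the column
| | +;; API; the target keyword and option names here are guesses at the
| | +;; signature, shown only to convey the idea:
| | +(tctc/convert-time (:Time vic-elec) :instant {:zone "UTC"})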
| 111 | +;; |
| 112 | +;; **Flooring and extraction** — `down-to-nearest`, `floor-to-month`, and |
| 113 | +;; field extractors like `year`, `hour`, `day-of-week` operate on columns |
| 114 | +;; using dtype-next's vectorized arithmetic. These are **column in, column out**: |
| 115 | + |
| 116 | +;; Extract just the hour from the Time column: |
| 117 | +(tctc/hour (:Time vic-elec)) |
| 118 | + |
| 119 | +;; Floor timestamps to hour buckets: |
| 120 | +(tctc/down-to-nearest (:Time vic-elec) 1 :hours {:zone "UTC"}) |
| 121 | + |
| 122 | +;; The key thing to notice: no Clojure seqs, no explicit loops. These |
| 123 | +;; operations work on primitive arrays under the hood, just like dtype-next's |
| 124 | +;; numeric operations. The result is a column that can be added directly |
| 125 | +;; to a dataset. |
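| | +
| | +;; For example, attaching the extracted hour back to the dataset with plain
| | +;; tablecloth's `add-column`:
| | +(-> vic-elec
| | +    (tc/add-column :Hour #(tctc/hour (% :Time)))
| | +    (tc/head 5))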
| 126 | + |
| 127 | +;; ## Building Up: add-time-columns |
| 128 | +;; |
| 129 | +;; With these column-level tools in hand, the dataset-level API is just |
| 130 | +;; convenience. `add-time-columns` — the function that most users reach |
| 131 | +;; for first — is actually a thin wrapper around the extractors we just saw. |
| 132 | +;; |
| 133 | +;; Here's what it does internally: |
| 134 | +;; |
| 135 | +;; 1. Take the source time column from the dataset |
| 136 | +;; 2. Look up extractor functions from a map (`:year` → `tctc/year`, etc.) |
| 137 | +;; 3. Apply each extractor to produce new columns |
| 138 | +;; 4. Add those columns back to the dataset |
| 139 | +;; |
| 140 | +;; The "primitive" is just a composition of lower-level pieces. This matters
| 141 | +;; because it means you can drop down when the high-level API doesn't |
| 142 | +;; quite fit. Need a custom computed field? Build it from the column |
| 143 | +;; tools and add it yourself. |
| 144 | +;; |
| 145 | +;; Let's see it in action: |
87 | 146 |
|
88 | 147 | (-> vic-elec |
89 | 148 | (tct/add-time-columns :Time [:day-of-week :hour]) |
90 | 149 | (tc/head 10)) |
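| | +
| | +;; Under the hood that's roughly equivalent to composing the column-level
| | +;; extractors by hand (an approximation of the mechanism, not the library's
| | +;; actual source):
| | +(-> vic-elec
| | +    (tc/add-column :day-of-week #(tctc/day-of-week (% :Time)))
| | +    (tc/add-column :hour #(tctc/hour (% :Time)))
| | +    (tc/head 10))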
91 | 150 |
|
| 151 | +;; ## The Resampling Pattern |
| 152 | +;; |
| 153 | +;; With time fields extracted, standard tablecloth operations take |
| 154 | +;; over. Resampling, which in time series means aggregating to a coarser
| 155 | +;; time granularity, is just another pattern of composition: add time |
| 156 | +;; columns, group, aggregate, order. |
| 157 | +;; |
| 158 | +;; Let's break it into two steps. First, the data transformation: |
| 159 | + |
92 | 160 | (def demand-by-day |
93 | 161 | (-> vic-elec |
94 | 162 | (tct/add-time-columns :Time [:day-of-week]) |
|
99 | 167 | ;; Look at the aggregated data: |
100 | 168 | (tc/head demand-by-day 7) |
101 | 169 |
|
102 | | -;; Step 2: Visualize the result: |
| 170 | +;; Then visualize: |
103 | 171 | (plotly/layer-bar demand-by-day |
104 | 172 | {:=x :day-of-week :=y :Demand}) |
105 | 173 |
|
106 | 174 | ;; Weekends (days 6 and 7) clearly have lower demand. The `:day-of-week` field |
107 | 175 | ;; came from `add-time-columns`; the group-by, aggregate, and order-by are pure |
108 | | -;; tablecloth. The two libraries compose seamlessly. |
| 176 | +;; tablecloth. tablecloth.time provides the time-specific pieces, then gets |
| 177 | +;; out of the way. |
| 178 | +;; |
| 179 | +;; The same pattern scales to different granularities. Here are daily and |
| 180 | +;; monthly averages: |
| 181 | + |
| 182 | +;; Daily averages: |
| 183 | +(-> vic-elec |
| 184 | + (tct/add-time-columns :Time [:year :month :day]) |
| 185 | + (tc/group-by [:year :month :day]) |
| 186 | + (tc/aggregate {:Demand #(dfn/mean (:Demand %)) |
| 187 | + :Temperature #(dfn/mean (:Temperature %))}) |
| 188 | + (tc/order-by [:year :month :day]) |
| 189 | + (tc/head 10)) |
| 190 | + |
| 191 | +;; Monthly averages: |
| 192 | +(-> vic-elec |
| 193 | + (tct/add-time-columns :Time [:year :month]) |
| 194 | + (tc/group-by [:year :month]) |
| 195 | + (tc/aggregate {:Demand #(dfn/mean (:Demand %))}) |
| 196 | + (tc/order-by [:year :month]) |
| 197 | + (plotly/layer-bar {:=x :month :=y :Demand :=color :year})) |
| 198 | + |
| 199 | +;; Note that tablecloth.time is just a light layer here. You could do this |
| 200 | +;; with tablecloth alone by manually extracting datetime components. |
| 201 | +;; `add-time-columns` simply adds concision — it composes naturally with the
| 202 | +;; tablecloth operations you're already using. |
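| | +
| | +;; For instance, a month column with no tablecloth.time at all, assuming
| | +;; `:Time` was parsed as `java.time.LocalDateTime` (adjust the interop call
| | +;; if your column holds a different datetime type):
| | +(-> vic-elec
| | +    (tc/map-columns :month [:Time]
| | +                    (fn [^java.time.LocalDateTime t] (.getMonthValue t)))
| | +    (tc/head 5))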
109 | 203 |
|
110 | 204 | ;; ## Slicing Time Ranges |
111 | 205 | ;; |
112 | | -;; `slice` selects rows within a time range using binary search on sorted data. |
113 | | -;; It's fast even on large datasets. |
| 206 | +;; `slice` selects rows within a time range. This is where the old index
| 207 | +;; would have done the work; now it uses binary search on a sorted
| 208 | +;; column instead. It's fast even on large datasets: you get O(log n)
| 209 | +;; lookups without the overhead of maintaining a tree structure, though
| 210 | +;; the data may need to be sorted first if it isn't already.
114 | 212 |
|
115 | 213 | (-> vic-elec |
116 | 214 | (tct/slice :Time "2012-01-09" "2012-01-15") |
|
127 | 225 |
|
128 | 226 | ;; ## Lag and Lead Columns |
129 | 227 | ;; |
130 | | -;; `add-lag` shifts column values by a fixed number of rows — useful for |
| 228 | +;; `add-lag` shifts column values by a fixed number of rows — useful for |
131 | 229 | ;; autocorrelation analysis. Note this is row-based, not time-aware: you need |
132 | | -;; to know your data's frequency and calculate the offset. |
133 | | -;; |
134 | | -;; Since this dataset has half-hourly readings, a lag of 48 rows equals 24 hours: |
| 230 | +;; to know your data's frequency and calculate the offset. Since this dataset |
| 231 | +;; has half-hourly readings, a lag of 48 rows equals 24 hours: |
135 | 232 |
|
136 | 233 | (-> vic-elec |
137 | 234 | (tct/add-lag :Demand 48 :Demand_lag48) |
|
149 | 246 |
|
150 | 247 | ;; The tight diagonal shows strong positive correlation — demand at any given |
151 | 248 | ;; time is highly predictive of demand at the same time the previous day. |
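| | +
| | +;; To put a number on it, here is the lag-48 Pearson correlation computed
| | +;; by hand (a sketch; it assumes `add-lag` leaves the first 48 rows of the
| | +;; lag column missing, so we drop those rows before computing):
| | +(let [ds (-> vic-elec
| | +             (tct/add-lag :Demand 48 :Demand_lag48)
| | +             (tc/drop-missing [:Demand_lag48]))
| | +      x  (:Demand ds)
| | +      y  (:Demand_lag48 ds)
| | +      xc (dfn/- x (dfn/mean x))
| | +      yc (dfn/- y (dfn/mean y))]
| | +  ;; correlation = sum of cross-products / sqrt(product of sums of squares)
| | +  (/ (dfn/sum (dfn/* xc yc))
| | +     (Math/sqrt (* (dfn/sum (dfn/* xc xc))
| | +                   (dfn/sum (dfn/* yc yc))))))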
152 | | - |
153 | | -;; `add-lead` shifts values forward — current Demand aligns with Demand 24 hours |
154 | | -;; ahead. Let's see if today's demand predicts tomorrow's: |
| 249 | +;; |
| 250 | +;; `add-lead` works the same way but shifts values forward. Current demand
| 251 | +;; lines up with demand 24 hours ahead, which is useful when you need to
| 252 | +;; pair past observations with future outcomes for predictive modeling:
155 | 253 |
|
156 | 254 | (-> vic-elec |
157 | 255 | (tct/add-lead :Demand 48 :Demand_lead48) |
|
160 | 258 | :=y :Demand_lead48 |
161 | 259 | :=mark-opacity 0.3})) |
162 | 260 |
|
163 | | -;; ## Resampling as a Pattern |
164 | | -;; |
165 | | -;; We showed the resampling pattern above: extract time fields, group, aggregate, |
166 | | -;; order. The same pattern scales to different granularities. Here are daily and |
167 | | -;; monthly averages using the same building blocks: |
168 | | - |
169 | | -;; Daily averages: |
170 | | -(-> vic-elec |
171 | | - (tct/add-time-columns :Time [:year :month :day]) |
172 | | - (tc/group-by [:year :month :day]) |
173 | | - (tc/aggregate {:Demand #(dfn/mean (:Demand %)) |
174 | | - :Temperature #(dfn/mean (:Temperature %))}) |
175 | | - (tc/order-by [:year :month :day]) |
176 | | - (tc/head 10)) |
177 | | - |
178 | | -;; Monthly averages: |
179 | | -(-> vic-elec |
180 | | - (tct/add-time-columns :Time [:year :month]) |
181 | | - (tc/group-by [:year :month]) |
182 | | - (tc/aggregate {:Demand #(dfn/mean (:Demand %))}) |
183 | | - (tc/order-by [:year :month]) |
184 | | - (plotly/layer-bar {:=x :month :=y :Demand :=color :year})) |
185 | | - |
186 | | -;; Note that tablecloth.time is just a light layer in these |
187 | | -;; expressions. You could do this with tablecloth alone by manually |
188 | | -;; extracting datetime components. tablecloth.time's add-time-columns |
189 | | -;; just adds concision and expressiveness — it composes naturally with |
190 | | -;; the tablecloth operations. |
191 | | - |
192 | 261 | ;; ## Combining Primitives |
193 | 262 | ;; |
194 | 263 | ;; Let's do something more interesting: analyze the daily demand profile, |
|
207 | 276 | ;; Weekday demand shows the classic two-peak pattern (morning and evening), |
208 | 277 | ;; while weekend demand is flatter and lower overall. |
209 | 278 |
|
210 | | -;; ## Time Utilities (Column API) |
211 | | -;; |
212 | | -;; tablecloth.time mirrors tablecloth's structure: a dataset API (`tct`) |
213 | | -;; and a column API (`tablecloth.time.column.api`). The column API provides |
214 | | -;; lower-level utilities for working with time data directly — parsing, |
215 | | -;; conversion, flooring, extraction. These power the high-level functions |
216 | | -;; and are available when you need finer control. |
217 | | -;; |
218 | | -;; **Parsing** — `tablecloth.time.parse/parse` handles ISO-8601 strings and |
219 | | -;; custom formats with cached formatters for performance. |
220 | | -;; |
221 | | -;; **Conversion** — `convert-time` moves between representations (Instants, |
222 | | -;; LocalDateTimes, LocalDates, epoch milliseconds) with timezone awareness. |
223 | | -;; |
224 | | -;; **Flooring** — `down-to-nearest`, `floor-to-month`, `floor-to-quarter` bucket |
225 | | -;; timestamps to intervals. Useful for aggregating sub-daily data: |
226 | | - |
227 | | -(require '[tablecloth.time.column.api :as tctc]) |
228 | | - |
229 | | -(-> vic-elec |
230 | | - (tc/add-column :HourBucket |
231 | | - #(tctc/down-to-nearest (% :Time) 1 :hours {:zone "UTC"})) |
232 | | - (tc/head 5)) |
233 | | - |
234 | | -;; The column API parallels `tablecloth.column.api` — work with columns |
235 | | -;; directly, then add them to your dataset. The high-level dataset functions |
236 | | -;; are convenience wrappers built from these pieces. Manipulating time data |
237 | | -;; is notoriously fiddly; tablecloth.time tries to smooth the sharp edges |
238 | | -;; without hiding the underlying java.time power. |
239 | | - |
240 | 279 | ;; ## What's Next |
241 | 280 | ;; |
242 | | -;; tablecloth.time is experimental. Planned additions include rolling windows, |
243 | | -;; differencing, and higher-level patterns like `resample` that wrap the |
244 | | -;; composable building blocks. |
| 281 | +;; tablecloth.time is experimental. The current release provides |
| 282 | +;; focused primitives built on solid foundations: parsing, conversion, |
| 283 | +;; and field extraction at the column level; convenient dataset-level |
| 284 | +;; wrappers that compose with standard tablecloth operations. My hope is
| 285 | +;; that this gives a firm basis for building convenient abstractions that
| 286 | +;; are just patterns of composition.
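| | +
| | +;; As a small taste of that composition: a 24-hour difference column built
| | +;; from the existing primitives (a sketch, not a library function; it
| | +;; assumes `add-lag` marks the first 48 rows as missing):
| | +(-> vic-elec
| | +    (tct/add-lag :Demand 48 :Demand_lag48)
| | +    (tc/drop-missing [:Demand_lag48])
| | +    (tc/add-column :Demand-diff-24h #(dfn/- (% :Demand) (% :Demand_lag48)))
| | +    (tc/head 5))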
| 287 | +;; |
| 288 | +;; Planned additions include rolling windows, differencing, and higher-level |
| 289 | +;; patterns like `resample` that wrap the composable building blocks. |
245 | 290 | ;; |
246 | 291 | ;; The [repository is on GitHub](https://github.com/scicloj/tablecloth.time). |
247 | 292 | ;; For more worked examples, see the |
|