
Commit b083733

ezmiller and eca committed
Expand column-level API section and improve flow

Adds detailed explanation of tablecloth.time's two-level design, covering
parsing, conversion, and extraction primitives before the high-level
add-time-columns wrapper.

🤖 Generated with [eca](https://eca.dev)
Co-Authored-By: eca <noreply@eca.dev>

1 parent aef2a8e commit b083733

1 file changed: src/ezmiller/relaunching_tablecloth_time.clj
Lines changed: 124 additions & 79 deletions
@@ -10,6 +10,8 @@
 (ns ezmiller.relaunching-tablecloth-time
   (:require [tablecloth.api :as tc]
             [tablecloth.time.api :as tct]
+            [tablecloth.time.column.api :as tctc]
+            [tablecloth.time.parse :as tparse]
             [scicloj.tableplot.v1.plotly :as plotly]
             [tech.v3.datatype.functional :as dfn]
             [scicloj.kindly.v4.kind :as kind]))
@@ -26,7 +28,7 @@
 ;; had built this project around a dataset index mechanism that was
 ;; built into tech.ml.dataset, but after that feature was removed in
 ;; v7, the project required a rethink. This post walks through that
-;; rethink and the projects core core primitives today using the
+;; rethink and the core primitives today, using the
 ;; Victoria electricity demand dataset.

 ;; ## Why No Index?
@@ -63,11 +65,14 @@
 ;; on humanscodes.

 ;; Now let's dig into this library's primitives and basic functionality.
+;; Throughout these examples, `tc` refers to `tablecloth.api`,
+;; `tct` refers to `tablecloth.time.api`, and `tctc` refers to
+;; `tablecloth.time.column.api`.

 ;; ## Loading the Data
 ;;
 ;; We'll use the `vic_elec` dataset: half-hourly electricity demand from Victoria,
-;; Australia, spanning 2012-2014. Let's load it and take a look. Throughout these examples the tablecloth library is aliased as `tc`, following the common conventio, and tablecloth.time is aliased as `tct`.
+;; Australia, spanning 2012-2014. Strings are parsed to datetime types on load:

 (def vic-elec
   (-> (tc/dataset "https://gist.githubusercontent.com/ezmiller/6edf3e0f41848f532436c15bc94c2f4d/raw/vic_elec.csv"
@@ -79,16 +84,79 @@
 ;; The dataset has half-hourly readings with `:Time`, `:Demand` (in MW),
 ;; `:Temperature`, and other fields.

-;; ## Extracting Time Components
+;; ## Time at the Column Level
 ;;
-;; The first primitive is `add-time-columns`. It extracts temporal fields from a
-;; datetime column — day-of-week, month, hour, etc. — as new columns you can
-;; group or filter on. Here's a quick look at what it produces:
+;; Before diving into the high-level API, it's worth understanding what's
+;; underneath. tablecloth.time mirrors tablecloth's two-level design: a
+;; dataset API and a column API. The column API is where the actual time
+;; manipulation happens, built on dtype-next's vectorized operations.
+;;
+;; Why does this matter? Because manipulating time data is notoriously fiddly.
+;; Java's `java.time` package is powerful but verbose. Working with columns
+;; of timestamps — converting, extracting, flooring — typically means writing
+;; loops or mapping functions over sequences. tablecloth.time's column API
+;; gives you operations that work on entire columns at once, using the same
+;; fast, primitive-backed machinery as the rest of tech.ml.dataset.
+;;
+;; The building blocks fall into three categories:
+;;
+;; **Parsing** — `tablecloth.time.parse/parse` handles ISO-8601 strings and
+;; custom formats with cached formatters for performance. For now this is
+;; scalar (single value), but bulk parsing happens automatically when loading
+;; datasets with `tc/convert-types`.
+;;
+;; **Conversion** — `convert-time` moves between representations (Instants,
+;; LocalDateTimes, LocalDates, epoch milliseconds) with timezone awareness.
+;; This is the workhorse for preparing time columns for different operations.
+;;
+;; **Flooring and extraction** — `down-to-nearest`, `floor-to-month`, and
+;; field extractors like `year`, `hour`, `day-of-week` operate on columns
+;; using dtype-next's vectorized arithmetic. These are **column in, column out**:
+
+;; Extract just the hour from the Time column:
+(tctc/hour (:Time vic-elec))
+
+;; Floor timestamps to hour buckets:
+(tctc/down-to-nearest (:Time vic-elec) 1 :hours {:zone "UTC"})
+
+;; The key thing to notice: no Clojure seqs, no explicit loops. These
+;; operations work on primitive arrays under the hood, just like dtype-next's
+;; numeric operations. The result is a column that can be added directly
+;; to a dataset.
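The column-in, column-out idea can be sketched in plain Clojure over `java.time` values. This is purely illustrative: a seq-based stand-in for what the library does over primitive-backed dtype-next columns, and `hour-of` / `floor-to-hour` are hypothetical names, not part of the API.

```clojure
(import '(java.time LocalDateTime)
        '(java.time.temporal ChronoUnit))

;; Hypothetical, seq-based stand-ins for tctc/hour and tctc/down-to-nearest.
;; The real column API avoids seqs and works on primitive arrays.
(defn hour-of [ts-column]
  (mapv #(.getHour ^LocalDateTime %) ts-column))

(defn floor-to-hour [ts-column]
  (mapv #(.truncatedTo ^LocalDateTime % ChronoUnit/HOURS) ts-column))

(def ts [(LocalDateTime/of 2012 1 1 0 30)
         (LocalDateTime/of 2012 1 1 1 0)])

(hour-of ts)
;; => [0 1]
(floor-to-hour ts)   ; minutes truncated to the start of each hour
```

The shape is the point: a whole column goes in, a whole column of extracted or floored values comes out, ready to attach to a dataset.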
+
+;; ## Building Up: add-time-columns
+;;
+;; With these column-level tools in hand, the dataset-level API is just
+;; convenience. `add-time-columns` — the function that most users reach
+;; for first — is actually a thin wrapper around the extractors we just saw.
+;;
+;; Here's what it does internally:
+;;
+;; 1. Take the source time column from the dataset
+;; 2. Look up extractor functions from a map (`:year` → `tctc/year`, etc.)
+;; 3. Apply each extractor to produce new columns
+;; 4. Add those columns back to the dataset
+;;
+;; The "primitive" is just composition of lower-level pieces. This matters
+;; because it means you can drop down when the high-level API doesn't
+;; quite fit. Need a custom computed field? Build it from the column
+;; tools and add it yourself.
+;;
+;; Let's see it in action:

 (-> vic-elec
     (tct/add-time-columns :Time [:day-of-week :hour])
     (tc/head 10))
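The four internal steps described above can be sketched as a hypothetical composition in plain Clojure. This is not the library's actual source; `add-time-columns-sketch` and the dataset-as-map model are invented for illustration.

```clojure
(import '(java.time LocalDateTime))

;; Hypothetical extractor lookup table, standing in for the column API.
(def extractors
  {:hour        (fn [^LocalDateTime t] (.getHour t))
   :day-of-week (fn [^LocalDateTime t] (.getValue (.getDayOfWeek t)))})

(defn add-time-columns-sketch
  "Dataset modeled as a map of column-name -> vector of values."
  [ds time-col fields]
  (reduce (fn [d field]
            ;; look up the extractor, map it over the time column,
            ;; and attach the result as a new column
            (assoc d field (mapv (extractors field) (ds time-col))))
          ds
          fields))

(def mini {:Time [(LocalDateTime/of 2012 1 9 14 30)]})

(add-time-columns-sketch mini :Time [:hour :day-of-week])
;; => adds :hour [14] and :day-of-week [1] alongside :Time
```

The real function does the same lookup-apply-attach dance, only over typed columns rather than vectors.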
 
+;; ## The Resampling Pattern
+;;
+;; With time fields extracted, standard tablecloth operations take
+;; over. Resampling, which in time series means aggregating to coarser
+;; time granularity, is just another pattern of composition: add time
+;; columns, group, aggregate, order.
+;;
+;; Let's break it into two steps. First, the data transformation:
+
 (def demand-by-day
   (-> vic-elec
       (tct/add-time-columns :Time [:day-of-week])
@@ -99,18 +167,48 @@
 ;; Look at the aggregated data:
 (tc/head demand-by-day 7)

-;; Step 2: Visualize the result:
+;; Then visualize:
 (plotly/layer-bar demand-by-day
                   {:=x :day-of-week :=y :Demand})

 ;; Weekends (days 6 and 7) clearly have lower demand. The `:day-of-week` field
 ;; came from `add-time-columns`; the group-by, aggregate, and order-by are pure
-;; tablecloth. The two libraries compose seamlessly.
+;; tablecloth. tablecloth.time provides the time-specific pieces, then gets
+;; out of the way.
+;;
+;; The same pattern scales to different granularities. Here are daily and
+;; monthly averages:
+
+;; Daily averages:
+(-> vic-elec
+    (tct/add-time-columns :Time [:year :month :day])
+    (tc/group-by [:year :month :day])
+    (tc/aggregate {:Demand #(dfn/mean (:Demand %))
+                   :Temperature #(dfn/mean (:Temperature %))})
+    (tc/order-by [:year :month :day])
+    (tc/head 10))
+
+;; Monthly averages:
+(-> vic-elec
+    (tct/add-time-columns :Time [:year :month])
+    (tc/group-by [:year :month])
+    (tc/aggregate {:Demand #(dfn/mean (:Demand %))})
+    (tc/order-by [:year :month])
+    (plotly/layer-bar {:=x :month :=y :Demand :=color :year}))
+
+;; Note that tablecloth.time is just a light layer here. You could do this
+;; with tablecloth alone by manually extracting datetime components.
+;; `add-time-columns` just adds concision — it composes naturally with the
+;; tablecloth operations you're already using.
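The group, aggregate, order steps of the resampling pattern can be sketched in plain Clojure over rows modeled as maps. This is illustrative only; tablecloth performs the same steps column-wise with `tc/group-by` and `tc/aggregate`, and the row values here are made up.

```clojure
;; A plain-Clojure sketch of resampling: group rows by a time field,
;; average within each group, then order by that field.
(defn mean [xs] (/ (reduce + xs) (double (count xs))))

(def rows [{:day-of-week 1 :Demand 4000.0}
           {:day-of-week 1 :Demand 4200.0}
           {:day-of-week 6 :Demand 3500.0}])

(def demand-by-day-sketch
  (->> rows
       (group-by :day-of-week)
       (mapv (fn [[dow rs]]
               {:day-of-week dow
                :Demand (mean (map :Demand rs))}))
       (sort-by :day-of-week)
       vec))

demand-by-day-sketch
;; => [{:day-of-week 1, :Demand 4100.0} {:day-of-week 6, :Demand 3500.0}]
```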
 
 ;; ## Slicing Time Ranges
 ;;
-;; `slice` selects rows within a time range using binary search on sorted data.
-;; It's fast even on large datasets.
+;; `slice` selects rows within a time range using binary search on a
+;; sorted column. This is where we would previously have leaned on an
+;; index. It's fast even on large datasets — an O(log n) lookup without
+;; the overhead of maintaining a tree structure, though the data may
+;; need sorting first.

 (-> vic-elec
     (tct/slice :Time "2012-01-09" "2012-01-15")
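The binary-search idea can be sketched in a few lines of plain Clojure. Illustrative only, not the library's implementation; `lower-bound`, `upper-bound`, and the string-keyed vector are invented for the sketch. ISO-8601 date strings compare lexicographically, which is why plain string comparison works here.

```clojure
(defn lower-bound
  "Index of the first element >= x in sorted vector v."
  [v x]
  (let [i (java.util.Collections/binarySearch v x)]
    ;; binarySearch returns (-(insertion point) - 1) when x is absent
    (if (neg? i) (- (inc i)) i)))

(defn upper-bound
  "Index just past the last element <= x (no duplicates assumed)."
  [v x]
  (let [i (java.util.Collections/binarySearch v x)]
    (if (neg? i) (- (inc i)) (inc i))))

(def times ["2012-01-01" "2012-01-09" "2012-01-12" "2012-01-15" "2012-02-01"])

;; Select everything between two endpoints with two O(log n) probes:
(def in-range
  (subvec times
          (lower-bound times "2012-01-09")
          (upper-bound times "2012-01-15")))

in-range
;; => ["2012-01-09" "2012-01-12" "2012-01-15"]
```

Two probes and a `subvec` replace an entire index structure, which is the trade the post describes.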
@@ -127,11 +225,10 @@

 ;; ## Lag and Lead Columns
 ;;
-;; `add-lag` shifts column values by a fixed number of rows — useful for
+;; `add-lag` shifts column values by a fixed number of rows — useful for
 ;; autocorrelation analysis. Note this is row-based, not time-aware: you need
-;; to know your data's frequency and calculate the offset.
-;;
-;; Since this dataset has half-hourly readings, a lag of 48 rows equals 24 hours:
+;; to know your data's frequency and calculate the offset. Since this dataset
+;; has half-hourly readings, a lag of 48 rows equals 24 hours:

 (-> vic-elec
     (tct/add-lag :Demand 48 :Demand_lag48)
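Row-based shifting is simple enough to sketch directly. This is a plain-Clojure illustration of the idea, not the library's typed-column implementation; `lag-sketch` is an invented name.

```clojure
;; Values move down by n rows; the first n slots become missing (nil).
(defn lag-sketch [xs n]
  (vec (concat (repeat n nil) (drop-last n xs))))

;; With half-hourly data, n = 48 pairs each reading with the one from
;; 24 hours earlier. Shown with a toy vector and n = 2:
(lag-sketch [10 20 30 40] 2)
;; => [nil nil 10 20]
```

The nil padding is why the first day of a lagged series drops out of any correlation plot.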
@@ -149,9 +246,10 @@

 ;; The tight diagonal shows strong positive correlation — demand at any given
 ;; time is highly predictive of demand at the same time the previous day.
-
-;; `add-lead` shifts values forward — current Demand aligns with Demand 24 hours
-;; ahead. Let's see if today's demand predicts tomorrow's:
+;;
+;; `add-lead` works the same way but shifts values forward. Current demand
+;; aligns with demand 24 hours ahead — useful when you need to align past
+;; observations with future outcomes for predictive modeling:

 (-> vic-elec
     (tct/add-lead :Demand 48 :Demand_lead48)
@@ -160,35 +258,6 @@
          :=y :Demand_lead48
          :=mark-opacity 0.3}))

-;; ## Resampling as a Pattern
-;;
-;; We showed the resampling pattern above: extract time fields, group, aggregate,
-;; order. The same pattern scales to different granularities. Here are daily and
-;; monthly averages using the same building blocks:
-
-;; Daily averages:
-(-> vic-elec
-    (tct/add-time-columns :Time [:year :month :day])
-    (tc/group-by [:year :month :day])
-    (tc/aggregate {:Demand #(dfn/mean (:Demand %))
-                   :Temperature #(dfn/mean (:Temperature %))})
-    (tc/order-by [:year :month :day])
-    (tc/head 10))
-
-;; Monthly averages:
-(-> vic-elec
-    (tct/add-time-columns :Time [:year :month])
-    (tc/group-by [:year :month])
-    (tc/aggregate {:Demand #(dfn/mean (:Demand %))})
-    (tc/order-by [:year :month])
-    (plotly/layer-bar {:=x :month :=y :Demand :=color :year}))
-
-;; Note that tablecloth.time is just a light layer in these
-;; expressions. You could do this with tablecloth alone by manually
-;; extracting datetime components. tablecloth.time's add-time-columns
-;; just adds concision and expressiveness — it composes naturally with
-;; the tablecloth operations.
-
 ;; ## Combining Primitives
 ;;
 ;; Let's do something more interesting: analyze the daily demand profile,
@@ -207,41 +276,17 @@
 ;; Weekday demand shows the classic two-peak pattern (morning and evening),
 ;; while weekend demand is flatter and lower overall.

-;; ## Time Utilities (Column API)
-;;
-;; tablecloth.time mirrors tablecloth's structure: a dataset API (`tct`)
-;; and a column API (`tablecloth.time.column.api`). The column API provides
-;; lower-level utilities for working with time data directly — parsing,
-;; conversion, flooring, extraction. These power the high-level functions
-;; and are available when you need finer control.
-;;
-;; **Parsing** — `tablecloth.time.parse/parse` handles ISO-8601 strings and
-;; custom formats with cached formatters for performance.
-;;
-;; **Conversion** — `convert-time` moves between representations (Instants,
-;; LocalDateTimes, LocalDates, epoch milliseconds) with timezone awareness.
-;;
-;; **Flooring** — `down-to-nearest`, `floor-to-month`, `floor-to-quarter` bucket
-;; timestamps to intervals. Useful for aggregating sub-daily data:
-
-(require '[tablecloth.time.column.api :as tctc])
-
-(-> vic-elec
-    (tc/add-column :HourBucket
-                   #(tctc/down-to-nearest (% :Time) 1 :hours {:zone "UTC"}))
-    (tc/head 5))
-
-;; The column API parallels `tablecloth.column.api` — work with columns
-;; directly, then add them to your dataset. The high-level dataset functions
-;; are convenience wrappers built from these pieces. Manipulating time data
-;; is notoriously fiddly; tablecloth.time tries to smooth the sharp edges
-;; without hiding the underlying java.time power.
-
 ;; ## What's Next
 ;;
-;; tablecloth.time is experimental. Planned additions include rolling windows,
-;; differencing, and higher-level patterns like `resample` that wrap the
-;; composable building blocks.
+;; tablecloth.time is experimental. The current release provides
+;; focused primitives built on solid foundations: parsing, conversion,
+;; and field extraction at the column level; convenient dataset-level
+;; wrappers that compose with standard tablecloth operations. My hope
+;; is that this provides a solid basis for building convenient
+;; abstractions that are just patterns of composition.
+;;
+;; Planned additions include rolling windows, differencing, and higher-level
+;; patterns like `resample` that wrap the composable building blocks.
 ;;
 ;; The [repository is on GitHub](https://github.com/scicloj/tablecloth.time).
 ;; For more worked examples, see the
