
Commit 4ab1e33

Merge pull request #343 from ezmiller/tablecloth-time-post
Relaunching tablecloth.time post
2 parents 6be5453 + 655bf2d commit 4ab1e33

3 files changed

Lines changed: 297 additions & 1 deletion


deps.edn

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@
  :deps
  {org.clojure/clojure {:mvn/version "1.12.3"}
   org.scicloj/noj {:mvn/version "2-beta18"}
+  org.scicloj/tablecloth.time {:mvn/version "1.00-alpha-6"}
   ;; Tableplot can be removed after updating Noj:
   org.scicloj/tableplot {:mvn/version "1-beta14"}
   markdown-clj/markdown-clj {:mvn/version "1.12.4"}

site/db.edn

Lines changed: 6 additions & 1 deletion
@@ -206,7 +206,12 @@
   :name "Pierre Baille"
   :image "https://avatars.githubusercontent.com/u/4739483"
   :url "https://github.com/pbaille"
-  :links []}]
+  :links []}
+ {:id :ezmiller
+  :name "Ethan Miller"
+  :image "https://avatars.githubusercontent.com/u/772738?v=4&size=64"
+  :url "https://github.com/ezmiller"
+  :links ["https://humanscodes.com"]}]

 :affiliation
 [{:id :clojure.core
Lines changed: 290 additions & 0 deletions
@@ -0,0 +1,290 @@
^{:kindly/hide-code true
  :clay {:title "Relaunching tablecloth.time: Composability over Abstraction"
         :quarto {:author [:ezmiller]
                  :description "A composable approach to time series analysis in Clojure"
                  :draft false
                  :type :post
                  :date "2026-03-27"
                  :category :clojure
                  :tags [:time-series :tablecloth :data-science]}}}
(ns ezmiller.relaunching-tablecloth-time
  (:require [tablecloth.api :as tc]
            [tablecloth.time.api :as tct]
            [tablecloth.time.column.api :as tctc]
            [tablecloth.time.parse :as tparse]
            [scicloj.tableplot.v1.plotly :as plotly]
            [tech.v3.datatype.functional :as dfn]
            [scicloj.kindly.v4.kind :as kind]))

^{:kindly/hide-code true}
(kind/html "<figure>
<img src=\"vic_elec_yearly.png\" alt=\"Victoria electricity demand by day of year, colored by year\" style=\"width:100%\"/>
<figcaption>Half-hourly electricity demand in Victoria, Australia (2012–2014). Each line is one day, phased over the time of day (0 = midnight, 1 = the following midnight). Colors indicate year.</figcaption>
</figure>")

;; I recently relaunched the experimental time series processing
;; library [tablecloth.time](https://github.com/scicloj/tablecloth.time)
;; — this time without an index. Turns out that's a feature, not a
;; limitation. Here's why, and a walkthrough of the composable
;; primitives that replace it, using the Victoria electricity demand
;; dataset.

;; ## Why No Index?
;;
;; The original tablecloth.time was built around an index for two
;; reasons: performance (tree-based indexes offer O(log n) lookups)
;; and convenience (you don't have to keep specifying which column
;; is the time column). Anyone who has used the Python Pandas data
;; processing library is likely familiar with this feature.
;;
;; But when tech.ml.dataset removed its indexing mechanism in v7, it forced a
;; rethink. And the rethink revealed that neither rationale held up.
;;
;; **On performance:** Unlike Python DataFrames, Clojure's datasets are immutable.
;; They're rebuilt on each transformation. Under these conditions, maintaining a
;; tree-based index is pure overhead — you'd rebuild it constantly. As Chris
;; Nuernberger (author of tech.ml.dataset)
;; [put it](https://clojurians.zulipchat.com/#narrow/channel/236259-tech.2Eml.2Edataset.2Edev/topic/index.20structures.20in.20Columns.20-.20scope/near/481581872):
;; "Just sorting the dataset and using binary search will outperform most/all
;; tree structures in this scenario."
;; (This is the same conclusion that [Polars](https://pola.rs/), the fastest-growing
;; Pandas alternative, reached — no index by design.)
;;
;; **On convenience:** The index adds implicit state threaded through your data.
;; Tablecloth's API avoids this — you always say which columns you're operating on.
;; The pipeline reads like what it does. This aligns with Clojure's broader preference
;; for explicit, composable operations over hidden magic.
;;
;; For the full discussion of this design shift, see
;; [Composability Over Abstraction](https://humanscodes.com/tablecloth-time-relaunch)
;; on humanscodes.
;;
;; Throughout these examples, `tc` refers to `tablecloth.api`,
;; `tct` refers to `tablecloth.time.api`, and `tctc` refers to
;; `tablecloth.time.column.api`.

;; ## Loading the Data
;;
;; We'll use the `vic_elec` dataset: half-hourly electricity demand from Victoria,
;; Australia, spanning 2012-2014. Strings are parsed to datetime types on load:

(def vic-elec
  (-> (tc/dataset "https://gist.githubusercontent.com/ezmiller/6edf3e0f41848f532436c15bc94c2f4d/raw/vic_elec.csv"
                  {:key-fn keyword})
      (tc/convert-types :Time :local-date-time)))

(tc/head vic-elec)

;; The dataset has half-hourly readings with `:Time`, `:Demand` (in MW),
;; `:Temperature`, and other fields.

;; ## Time at the Column Level
;;
;; Before diving into the high-level API, it's worth understanding what's
;; underneath. tablecloth.time mirrors tablecloth's two-level design: a
;; dataset API and a column API. The column API is where the actual time
;; manipulation happens, built on dtype-next's vectorized operations.
;;
;; Why does this matter? Because manipulating time data is notoriously fiddly.
;; Clojure has excellent time libraries —
;; [tick](https://github.com/juxt/tick),
;; [cljc.java-time](https://github.com/henryw374/cljc.java-time) —
;; that tame java.time's verbosity. But they operate on scalars. Working with
;; columns of timestamps still means mapping functions over sequences.
;; tablecloth.time's column API gives you operations that work on entire
;; columns at once, using the same fast, primitive-backed machinery as the
;; rest of tech.ml.dataset.
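;;
;; For contrast, the scalar route would be mapping a function over the
;; column yourself, e.g. with plain java.time interop. This is a sketch,
;; assuming iterating the `:Time` column yields `java.time.LocalDateTime`
;; values:

;; Map a scalar accessor over the first few timestamps by hand:
(map #(.getHour ^java.time.LocalDateTime %) (take 3 (:Time vic-elec)))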
;;
;; The building blocks fall into three categories:
;;
;; **Parsing** — `tablecloth.time.parse/parse` handles ISO-8601 strings and
;; custom formats with cached formatters for performance. For now this is
;; scalar (single value), but bulk parsing happens automatically when loading
;; datasets with `tc/convert-types`.
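;;
;; For example, parsing a single ISO-8601 timestamp. A minimal sketch,
;; assuming the single-argument arity; the custom-format arity is not
;; shown here:

;; Parse one ISO-8601 string into a datetime value:
(tparse/parse "2014-12-31T23:30:00")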
;;
;; **Conversion** — `convert-time` moves between representations (Instants,
;; LocalDateTimes, LocalDates, epoch milliseconds) with timezone awareness.
;; This is the workhorse for preparing time columns for different operations.
;;
;; **Flooring and extraction** — `down-to-nearest`, `floor-to-month`, and
;; field extractors like `year`, `hour`, `day-of-week` operate on columns
;; using dtype-next's vectorized arithmetic. These are **column in, column out**:

;; Extract just the hour from the Time column:
(tctc/hour (:Time vic-elec))

;; Floor timestamps to hour buckets:
(tctc/down-to-nearest (:Time vic-elec) 1 :hours {:zone "UTC"})

;; The key thing to notice: these operations work on primitive arrays
;; under the hood, just like dtype-next's numeric operations. The
;; result is a column that can be added directly to a dataset.

;; ## Building Up: add-time-columns
;;
;; With these column-level tools in hand, the dataset-level API is
;; just convenience. A core example is `add-time-columns`, a thin
;; wrapper around the extractors we just saw.
;;
;; Here's what it does internally:
;;
;; 1. Take the source time column from the dataset
;; 2. Look up extractor functions from a map (`:year` → `tctc/year`, etc.)
;; 3. Apply each extractor to produce new columns
;; 4. Add those columns back to the dataset
;;
;; The "primitive" is simply a composition of lower-level pieces. This matters
;; because it means you can drop down when the high-level API doesn't
;; quite fit. Need a custom computed field? Build it from the column
;; tools and add it yourself.
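;;
;; For instance, here is a hand-rolled version of the same idea, built
;; from the column extractors and plain tablecloth. This is a sketch of
;; the composition, not the library's actual implementation:

;; Apply the column-level extractors ourselves and attach the results:
(-> vic-elec
    (tc/add-column :year (tctc/year (:Time vic-elec)))
    (tc/add-column :hour (tctc/hour (:Time vic-elec)))
    (tc/head 5))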
;;
;; Let's see it in action:

(-> vic-elec
    (tct/add-time-columns :Time [:day-of-week :hour])
    (tc/head 10))

;; ## The Resampling Pattern
;;
;; With time fields extracted, standard tablecloth operations take
;; over. Resampling, which in time series means aggregating to coarser
;; time granularity, is just another pattern of composition: add time
;; columns, group, aggregate, order.
;;
;; Let's break it into two steps. First, the data transformation:

(def demand-by-day
  (-> vic-elec
      (tct/add-time-columns :Time [:day-of-week])
      (tc/group-by [:day-of-week])
      (tc/aggregate {:Demand #(dfn/mean (:Demand %))})
      (tc/order-by [:day-of-week])))

;; Look at the aggregated data:
(tc/head demand-by-day 7)

;; Then visualize:
(plotly/layer-bar demand-by-day
                  {:=x :day-of-week :=y :Demand})

;; Weekends (days 6 and 7) clearly have lower demand. The `:day-of-week` field
;; came from `add-time-columns`; the group-by, aggregate, and order-by are pure
;; tablecloth. tablecloth.time provides the time-specific pieces, then gets
;; out of the way.
;;
;; The same pattern scales to different granularities. Here are daily and
;; monthly averages:

;; Daily averages (first 10 days):
(-> vic-elec
    (tct/add-time-columns :Time [:year :month :day])
    (tc/group-by [:year :month :day])
    (tc/aggregate {:Demand #(dfn/mean (:Demand %))
                   :Temperature #(dfn/mean (:Temperature %))})
    (tc/order-by [:year :month :day])
    (tc/head 10))

;; Monthly averages — each bar is a month, colored by year:
(-> vic-elec
    (tct/add-time-columns :Time [:year :month])
    (tc/group-by [:year :month])
    (tc/aggregate {:Demand #(dfn/mean (:Demand %))})
    (tc/order-by [:year :month])
    (plotly/layer-bar {:=x :month :=y :Demand :=color :year :=color-type :nominal}))

;; Note that tablecloth.time is just a light layer here. You could do this
;; with tablecloth alone by manually extracting datetime components.
;; `add-time-columns` just adds concision — it composes naturally with the
;; tablecloth operations you're already using.

;; ## Slicing Time Ranges
;;
;; `slice` selects rows within a time range. This is where we would
;; previously have leaned on an index; now it uses binary search on a
;; sorted column instead. That gives the O(log n) lookup without the
;; overhead of maintaining a tree structure, and it stays fast even on
;; large datasets, though the data may first need to be sorted if it
;; isn't already.

(-> vic-elec
    (tct/slice :Time "2012-01-09" "2012-01-15")
    (tc/row-count))

;; One week of data — 336 half-hourly observations. Let's visualize it:

(-> vic-elec
    (tct/slice :Time "2012-01-09" "2012-01-15")
    (plotly/layer-line {:=x :Time :=y :Demand}))

;; The daily oscillation is clearly visible: demand peaks during the day and
;; drops at night.

;; ## Lag and Lead Columns
;;
;; `add-lag` shifts column values by a fixed number of rows — useful for
;; autocorrelation analysis. Note this is row-based, not time-aware: you need
;; to know your data's frequency and calculate the offset. Since this dataset
;; has half-hourly readings, a lag of 48 rows equals 24 hours:

(-> vic-elec
    (tct/add-lag :Demand 48 :Demand_lag48)
    (tc/drop-missing)
    (tc/head 10))

;; Let's see if demand correlates with the same time yesterday:

(-> vic-elec
    (tct/add-lag :Demand 48 :Demand_lag48)
    (tc/drop-missing)
    (plotly/layer-point {:=x :Demand_lag48
                         :=y :Demand
                         :=mark-opacity 0.3}))

;; The tight diagonal shows strong positive correlation — demand at any given
;; time is highly predictive of demand at the same time the previous day.
;;
;; `add-lead` works the same way but shifts values forward, so current demand
;; aligns with demand 24 hours ahead — useful when you need to align past
;; observations with future outcomes for predictive modeling:

(-> vic-elec
    (tct/add-lead :Demand 48 :Demand_lead48)
    (tc/drop-missing)
    (plotly/layer-point {:=x :Demand
                         :=y :Demand_lead48
                         :=mark-opacity 0.3}))

;; ## Combining Primitives
;;
;; Let's do something more interesting: analyze the daily demand profile,
;; comparing weekdays to weekends.

(-> vic-elec
    (tct/add-time-columns :Time [:day-of-week :hour])
    (tc/map-columns :weekend? [:day-of-week] #(>= % 6))
    (tc/group-by [:weekend? :hour])
    (tc/aggregate {:Demand #(dfn/mean (:Demand %))})
    (tc/order-by [:hour])
    (plotly/layer-line {:=x :hour
                        :=y :Demand
                        :=color :weekend?}))

;; Weekday demand shows the classic two-peak pattern (morning and evening),
;; while weekend demand is flatter and lower overall.

;; ## What's Next
;;
;; tablecloth.time is experimental. The current release provides
;; focused primitives built on solid foundations: parsing, conversion,
;; and field extraction at the column level; convenient dataset-level
;; wrappers that compose with standard tablecloth operations. My hope
;; is that this provides a basis for building convenient abstractions
;; that are just patterns of composition.
;;
;; Planned additions include rolling windows, differencing, and higher-level
;; patterns like `resample` that wrap the composable building blocks.
;;
;; The [repository is on GitHub](https://github.com/scicloj/tablecloth.time).
;; For more worked examples, see the
;; [fpp3 Chapter 2 notebook](https://kingkongbot.github.io/tablecloth.time/chapter_02_time_series_graphics.html).
