^{:kindly/hide-code true
  :clay {:title "Relaunching tablecloth.time: Composability over Abstraction"
         :quarto {:author [:ezmiller]
                  :description "A composable approach to time series analysis in Clojure"
                  :draft false
                  :type :post
                  :date "2026-03-27"
                  :category :clojure
                  :tags [:time-series :tablecloth :data-science]}}}
(ns ezmiller.relaunching-tablecloth-time
  (:require [tablecloth.api :as tc]
            [tablecloth.time.api :as tct]
            [tablecloth.time.column.api :as tctc]
            [tablecloth.time.parse :as tparse]
            [scicloj.tableplot.v1.plotly :as plotly]
            [tech.v3.datatype.functional :as dfn]
            [scicloj.kindly.v4.kind :as kind]))

^{:kindly/hide-code true}
(kind/html "<figure>
  <img src=\"vic_elec_yearly.png\" alt=\"Victoria electricity demand by day of year, colored by year\" style=\"width:100%\"/>
  <figcaption>Half-hourly electricity demand in Victoria, Australia (2012–2014). Each line is one day, phased over the time of day (0 = midnight, 1 = the following midnight). Colors indicate year.</figcaption>
</figure>")

;; I recently relaunched the experimental time series processing
;; library [tablecloth.time](https://github.com/scicloj/tablecloth.time)
;; — this time without an index. Turns out that's a feature, not a
;; limitation. Here's why, and a walkthrough of the composable
;; primitives that replace it, using the Victoria electricity demand
;; dataset.

;; ## Why No Index?
;;
;; The original tablecloth.time was built around an index for two
;; reasons: performance (tree-based indexes offer O(log n) lookups)
;; and convenience (you don't have to keep specifying which column is
;; the time column). Anyone who has used Python's Pandas library will
;; recognize this feature.
;;
;; But when tech.ml.dataset removed its indexing mechanism in v7, it
;; forced a rethink, and that rethink revealed that neither rationale
;; held up.
;;
;; **On performance:** Unlike Python DataFrames, Clojure's datasets are
;; immutable: they're rebuilt on each transformation. Under these
;; conditions, maintaining a tree-based index is pure overhead — you'd
;; rebuild it constantly. As Chris Nuernberger (author of tech.ml.dataset)
;; [put it](https://clojurians.zulipchat.com/#narrow/channel/236259-tech.2Eml.2Edataset.2Edev/topic/index.20structures.20in.20Columns.20-.20scope/near/481581872):
;; "Just sorting the dataset and using binary search will outperform most/all
;; tree structures in this scenario."
;; (This is the same conclusion that [Polars](https://pola.rs/), the fastest-growing
;; Pandas alternative, reached — no index by design.)
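;;
;; To make the sorted-column idea concrete, here is a minimal sketch in
;; plain Clojure (not the library's internals): binary search over a
;; sorted vector of distinct values gives the same O(log n) range lookup
;; an index would, with nothing extra to maintain.

(defn lower-bound
  "Index of x in the sorted vector v, or its insertion point if absent."
  [v x]
  (let [i (java.util.Collections/binarySearch v x)]
    (if (neg? i) (- (inc i)) i)))

(defn upper-bound
  "Index just past x in the sorted vector v (assumes distinct values)."
  [v x]
  (let [i (java.util.Collections/binarySearch v x)]
    (if (neg? i) (- (inc i)) (inc i))))

(defn slice-sorted
  "Values of the sorted vector v falling in the inclusive range [from to]."
  [v from to]
  (subvec v (lower-bound v from) (upper-bound v to)))

(slice-sorted [10 20 30 40 50 60] 20 50)
;; => [20 30 40 50]
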
;;
;; **On convenience:** The index adds implicit state threaded through your data.
;; Tablecloth's API avoids this — you always say which columns you're operating on.
;; The pipeline reads like what it does. This aligns with Clojure's broader preference
;; for explicit, composable operations over hidden magic.
;;
;; For the full discussion of this design shift, see
;; [Composability Over Abstraction](https://humanscodes.com/tablecloth-time-relaunch)
;; on humanscodes.
;;
;; Throughout these examples, `tc` refers to `tablecloth.api`,
;; `tct` refers to `tablecloth.time.api`, and `tctc` refers to
;; `tablecloth.time.column.api`.

;; ## Loading the Data
;;
;; We'll use the `vic_elec` dataset: half-hourly electricity demand from Victoria,
;; Australia, spanning 2012–2014. Strings are parsed to datetime types on load:

(def vic-elec
  (-> (tc/dataset "https://gist.githubusercontent.com/ezmiller/6edf3e0f41848f532436c15bc94c2f4d/raw/vic_elec.csv"
                  {:key-fn keyword})
      (tc/convert-types :Time :local-date-time)))

(tc/head vic-elec)

;; The dataset has half-hourly readings with `:Time`, `:Demand` (in MW),
;; `:Temperature`, and other fields.

;; ## Time at the Column Level
;;
;; Before diving into the high-level API, it's worth understanding what's
;; underneath. tablecloth.time mirrors tablecloth's two-level design: a
;; dataset API and a column API. The column API is where the actual time
;; manipulation happens, built on dtype-next's vectorized operations.
;;
;; Why does this matter? Because manipulating time data is notoriously fiddly.
;; Clojure has excellent time libraries —
;; [tick](https://github.com/juxt/tick),
;; [cljc.java-time](https://github.com/henryw374/cljc.java-time) —
;; that tame java.time's verbosity. But they operate on scalars. Working with
;; columns of timestamps still means mapping functions over sequences.
;; tablecloth.time's column API gives you operations that work on entire
;; columns at once, using the same fast, primitive-backed machinery as the
;; rest of tech.ml.dataset.
;;
;; The building blocks fall into three categories:
;;
;; **Parsing** — `tablecloth.time.parse/parse` handles ISO-8601 strings and
;; custom formats with cached formatters for performance. For now this is
;; scalar (single value), but bulk parsing happens automatically when loading
;; datasets with `tc/convert-types`.
;;
;; **Conversion** — `convert-time` moves between representations (Instants,
;; LocalDateTimes, LocalDates, epoch milliseconds) with timezone awareness.
;; This is the workhorse for preparing time columns for different operations.
;;
;; **Flooring and extraction** — `down-to-nearest`, `floor-to-month`, and
;; field extractors like `year`, `hour`, and `day-of-week` operate on columns
;; using dtype-next's vectorized arithmetic. These are **column in, column out**:

;; Extract just the hour from the Time column:
(tctc/hour (:Time vic-elec))

;; Floor timestamps to hour buckets:
(tctc/down-to-nearest (:Time vic-elec) 1 :hours {:zone "UTC"})

;; The key thing to notice: these operations work on primitive arrays
;; under the hood, just like dtype-next's numeric operations. The
;; result is a column that can be added directly to a dataset.
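;;
;; Under the hood, flooring reduces to integer arithmetic on epoch values
;; (ignoring timezone handling, which the library layers on top). A rough
;; pure-Clojure sketch of the idea, not the library's actual implementation:

(defn floor-to-bucket
  "Floor an epoch-milliseconds value to the start of its n-ms bucket."
  [bucket-ms epoch-ms]
  (* bucket-ms (quot epoch-ms bucket-ms)))

;; Floor a half-hour-offset timestamp to the hour (3,600,000 ms buckets):
(floor-to-bucket (* 60 60 1000) 1326072600000)
;; => 1326070800000
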

;; ## Building Up: add-time-columns
;;
;; With these column-level tools in hand, the dataset-level API is
;; mostly convenience. A core example is `add-time-columns`, a thin
;; wrapper around the extractors we just saw.
;;
;; Here's what it does internally:
;;
;; 1. Take the source time column from the dataset
;; 2. Look up extractor functions from a map (`:year` → `tctc/year`, etc.)
;; 3. Apply each extractor to produce new columns
;; 4. Add those columns back to the dataset
;;
;; The "primitive" is just composition of lower-level pieces. This matters
;; because it means you can drop down when the high-level API doesn't
;; quite fit. Need a custom computed field? Build it from the column
;; tools and add it yourself.
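;;
;; A hypothetical sketch of that wrapper pattern (the extractor map and
;; function below are illustrative, not the library's actual source):

(def extractors
  ;; field keyword -> column-level extractor
  {:year tctc/year
   :hour tctc/hour
   :day-of-week tctc/day-of-week})

(defn add-time-columns-sketch
  "Add one extracted column per field keyword, derived from time-col."
  [ds time-col fields]
  (reduce (fn [d field]
            (tc/add-column d field ((extractors field) (time-col ds))))
          ds
          fields))
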
;;
;; Let's see it in action:

(-> vic-elec
    (tct/add-time-columns :Time [:day-of-week :hour])
    (tc/head 10))

;; ## The Resampling Pattern
;;
;; With time fields extracted, standard tablecloth operations take
;; over. Resampling, which in time series means aggregating to coarser
;; time granularity, is just another pattern of composition: add time
;; columns, group, aggregate, order.
;;
;; Let's break it into two steps. First, the data transformation:

(def demand-by-day
  (-> vic-elec
      (tct/add-time-columns :Time [:day-of-week])
      (tc/group-by [:day-of-week])
      (tc/aggregate {:Demand #(dfn/mean (:Demand %))})
      (tc/order-by [:day-of-week])))

;; Look at the aggregated data:
(tc/head demand-by-day 7)

;; Then visualize:
(plotly/layer-bar demand-by-day
                  {:=x :day-of-week :=y :Demand})

;; Weekends (days 6 and 7) clearly have lower demand. The `:day-of-week` field
;; came from `add-time-columns`; the group-by, aggregate, and order-by are pure
;; tablecloth. tablecloth.time provides the time-specific pieces, then gets
;; out of the way.
;;
;; The same pattern scales to different granularities. Here are daily and
;; monthly averages:

;; Daily averages (first 10 days):
(-> vic-elec
    (tct/add-time-columns :Time [:year :month :day])
    (tc/group-by [:year :month :day])
    (tc/aggregate {:Demand #(dfn/mean (:Demand %))
                   :Temperature #(dfn/mean (:Temperature %))})
    (tc/order-by [:year :month :day])
    (tc/head 10))

;; Monthly averages — each bar is a month, colored by year:
(-> vic-elec
    (tct/add-time-columns :Time [:year :month])
    (tc/group-by [:year :month])
    (tc/aggregate {:Demand #(dfn/mean (:Demand %))})
    (tc/order-by [:year :month])
    (plotly/layer-bar {:=x :month :=y :Demand :=color :year :=color-type :nominal}))

;; Note that tablecloth.time is just a light layer here. You could do this
;; with tablecloth alone by manually extracting datetime components.
;; `add-time-columns` adds concision, and it composes naturally with the
;; tablecloth operations you're already using.

;; ## Slicing Time Ranges
;;
;; `slice` selects rows within a time range. This is where we would
;; previously have leaned on an index; instead, it uses binary search
;; on a sorted column. That keeps the O(log n) lookup without the
;; overhead of maintaining a tree structure, so it's fast even on
;; large datasets, though the data may first need sorting if it isn't
;; already ordered.

(-> vic-elec
    (tct/slice :Time "2012-01-09" "2012-01-15")
    (tc/row-count))

;; One week of data — 336 half-hourly observations. Let's visualize it:

(-> vic-elec
    (tct/slice :Time "2012-01-09" "2012-01-15")
    (plotly/layer-line {:=x :Time :=y :Demand}))

;; The daily oscillation is clearly visible: demand peaks during the day and
;; drops at night.

;; ## Lag and Lead Columns
;;
;; `add-lag` shifts column values by a fixed number of rows — useful for
;; autocorrelation analysis. Note this is row-based, not time-aware: you need
;; to know your data's frequency and calculate the offset. Since this dataset
;; has half-hourly readings, a lag of 48 rows equals 24 hours:

(-> vic-elec
    (tct/add-lag :Demand 48 :Demand_lag48)
    (tc/drop-missing)
    (tc/head 10))
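;;
;; Conceptually, a row-based lag is just a shift with missing-value
;; padding. A minimal pure-Clojure sketch of the idea (the library
;; operates on columns rather than sequences):

(defn lag-seq
  "Shift xs down by n positions, padding the first n with nil."
  [n xs]
  (take (count xs) (concat (repeat n nil) xs)))

(lag-seq 2 [10 20 30 40 50])
;; => (nil nil 10 20 30)
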

;; Let's see if demand correlates with the same time yesterday:

(-> vic-elec
    (tct/add-lag :Demand 48 :Demand_lag48)
    (tc/drop-missing)
    (plotly/layer-point {:=x :Demand_lag48
                         :=y :Demand
                         :=mark-opacity 0.3}))

;; The tight diagonal shows strong positive correlation — demand at any given
;; time is highly predictive of demand at the same time the previous day.
;;
;; `add-lead` works the same way but shifts values forward: with a lead of
;; 48 rows, current demand aligns with demand 24 hours ahead. This is useful
;; when you need to align past observations with future outcomes for
;; predictive modeling:

(-> vic-elec
    (tct/add-lead :Demand 48 :Demand_lead48)
    (tc/drop-missing)
    (plotly/layer-point {:=x :Demand
                         :=y :Demand_lead48
                         :=mark-opacity 0.3}))

;; ## Combining Primitives
;;
;; Let's do something more interesting: analyze the daily demand profile,
;; comparing weekdays to weekends.

(-> vic-elec
    (tct/add-time-columns :Time [:day-of-week :hour])
    (tc/map-columns :weekend? [:day-of-week] #(>= % 6))
    (tc/group-by [:weekend? :hour])
    (tc/aggregate {:Demand #(dfn/mean (:Demand %))})
    (tc/order-by [:hour])
    (plotly/layer-line {:=x :hour
                        :=y :Demand
                        :=color :weekend?}))

;; Weekday demand shows the classic two-peak pattern (morning and evening),
;; while weekend demand is flatter and lower overall.

;; ## What's Next
;;
;; tablecloth.time is experimental. The current release provides
;; focused primitives built on solid foundations: parsing, conversion,
;; and field extraction at the column level; convenient dataset-level
;; wrappers that compose with standard tablecloth operations. My hope
;; is that this gives a sturdy basis for building convenient
;; abstractions that are just patterns of composition.
;;
;; Planned additions include rolling windows, differencing, and higher-level
;; patterns like `resample` that wrap the composable building blocks.
;;
;; The [repository is on GitHub](https://github.com/scicloj/tablecloth.time).
;; For more worked examples, see the
;; [fpp3 Chapter 2 notebook](https://kingkongbot.github.io/tablecloth.time/chapter_02_time_series_graphics.html).