1 | 1 | ^{:kindly/hide-code true |
2 | | - :clay {:title "Relaunch tablecloth.time: Composability over Abstraction" |
| 2 | + :clay {:title "Relaunching tablecloth.time: Composability over Abstraction" |
3 | 3 | :quarto {:author [:ezmiller] |
4 | 4 | :description "A composable approach to time series analysis in Clojure" |
5 | 5 | :draft false |
|
22 | 22 | <figcaption>Half-hourly electricity demand in Victoria, Australia (2012–2014). Each line is one day, phased over the time of day (0 = midnight, 1 = the following midnight). Colors indicate year.</figcaption> |
23 | 23 | </figure>") |
24 | 24 |
|
25 | | -;; I recently relaunched an old Scicloj project called [tablecloth.time](https://github.com/scicloj/tablecloth.time). The goal of this project was to build a composable |
26 | | -;; extension for time series analysis built on top of |
27 | | -;; [tablecloth](https://scicloj.github.io/tablecloth/). Originally, we |
28 | | -;; had built this project around a dataset index mechanism that was |
29 | | -;; built into tech.ml.dataset, but after that feature was removed in |
30 | | -;; v7, the project required a rethink. This post walks through that |
31 | | -;; rethink and the core primitives today, using the |
32 | | -;; Victoria electricity demand dataset. |
| 25 | +;; I recently relaunched the experimental time series processing |
| 26 | +;; library [tablecloth.time](https://github.com/scicloj/tablecloth.time), |
| 27 | +;; this time without an index. That turns out to be a feature, not a |
| 28 | +;; limitation. Below I explain why, then walk through the composable |
| 29 | +;; primitives that replace the index, using the Victoria electricity |
| 30 | +;; demand dataset. |
33 | 31 |
|
34 | 32 | ;; ## Why No Index? |
35 | 33 | ;; |
|
58 | 56 | ;; The pipeline reads like what it does. This aligns with Clojure's broader preference |
59 | 57 | ;; for explicit, composable operations over hidden magic. |
60 | 58 | ;; |
61 | | -;; The simplicity isn't a compromise. It's the feature. |
62 | | -;; |
63 | 59 | ;; For the full discussion of this design shift, see |
64 | 60 | ;; [Composability Over Abstraction](https://humanscodes.com/tablecloth-time-relaunch) |
65 | 61 | ;; on humanscodes. |
66 | | - |
67 | | -;; Now let's dig into this library's primitives and basic functionality. |
| 62 | +;; |
68 | 63 | ;; Throughout these examples, `tc` refers to `tablecloth.api`, |
69 | 64 | ;; `tct` refers to `tablecloth.time.api`, and `tctc` refers to |
70 | 65 | ;; `tablecloth.time.column.api`. |
|
92 | 87 | ;; manipulation happens, built on dtype-next's vectorized operations. |
93 | 88 | ;; |
94 | 89 | ;; Why does this matter? Because manipulating time data is notoriously fiddly. |
95 | | -;; Java's `java.time` package is powerful but verbose. Working with columns |
96 | | -;; of timestamps — converting, extracting, flooring — typically means writing |
97 | | -;; loops or mapping functions over sequences. tablecloth.time's column API |
98 | | -;; gives you operations that work on entire columns at once, using the same |
99 | | -;; fast, primitive-backed machinery as the rest of tech.ml.dataset. |
| 90 | +;; Clojure has excellent time libraries — |
| 91 | +;; [tick](https://github.com/juxt/tick), |
| 92 | +;; [cljc.java-time](https://github.com/henryw374/cljc.java-time) — |
| 93 | +;; that tame java.time's verbosity. But they operate on scalars. Working with |
| 94 | +;; columns of timestamps still means mapping functions over sequences. |
| 95 | +;; tablecloth.time's column API gives you operations that work on entire |
| 96 | +;; columns at once, using the same fast, primitive-backed machinery as the |
| 97 | +;; rest of tech.ml.dataset. |
100 | 98 | ;; |
101 | 99 | ;; The building blocks fall into three categories: |
102 | 100 | ;; |
|
119 | 117 | ;; Floor timestamps to hour buckets: |
120 | 118 | (tctc/down-to-nearest (:Time vic-elec) 1 :hours {:zone "UTC"}) |
121 | 119 |
|
122 | | -;; The key thing to notice: no Clojure seqs, no explicit loops. These |
123 | | -;; operations work on primitive arrays under the hood, just like dtype-next's |
124 | | -;; numeric operations. The result is a column that can be added directly |
125 | | -;; to a dataset. |
| 120 | +;; The key thing to notice: these operations work on primitive arrays |
| 121 | +;; under the hood, just like dtype-next's numeric operations. The |
| 122 | +;; result is a column that can be added directly to a dataset. |
126 | 123 |
|
127 | 124 | ;; ## Building Up: add-time-columns |
128 | 125 | ;; |
129 | | -;; With these column-level tools in hand, the dataset-level API is just |
130 | | -;; convenience. `add-time-columns` — the function that most users reach |
131 | | -;; for first — is actually a thin wrapper around the extractors we just saw. |
| 126 | +;; With these column-level tools in hand, the dataset-level API is |
| 127 | +;; just convenience. A core example is `add-time-columns`, a thin |
| 128 | +;; wrapper around the extractors we saw above. |
132 | 129 | ;; |
133 | 130 | ;; Here's what it does internally: |
134 | 131 | ;; |
|
179 | 176 | ;; The same pattern scales to different granularities. Here are daily and |
180 | 177 | ;; monthly averages: |
181 | 178 |
|
182 | | -;; Daily averages: |
| 179 | +;; Daily averages (first 10 days): |
183 | 180 | (-> vic-elec |
184 | 181 | (tct/add-time-columns :Time [:year :month :day]) |
185 | 182 | (tc/group-by [:year :month :day]) |
|
188 | 185 | (tc/order-by [:year :month :day]) |
189 | 186 | (tc/head 10)) |
190 | 187 |
|
191 | | -;; Monthly averages: |
| 188 | +;; Monthly averages — each bar is a month, colored by year: |
192 | 189 | (-> vic-elec |
193 | 190 | (tct/add-time-columns :Time [:year :month]) |
194 | 191 | (tc/group-by [:year :month]) |
195 | 192 | (tc/aggregate {:Demand #(dfn/mean (:Demand %))}) |
196 | 193 | (tc/order-by [:year :month]) |
197 | | - (plotly/layer-bar {:=x :month :=y :Demand :=color :year})) |
| 194 | + (plotly/layer-bar {:=x :month :=y :Demand :=color :year :=color-type :nominal})) |
198 | 195 |
|
199 | 196 | ;; Note that tablecloth.time is just a light layer here. You could do this |
200 | 197 | ;; with tablecloth alone by manually extracting datetime components. |
|