src/data_analysis/book_sales_analysis/about_apriori.clj
Lines changed: 17 additions & 14 deletions
```diff
@@ -26,6 +26,11 @@
 
 ;; When you run an indie publishing house with over 160 titles and sell thousands of books each month, one question keeps coming back: **Which books do our customers buy together?** This seemingly simple question led me down a fascinating path from basic correlation analysis to building a more robust recommendation system using association rule mining—all with Clojure and the SciCloj ecosystem.
 
+^:kindly/hide-code
+(kind/image
+ {:src "books.png"
+  :width "100%"
+  :max-width "800px"})
 
 ;; ## The Starting Point: Understanding Our Data
 
```
```diff
@@ -43,7 +48,7 @@
 
 ;; *(for clarity, many columns were omitted here; rows were generated with `(tc/random 5)` from anonymized dataset)*
 
-;; Each row represented one order, with books listed as comma-separated values. There are many exceptions, incosistencies and format-based differences ^[^1]. To analyze purchasing patterns, I needed to transform this into a format where each book became a binary feature: did a customer buy it (1) or not (0)? This transformation is called **one-hot encoding**.
+;; Each row represented one order, with books listed as comma-separated values. There are many exceptions, inconsistencies, and format-based differences ^[^1]. To analyze purchasing patterns, I needed to transform this into a format where each book became a binary feature: did a customer buy it (1) or not (0)? This transformation is called **one-hot encoding**.
 ;;
 ;; ---
 ;;
```
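The one-hot step described in the paragraph above can be sketched in plain Clojure. This is a minimal illustration under stated assumptions, not the notebook's tablecloth code: each order is assumed to be a map with hypothetical `:customer` and `:books` keys, `:books` is a comma-separated string, and the format exceptions mentioned in the paragraph are ignored.

```clojure
(require '[clojure.string :as str])

;; Hypothetical input shape: {:customer "..." :books "Title A, Title B"}.
;; This is NOT the notebook's real column layout (e.g. :zakaznik) or its tablecloth pipeline.

(defn parse-books
  "Split a comma-separated book list into trimmed, non-empty titles."
  [s]
  (->> (str/split (or s "") #",")
       (map str/trim)
       (remove str/blank?)))

(defn one-hot
  "Return one map per order: the customer plus a 0/1 flag for every
  title seen anywhere in the orders (titles are used as string keys)."
  [orders]
  (let [parsed     (map #(assoc % :titles (set (parse-books (:books %)))) orders)
        all-titles (sort (into #{} (mapcat :titles) parsed))]
    (for [{:keys [customer titles]} parsed]
      (into {:customer customer}
            (for [t all-titles]
              [t (if (contains? titles t) 1 0)])))))

(one-hot [{:customer "c1" :books "Book A, Book B"}
          {:customer "c2" :books "Book B"}])
;; => ({:customer "c1", "Book A" 1, "Book B" 1}
;;     {:customer "c2", "Book A" 0, "Book B" 1})
```

Turning each order into a set before flagging means a title repeated within one order still yields a single 1, which matches the bought-it-or-not encoding.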
```diff
@@ -79,7 +84,7 @@
       (tc/reorder-columns [:zakaznik])
       (tc/random 5)
       (tc/head 4)
-      (tc/select-columns #"^:zakaznik|^:book-00.*"))) ;; just quick reduction of table width
+      (tc/select-columns #"^:zakaznik|^:book-00.*")))
 
 ^:kindly/hide-code
 (kind/table onehot-sample)
```
```diff
@@ -149,18 +154,17 @@
                      :zeroline true}
             :yaxis {:title "Skewness <br>(more positive → less generic relation)"
                     :zeroline true}
-            :hovermode "closest"
-            :showlegend true
-            :height 650
-            :width 750
+            :hovermode "closest" :showlegend true
+            :height 650 :width 750
             :margin {:l 65 :r 50 :b 65 :t 90}
             :legend {:x 1 :y 1 :xanchor "right"}}}))
 
+^:kindly/hide-code
 scatter-plot
 
 ;; Foreign bestsellers (marked in orange) showed consistently higher correlations with other books. They had broad appeal and were purchased alongside many different titles. Czech authors (in blue), however, showed lower correlations, suggesting their readers were more focused. Many customers would buy just one Czech title, often using it as a "gateway" into our catalog (strongly supported by Czech author's local campaigns), while foreign bestsellers were part of larger, more diverse purchases.
 
-;; This insight immediately changed our marketing approach. We stopped using generic cross-sell recommendations for Czech authors and instead focused on building author-specific communities and started to cooperate with easily reachable Czech authors on cross-selling approach.
+;; This insight immediately changed our marketing approach. We stopped using generic cross-sell recommendations for Czech authors and instead focused on building author-specific communities and started to cooperate with easily reachable local Czech authors on cross-selling approach.
 
 ;; ## The Limitation: Correlations Weren't Enough
 
```
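The hunk refers to correlations between books without showing how they were computed. One common choice, offered here only as an assumption, is the Pearson correlation between two one-hot columns, which for 0/1 data equals the phi coefficient. The `pearson` function below is a hypothetical helper, not part of the notebook.

```clojure
(defn pearson
  "Pearson correlation of two equal-length numeric sequences.
  For 0/1 purchase flags this equals the phi coefficient.
  Returns 0.0 when either column is constant (correlation undefined)."
  [xs ys]
  (let [n   (count xs)
        mx  (/ (double (reduce + xs)) n)   ; mean of xs
        my  (/ (double (reduce + ys)) n)   ; mean of ys
        dx  (map #(- % mx) xs)             ; deviations from the mean
        dy  (map #(- % my) ys)
        cov (reduce + (map * dx dy))
        sx  (Math/sqrt (reduce + (map #(* % %) dx)))
        sy  (Math/sqrt (reduce + (map #(* % %) dy)))]
    (if (or (zero? sx) (zero? sy))
      0.0
      (/ cov (* sx sy)))))

;; Each vector is one book's column: 1 if that order contained the book.
(pearson [1 0 1 0] [1 0 1 0]) ;; => 1.0
(pearson [1 0 1 0] [0 1 0 1]) ;; => -1.0
```

A correlation is symmetric, so on its own it cannot express that buyers of one title tend to add another but not the reverse.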
```diff
@@ -296,13 +300,13 @@ scatter-plot
                         :min-confidence 0.1)
      8))
 
-;; These recommendations are now powering a new "Customers Also Bought" section on our website (even still in "manual" mode :), complementing our existing "Topically Similar" recommendations with data-driven insights.
+;; These recommendations are now powering a new "Customers Also Bought" section on our website (still in "manual" mode for now), complementing our existing "Topically Similar" recommendations with data-driven insights.
 
 ;; ## Why This Matters for the Clojure Community
 
 ;; This project demonstrates several strengths of Clojure and the SciCloj ecosystem for real-world data science:
 
-;; **1. Readable transformations:** The threading macro (`->`) made complex data pipelines read like narratives. Each step tells a story, making the code understandable to both technical and even business stakeholders.
+;; **1. Readable transformations:** Clojure's threading macro (`->`) made complex data pipelines read like narratives. Each step tells a story, making the code understandable to both technical and even business stakeholders.
 
 ^:kindly/hide-code
 (kind/code
```
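The `:min-confidence 0.1` option in this hunk is a standard association-rule threshold. The sketch below only illustrates what support and confidence measure; it is not the notebook's implementation or any library's API, and `support`, `confidence`, and `pair-rules` are hypothetical names. It enumerates single-title rules by brute force instead of running Apriori, which additionally prunes candidate itemsets whose subsets are infrequent, and it is written as a `->>` pipeline in the spirit of the readable-transformations point.

```clojure
;; Hypothetical shape: one set of titles per order.

(defn support
  "Share of baskets that contain every title in `itemset`."
  [baskets itemset]
  (/ (count (filter #(every? % itemset) baskets))   ; a set works as a predicate
     (double (count baskets))))

(defn confidence
  "Confidence of the rule a -> b: support(a ∪ b) / support(a)."
  [baskets a b]
  (let [sa (support baskets a)]
    (if (zero? sa)
      0.0
      (/ (support baskets (into a b)) sa))))

(defn pair-rules
  "Every single-title rule {a} -> {b} meeting both thresholds,
  sorted by descending confidence. Brute force, no Apriori pruning."
  [baskets min-support min-confidence]
  (let [titles (sort (into #{} cat baskets))]
    (->> (for [a titles
               b titles
               :when (not= a b)]
           {:antecedent a
            :consequent b
            :support    (support baskets #{a b})
            :confidence (confidence baskets #{a} #{b})})
         (filter #(>= (:support %) min-support))
         (filter #(>= (:confidence %) min-confidence))
         (sort-by :confidence >))))

;; Toy baskets for illustration only.
(def baskets [#{"A" "B"} #{"A" "B"} #{"A" "C"} #{"B"}])

(pair-rules baskets 0.3 0.5)
;; two rules, "A" -> "B" and "B" -> "A", each with confidence of about 0.67
```

Read with the notebook's threshold of 0.1, confidence keeps a rule A -> B when at least one in ten orders containing A also contains B.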
```diff
@@ -320,26 +324,25 @@ scatter-plot
 
 ;; ## The Impact
 
-;; This project is still under construction and tangible business results yet to be seen. But:
+;; This project is still under construction and tangible business results have yet to be seen. But:
 
 ;; - We are already stopping less effective cross-selling campaigns and starting target author communities
 ;; - Our website now features more data-driven "Customers Also Bought" recommendations
 ;; - We use these insights to optimize B2B offers for corporate clients
-;; - Our social media campaigns are starting to be better targeted based on purchase pattern clusters
+;; - Our social media campaigns are being better targeted based on purchase pattern clusters
 
 ;; Most importantly, I learned that you don't need a data science team or expensive tools to extract value from your data. With curiosity, the right tools, and a supportive community (shout out to the SciCloj folks on Zulip!), even a beginner can turn raw data into actionable insights.
 
 ;; ---
 
 ;; ## About the Author
 
-;; **Tomáš Baránek** is a publisher at [Jan Melvil Publishing](https://www.melvil.cz) and co-founder of [Servantes](https://www.servant.es), developing software for publishers worldwide. He's a computer science graduate, Clojure enthusiast exploring data science, learning by doing on real publishing challenges.
+;; **Tomáš Baránek** is a publisher at [Jan Melvil Publishing](https://www.melvil.cz) and co-founder of [Servantes](https://www.servant.es), developing software for publishers worldwide. He's a computer science graduate, Clojure enthusiast exploring data science, learning by doing on real publishing challenges. You can find him on [Bluesky](https://bsky.app/profile/tombarys.bsky.social) or read his [blog](https://lifehacky.net).
 
 ;; **Resources:**
 ;; - Author: https://barys.me
-;; - Full presentation code: [github.com/tombarys/??](https://github.com/tombarys/?)
```