src/data_analysis/book_sales_analysis/about_apriori.clj
Lines changed: 18 additions & 16 deletions
@@ -44,12 +44,14 @@
44
44
45
45
;; The transformation from raw orders to an analysis-ready format was crucial. Using Tablecloth, the transformation pipeline was surprisingly readable:
46
46
47
-
^:kindly/hide-code ;; FIXME this is nonsense
47
+
;; ❗ FIXME this is too simplified, I have to change this ❗
48
+
49
+
^:kindly/hide-code
48
50
(kind/code
49
51
";; From customer orders with book lists...
50
52
(-> orders
51
53
(tc/group-by :zakaznik) ;; Group by customer
52
-
(tc/aggregate ;; Aggregate their purchases
54
+
(tc/aggregate ;; Aggregate their purchases
53
55
{:books #(distinct-books %)})
54
56
;; ...to binary matrix where each column is a book")
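;; The step from per-customer book lists to a binary matrix can be sketched in plain Clojure. This is a minimal illustration under stated assumptions, not the pipeline used in the analysis: the sample rows, the `:book` key, and the helper names `baskets` and `one-hot` are hypothetical; only the `:zakaznik` (customer) key comes from the snippet above.

```clojure
;; Hypothetical sample data: one row per purchased book.
;; :zakaznik (customer) is taken from the post; :book is an assumed column name.
(def orders
  [{:zakaznik "c1" :book "Book A"}
   {:zakaznik "c1" :book "Book B"}
   {:zakaznik "c2" :book "Book A"}
   {:zakaznik "c2" :book "Book A"}   ; repeated purchase, counted once
   {:zakaznik "c3" :book "Book C"}])

(defn baskets
  "Groups rows into a map of customer -> set of distinct books."
  [rows]
  (reduce (fn [acc {:keys [zakaznik book]}]
            (update acc zakaznik (fnil conj #{}) book))
          {}
          rows))

(defn one-hot
  "Returns one map per customer with a 0/1 entry for every book in the data."
  [customer->books]
  (let [all-books (sort (into #{} (mapcat val) customer->books))]
    (for [[customer books] (sort-by key customer->books)]
      (into {:zakaznik customer}
            (map (fn [b] [b (if (contains? books b) 1 0)]))
            all-books))))

(one-hot (baskets orders))
```

;; Each resulting row holds a 1 for every book the customer bought at least once and a 0 otherwise, which is the shape that correlation and association-rule steps typically expect.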
55
57
@@ -85,12 +87,9 @@
85
87
:xaxis {:tickangle 45}
86
88
:margin {:l 200 :b 50}
87
89
:width 800 :height 600
88
-
:shapes [{:type "rect"
89
-
:x0 -0.5 :y0 -0.5
90
-
:x1 80 :y1 80
90
+
:shapes [{:type "rect" :x0 -0.5 :y0 -0.5 :x1 80 :y1 80
91
91
:line {:color "yellow" :width 3}}]
92
-
:annotations [{:x 40 :y 80
93
-
:text "Recently published books show <br>much stronger co-purchase patterns"
92
+
:annotations [{:x 40 :y 80 :text "Recently published books show <br>much stronger co-purchase patterns"
;; Foreign bestsellers (marked in orange) showed consistently higher correlations with other books. They had broad appeal and were purchased alongside many different titles. Czech authors (in blue), however, showed lower correlations, suggesting their readers were more focused. Many customers would buy just one Czech title, often using it as a "gateway" into our catalog, while foreign bestsellers were part of larger, more diverse purchases.
147
+
;; Foreign bestsellers (marked in orange) showed consistently higher correlations with other books. They had broad appeal and were purchased alongside many different titles. Czech authors (in blue), however, showed lower correlations, suggesting their readers were more focused. Many customers would buy just one Czech title, often using it as a "gateway" into our catalog (strongly supported by Czech authors' local campaigns), while foreign bestsellers were part of larger, more diverse purchases.
149
148
150
-
;; This insight immediately changed our marketing approach. We stopped using generic cross-sell recommendations for Czech authors and instead focused on building author-specific communities through social media campaigns.
149
+
;; This insight immediately changed our marketing approach. We stopped using generic cross-sell recommendations for Czech authors and instead focused on building author-specific communities, and we started cooperating with easily reachable Czech authors on cross-selling.
151
150
152
151
;; ## The Limitation: Correlations Weren't Enough
153
152
@@ -245,6 +244,8 @@ scatter-plot
245
244
246
245
;; This visualization reveals clusters of books that customers buy together, forming natural "reading paths" through our catalog. The thickness of edges represents lift (stronger associations), while node darkness indicates support (popularity).
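;; For readers new to these measures, here is a minimal sketch of how support, confidence, and lift are commonly defined for association rules. The sample baskets and the function names are hypothetical and are not taken from the analysis code; the real values come from the apriori run described in this post.

```clojure
;; Hypothetical baskets: each set holds the books bought by one customer.
(def sample-baskets
  [#{"A" "B"} #{"A" "B" "C"} #{"A"} #{"C"}])

(defn support
  "Fraction of baskets that contain every item in `items`."
  [baskets items]
  (/ (count (filter (fn [basket] (every? basket items)) baskets))
     (count baskets)))

(defn confidence
  "Share of baskets containing `antecedent` that also contain `consequent`.
  Assumes the antecedent occurs in at least one basket."
  [baskets antecedent consequent]
  (/ (support baskets (into antecedent consequent))
     (support baskets antecedent)))

(defn lift
  "Confidence divided by the consequent's support; values above 1 mean the
  items appear together more often than independence would predict.
  Assumes the consequent occurs in at least one basket."
  [baskets antecedent consequent]
  (/ (confidence baskets antecedent consequent)
     (support baskets consequent)))

(lift sample-baskets #{"A"} #{"B"})
```

;; In this toy data, "A" appears in 3 of 4 baskets and "A" with "B" in 2 of 4, so the rule A => B has confidence 2/3 and lift 4/3.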
247
246
247
+
;; (Remember that this data comes from only part of the dataset, with particular parameters and thresholds.)
248
+
248
249
;; ## From Analysis to Production
249
250
250
251
;; The final piece was building a prediction function that could recommend books based on a customer's purchase history:
@@ -276,13 +277,13 @@ scatter-plot
276
277
:min-confidence 0.1)
277
278
8))
278
279
279
-
;; These recommendations are now powering a new "Customers Also Bought" section on our website, complementing our existing "Topically Similar" recommendations with data-driven insights.
280
+
;; These recommendations are now powering a new "Customers Also Bought" section on our website (still in "manual" mode for now :), complementing our existing "Topically Similar" recommendations with data-driven insights.
280
281
281
282
;; ## Why This Matters for the Clojure Community
282
283
283
284
;; This project demonstrates several strengths of Clojure and the SciCloj ecosystem for real-world data science:
284
285
285
-
;; **1. Readable transformations:** The threading macro (`->`) made complex data pipelines read like narratives. Each step tells a story, making the code understandable to both technical and business stakeholders.
286
+
;; **1. Readable transformations:** The threading macro (`->`) made complex data pipelines read like narratives. Each step tells a story, making the code understandable to technical and even business stakeholders.
286
287
287
288
^:kindly/hide-code
288
289
(kind/code
@@ -296,7 +297,7 @@ scatter-plot
296
297
297
298
;; **3. A complete stack:** From data manipulation (Tablecloth) to visualization (Tableplot) to presentation (Clay and Kindly), the SciCloj ecosystem provided everything I needed without leaving Clojure.
298
299
299
-
;; **4. Production-ready code:** The same code that powers my exploratory analysis can run in production, generating live recommendations for our website.
300
+
;; **4. Production-ready code:** The same code that powers my exploratory analysis can be run in production later, generating live recommendations for our website (I hope!).
300
301
301
302
;; ## The Impact
302
303
@@ -305,21 +306,22 @@ scatter-plot
305
306
;; - We are already stopping less effective cross-selling campaigns and starting to target author communities
306
307
;; - Our website now features more data-driven "Customers Also Bought" recommendations
307
308
;; - We use these insights to optimize B2B offers for corporate clients
308
-
;; - Social media campaigns are more targeted based on purchase pattern clusters
309
+
;; - Our social media campaigns are starting to be better targeted based on purchase pattern clusters
309
310
310
311
;; Most importantly, I learned that you don't need a data science team or expensive tools to extract value from your data. With curiosity, the right tools, and a supportive community (shout out to the SciCloj folks on Zulip!), even a beginner can turn raw data into actionable insights.
311
312
312
313
;; ---
313
314
314
315
;; ## About the Author
315
316
316
-
;; **Tomáš Baránek** is a publisher at Jan Melvil Publishing and co-founder of Servantes, developing software for publishers worldwide. He's a computer science graduate exploring Clojure and data science, learning by doing on real publishing challenges.
317
+
;; **Tomáš Baránek** is a publisher at [Jan Melvil Publishing](https://www.melvil.cz) and co-founder of [Servantes](https://www.servant.es), developing software for publishers worldwide. He's a computer science graduate and Clojure enthusiast exploring data science, learning by doing on real publishing challenges.
317
318
318
319
;; **Resources:**
320
+
;; - Author: https://barys.me
319
321
;; - Full presentation code: [github.com/tombarys/??](https://github.com/tombarys/?)