Commit e467a6c

Added image and made last proofreading
1 parent d08aa7f commit e467a6c

2 files changed

Lines changed: 17 additions & 14 deletions

src/data_analysis/book_sales_analysis/about_apriori.clj

Lines changed: 17 additions & 14 deletions
@@ -26,6 +26,11 @@

 ;; When you run an indie publishing house with over 160 titles and sell thousands of books each month, one question keeps coming back: **Which books do our customers buy together?** This seemingly simple question led me down a fascinating path from basic correlation analysis to building a more robust recommendation system using association rule mining—all with Clojure and the SciCloj ecosystem.

+^:kindly/hide-code
+(kind/image
+{:src "books.png"
+:width "100%"
+:max-width "800px"})

 ;; ## The Starting Point: Understanding Our Data

@@ -43,7 +48,7 @@

 ;; *(for clarity, many columns were omitted here; rows were generated with `(tc/random 5)` from anonymized dataset)*

-;; Each row represented one order, with books listed as comma-separated values. There are many exceptions, incosistencies and format-based differences ^[^1]. To analyze purchasing patterns, I needed to transform this into a format where each book became a binary feature: did a customer buy it (1) or not (0)? This transformation is called **one-hot encoding**.
+;; Each row represented one order, with books listed as comma-separated values. There are many exceptions, inconsistencies, and format-based differences ^[^1]. To analyze purchasing patterns, I needed to transform this into a format where each book became a binary feature: did a customer buy it (1) or not (0)? This transformation is called **one-hot encoding**.
 ;;
 ;; ---
 ;;
@@ -79,7 +84,7 @@
 (tc/reorder-columns [:zakaznik])
 (tc/random 5)
 (tc/head 4)
-(tc/select-columns #"^:zakaznik|^:book-00.*"))) ;; just quick reduction of table width
+(tc/select-columns #"^:zakaznik|^:book-00.*")))

 ^:kindly/hide-code
 (kind/table onehot-sample)
@@ -149,18 +154,17 @@
 :zeroline true}
 :yaxis {:title "Skewness <br>(more positive → less generic relation)"
 :zeroline true}
-:hovermode "closest"
-:showlegend true
-:height 650
-:width 750
+:hovermode "closest" :showlegend true
+:height 650 :width 750
 :margin {:l 65 :r 50 :b 65 :t 90}
 :legend {:x 1 :y 1 :xanchor "right"}}}))

+^:kindly/hide-code
 scatter-plot

 ;; Foreign bestsellers (marked in orange) showed consistently higher correlations with other books. They had broad appeal and were purchased alongside many different titles. Czech authors (in blue), however, showed lower correlations, suggesting their readers were more focused. Many customers would buy just one Czech title, often using it as a "gateway" into our catalog (strongly supported by Czech author's local campaigns), while foreign bestsellers were part of larger, more diverse purchases.

-;; This insight immediately changed our marketing approach. We stopped using generic cross-sell recommendations for Czech authors and instead focused on building author-specific communities and started to cooperate with easily reachable Czech authors on cross-selling approach.
+;; This insight immediately changed our marketing approach. We stopped using generic cross-sell recommendations for Czech authors and instead focused on building author-specific communities and started to cooperate with easily reachable local Czech authors on cross-selling approach.

 ;; ## The Limitation: Correlations Weren't Enough

@@ -296,13 +300,13 @@ scatter-plot
 :min-confidence 0.1)
 8))

-;; These recommendations are now powering a new "Customers Also Bought" section on our website (even still in "manual" mode :), complementing our existing "Topically Similar" recommendations with data-driven insights.
+;; These recommendations are now powering a new "Customers Also Bought" section on our website (still in "manual" mode for now), complementing our existing "Topically Similar" recommendations with data-driven insights.

 ;; ## Why This Matters for the Clojure Community

 ;; This project demonstrates several strengths of Clojure and the SciCloj ecosystem for real-world data science:

-;; **1. Readable transformations:** The threading macro (`->`) made complex data pipelines read like narratives. Each step tells a story, making the code understandable to both technical and even business stakeholders.
+;; **1. Readable transformations:** Clojure's threading macro (`->`) made complex data pipelines read like narratives. Each step tells a story, making the code understandable to both technical and even business stakeholders.

 ^:kindly/hide-code
 (kind/code
@@ -320,26 +324,25 @@ scatter-plot

 ;; ## The Impact

-;; This project is still under construction and tangible business results yet to be seen. But:
+;; This project is still under construction and tangible business results have yet to be seen. But:

 ;; - We are already stopping less effective cross-selling campaigns and starting target author communities
 ;; - Our website now features more data-driven "Customers Also Bought" recommendations
 ;; - We use these insights to optimize B2B offers for corporate clients
-;; - Our social media campaigns are starting to be better targeted based on purchase pattern clusters
+;; - Our social media campaigns are being better targeted based on purchase pattern clusters

 ;; Most importantly, I learned that you don't need a data science team or expensive tools to extract value from your data. With curiosity, the right tools, and a supportive community (shout out to the SciCloj folks on Zulip!), even a beginner can turn raw data into actionable insights.

 ;; ---

 ;; ## About the Author

-;; **Tomáš Baránek** is a publisher at [Jan Melvil Publishing](https://www.melvil.cz) and co-founder of [Servantes](https://www.servant.es), developing software for publishers worldwide. He's a computer science graduate, Clojure enthusiast exploring data science, learning by doing on real publishing challenges.
+;; **Tomáš Baránek** is a publisher at [Jan Melvil Publishing](https://www.melvil.cz) and co-founder of [Servantes](https://www.servant.es), developing software for publishers worldwide. He's a computer science graduate, Clojure enthusiast exploring data science, learning by doing on real publishing challenges. You can find him on [Bluesky](https://bsky.app/profile/tombarys.bsky.social) or read his [blog](https://lifehacky.net).

 ;; **Resources:**
 ;; - Author: https://barys.me
-;; - Full presentation code: [github.com/tombarys/??](https://github.com/tombarys/?)
+;; - Full presentation code: *to be published*
 ;; - SciCloj community: [scicloj.github.io](https://scicloj.github.io)
-;; - Connect: tom@barys.me

 ;; ---

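Note on the one-hot encoding described in the hunk at `@@ -43,7 +48,7 @@`: the diff shows only the tail of the author's Tablecloth pipeline, so what follows is a minimal sketch in plain Clojure, not the author's implementation. The `orders` sample data, the `:books` key, and the helpers `parse-books` and `one-hot` are hypothetical. Only the `:zakaznik` column and the `book-00…` naming are taken from the diff.

```clojure
(require '[clojure.string :as str])

;; Hypothetical sample: each order lists its books as one comma-separated string.
(def orders
  [{:zakaznik "c1" :books "book-001, book-002"}
   {:zakaznik "c2" :books "book-002"}
   {:zakaznik "c3" :books "book-001,book-003"}])

(defn parse-books
  "Splits a comma-separated string into a set of trimmed, non-empty book ids."
  [s]
  (->> (str/split s #",")
       (map str/trim)
       (remove str/blank?)
       set))

(defn one-hot
  "Returns one map per order with a 0/1 entry for every book seen in any order."
  [orders]
  (let [parsed    (map #(update % :books parse-books) orders)
        all-books (sort (into #{} (mapcat :books) parsed))]
    (mapv (fn [{:keys [zakaznik books]}]
            (into {:zakaznik zakaznik}
                  (map (fn [b] [(keyword b) (if (contains? books b) 1 0)]))
                  all-books))
          parsed)))

(one-hot orders)
```

Each resulting map has one key per book, so the rows can be loaded into a dataset where every book is a binary column. Real order data would likely need extra cleanup for the inconsistencies the text mentions, which this sketch does not attempt.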