Commit 585529e

committed
piton
1 parent bad49ec commit 585529e

2 files changed

Lines changed: 49 additions & 62 deletions

src/data_visualization/violin.clj

Lines changed: 11 additions & 6 deletions
@@ -20,6 +20,7 @@
    [hyperphor.multitool.core :as mu]
    ))
 
+
 ;; # The World's Smallest Violin (plot generating code)
 
 
@@ -44,16 +45,17 @@
       io/file
       (ImageIO/read))
 
-;;; If you want to see a full-fledged implementation of interactive violin plots for visualizaing biological data, [the BRUCE website](https://bruce.parkerici.org) has one.
+
+;;; Violin plots are common in the scientific literature. For an example of using violin plots in a scientific domain, see the [BRUCE website](https://bruce.parkerici.org), which uses interactive violin plots to visualize data from a brain cancer research project.
 
 ;;; ## References
 
 ;;; - [Violin Plots: A Box Plot-Density Trace Synergism](https://web.archive.org/web/20231106021405/https://quantixed.org/wp-content/uploads/2014/12/hintze_1998.pdf) Jerry L. Hintze, Ray D. Nelson
-
+;;; - [Violin Plot - Wikipedia](https://en.wikipedia.org/wiki/Violin_plot)
 
 ;;; # Data
 
-;;; We'll use this classic [dataset about penguin morphology](https://github.com/ttimbers/palmerpenguins/blob/master/README.md). <img src='man/figures/logo.png' align="right" height="138.5" /></a>. Each row describes an individual penguin, with properties like species, sex, body mass, wing size.
+;;; We'll use this classic [dataset about penguin morphology](https://github.com/ttimbers/palmerpenguins/blob/master/README.md). Each row in this dataset describes an individual penguin, with properties like species, sex, body mass, wing size.
 
 (def penguin-data-url
   "https://raw.githubusercontent.com/ttimbers/palmerpenguins/refs/heads/file-variants/inst/extdata/penguins.tsv")
@@ -64,7 +66,7 @@
 (kind/table
  (tc/random penguin-data 10))
 
-;;; # Just the points, ma'am
+;;; # Just show me the datapoints
 
 ;;; Let's start off with a simple dot-plot. We'll group the data by `species`, and turn each value for `body_mass` into a point.
 ;;; Vega just requires specifying some basic mappings (aka encodings) between data fields and visual properties. So a minimal dot plot can look like this:
@@ -79,7 +81,8 @@
         :type :nominal}}
  }
 
-;;; Vega's defaults are not always what we want, so this is the same as above with a bit of tweaking to look more like what we want. One nonobvious change: we use `:row` in place of `:y`. This is not estrictly necessary at this point, but will make it easier when we get to actual violin plots. Also, we add some randomness to (jitter) so we can better see individual points, and just for the hell of it, map another attribute (`sex`) to `:shape`.
+;;; Vega's defaults are not always what we want, so the next version has the same structure as before, with a bit of tweaking to look more like what we want. We'll adjust the size of the graph, adjust the scale, and use color.
+;;; We also add some randomness (jitter) so we can better see individual points, and just for the hell of it, map another attribute, `sex`, to `:shape`.
 
 ^:kind/vega-lite
 {:mark {:type "point" :tooltip {:content :data}}
@@ -105,6 +108,8 @@
  :width 800
  }
 
+;;; One nonobvious change: we use `:row` in place of `:y` for the group (species) dimension. This is not strictly necessary at this point, but will make it easier when we get to actual violin plots. Just to be more confusing, we reuse the `:y` encoding for the random jitter.
+
 
 ;;; # Boxplot
 
@@ -164,5 +169,5 @@
  :width 800
  })
 
-;;; That's the basics of a violin plot! In the followup page, we'll see about abstracting some of this into functions, with some variations. and we'll look at combining violin plots with dot and box plots for a richer of our data.
+;;; That's the basics of a violin plot! In the [followup page](violin2), we'll see about abstracting some of this into functions, with some variations, and we'll look at combining violin plots with dot and box plots for a richer view of our data.
 
src/data_visualization/violin2.clj

Lines changed: 38 additions & 56 deletions
@@ -21,10 +21,17 @@
    ))
 
 
-;;; # Data (repeated from first part) TODO hide
+;; # The World's Smallest Violin (plot generating code), Part 2
+
+
+;;; [Link to Part 1]()
+
+
+^:kindly/hide-code ^:kind/hidden
 (def penguin-data-url
   "https://raw.githubusercontent.com/ttimbers/palmerpenguins/refs/heads/file-variants/inst/extdata/penguins.tsv")
 
+^:kindly/hide-code ^:kind/hidden
 (def penguin-data
   (tc/dataset penguin-data-url {:key-fn keyword}))
 
@@ -36,12 +43,14 @@
 ;;; Once we know how to make a visualization, it makes sense to abstract it into a procedure, so this knowledge is captured in a function.
 
 
-;;; For inst
+;;; So let's do that for the dot plot. We'll make a function that takes three required arguments: `data`, `value-field`, and `group-field`, along with some options.
 
 (defn dot-plot
-  [data value-field group-field]
+  [data value-field group-field
+   & {:keys [jitter? color-field]}]
   {:mark {:type "point" :tooltip {:content :data}}
    :data data
+   :transform [{:calculate "random()" :as "jitter"}]
    :encoding
    {:x {:field value-field
         :type :quantitative
@@ -50,82 +59,55 @@
          :type :nominal
          :header {:labelAngle 0 :labelAlign "left"}
          :spacing 0}
-    :color {:field group-field
+    :y {:field (if jitter? "jitter" nil)
+        :type :quantitative
+        :axis false}
+    :color {:field (or color-field group-field)
             :type :nominal
-            :legend false}
-    }
+            :legend (if color-field true false)}}
    :height 50
    :width 800
    })
 
-;;; Which can be used like this:
-
+;;; Which can be used like this (here we'll look at some different attributes):
 
 ^:kind/vega-lite
 (dot-plot {:values (tc/rows penguin-data :as-maps)}
-          "flipper_length_mm" "year"
+          "body_mass_g" "sex"
+          :jitter? true
+          :color-field "species island"
           )
 
 
-;;; On any data set
+;;; And we can easily reuse the function on a different data set (this one is about movies):
+
 ^:kind/vega-lite
 (dot-plot {:url "https://vega.github.io/editor/data/movies.json"}
-          "US Gross" "Major Genre")
+          "US Gross" "Major Genre"
+          :jitter? true)
 
 
 
-;;; Add jitter
-
-(defn dot-plot-2
-  [data value-field group-field jitter?]
-  {:mark {:type "point" :tooltip {:content :data}}
-   :data data
-   :transform (if jitter? [{:calculate "random()" :as "jitter"}] [])
-   :encoding
-   {:x {:field value-field
-        :type :quantitative
-        :scale {:zero false}}
-    :y (when jitter?
-         {:field "jitter"
-          :type :quantitative
-          :axis false})
-    :row {:field group-field
-          :type :nominal
-          :header {:labelAngle 0 :labelAlign "left"}
-          :spacing 0}
-    :color {:field group-field
-            :type :nominal
-            :legend false}
-    }
-   :height 50
-   :width 800
-   })
-
-
-;;; On any data set
-^:kind/vega-lite
-(dot-plot-2 {:url "https://vega.github.io/editor/data/movies.json"}
-            "US Gross" "Major Genre" true)
-
-
 ;;; # Generalize
 
-;;; This section introduces a new, and somewhat funky way of using and generalizing Vega specs.
-
-;;; Take our dot-plot abstraction above. We could parameterize it further, say :type which could be :dotplot or :boxplot. But instead, we're going to hack it by introducing a function that can merge arbitrarily nested structures. This means we can alter any aspect of the spec, at the cost of having to have some knowledge of its structure. Eg we could change the height or spacing or fonts.
+;;; This section introduces a new and somewhat funky way of generalizing Vega specs.
 
+;;; Take our dot-plot abstraction above. We could parameterize it further, e.g. by adding optional arguments for height or scale or any of the many things Vega allows you to tweak.
 
-;; TODO maybe more confuscing than it is worth here. For a later section?
+;; But instead, we're going to introduce a much more general (if somewhat unclean) way of modifying a base Vega spec, through structural merge. This makes use of a function `mu/merge-recursive` from the [Multitool utility library](https://github.com/hyperphor/multitool/blob/9e10c6b9cfe7f1deb496e842fc12505748a09d69/src/cljc/hyperphor/multitool/core.cljc#L1012). This function merges arbitrarily nested structures. This means we can alter any aspect of the spec, at the cost of having to have some knowledge of its structure.
 
+(defn dot-plot-g
+  [data value-field group-field overrides]
+  (mu/merge-recursive
+   (dot-plot data value-field group-field :jitter? false)
+   overrides))
 
-;; mu/merge-recursive is a function from the Multitool utility library [link].
 
-(defn box-plot
-  [data value-field group-field]
-  (-> (dot-plot-2 data value-field group-field false)
-      (mu/merge-recursive
-       {:mark {:type :boxplot
-               :extent :min-max}})))
+^:kind/vega-lite
+(dot-plot-g {:values (tc/rows penguin-data :as-maps)}
+            "body_mass_g" "sex"
+            {:mark {:filled true}}
+            )
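
As a rough sketch of what the structural merge behind `dot-plot-g` does: the `deep-merge` function below is a hypothetical stand-in written for illustration, not the Multitool source of `mu/merge-recursive`, and it assumes only nested maps need merging (the real function may treat other collections differently). `base-spec` is likewise a cut-down, made-up spec that reuses keys from `dot-plot`. It shows how an override such as `{:mark {:filled true}}` adds a key under `:mark` without discarding the keys already there.

```clojure
;; Hypothetical stand-in for a recursive merge (NOT the Multitool implementation).
(defn deep-merge
  "Merges b into a. Nested maps are merged key by key;
  any non-map value in b replaces the corresponding value in a."
  [a b]
  (if (and (map? a) (map? b))
    (merge-with deep-merge a b)
    b))

;; A cut-down base spec for illustration, using keys that appear in dot-plot.
(def base-spec
  {:mark {:type "point" :tooltip {:content :data}}
   :height 50
   :width 800})

(deep-merge base-spec {:mark {:filled true} :height 100})
;; => {:mark {:type "point", :tooltip {:content :data}, :filled true},
;;     :height 100,
;;     :width 800}
```

For comparison, plain `clojure.core/merge` is shallow: `(merge base-spec {:mark {:filled true}})` would replace the whole `:mark` map and lose `:type` and `:tooltip`, which is why a recursive merge is the better fit for overriding parts of a spec.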
 
 
 