Commit 66ce38d

Working with WAV files for the DSP study group
1 parent 028d055 commit 66ce38d

3 files changed

Lines changed: 321 additions & 0 deletions

2.25 MB
Binary file not shown.

src/dsp/wav.png

15.7 KB

src/dsp/wav_files.clj

Lines changed: 321 additions & 0 deletions
@@ -0,0 +1,321 @@
1+
^{:kindly/hide-code true
2+
:clay {:title "DSP Study Group - Reading audio data from WAV-files"
3+
:quarto {:author [:daslu :onbreath]
4+
:description "Exploring WAV-files for DSP in Clojure."
5+
:category :clojure
6+
:type :post
7+
:date "2025-11-09"
8+
:tags [:dsp :math :music]
9+
:image "wav.png"
10+
:draft true}}}
11+
(ns dsp.wav-files
12+
(:require [scicloj.kindly.v4.kind :as kind]
13+
[clojure.java.io :as io]
14+
[tech.v3.datatype.functional :as dfn]
15+
[tablecloth.api :as tc]
16+
[scicloj.tableplot.v1.plotly :as plotly])
17+
(:import (javax.sound.sampled AudioFileFormat
18+
AudioInputStream
19+
AudioSystem)
20+
(java.io InputStream)
21+
(java.nio ByteBuffer
22+
ByteOrder)))
23+
24+
;; **Exploration from the [Scicloj DSP Study Group](https://scicloj.github.io/docs/community/groups/dsp-study/)**
25+
;; *Second meeting - Nov. 08th 2025 and some follow-up investigation*
26+
27+
;; Welcome! These are notes from our second study group session, where
28+
;; we're learning digital signal processing together using
29+
;; Clojure. We're following the excellent book
30+
;; [**Think DSP** by Allen B. Downey](https://greenteapress.com/wp/think-dsp/) (available free online).
31+
;;
32+
;; **Huge thanks to Professor Downey** for writing such an accessible and free introduction to DSP, and for sharing with us the work-in-progress notebooks of [Think DSP 2](https://allendowney.github.io/ThinkDSP2/index.html).
33+
34+
;; Along with this study group came the idea to have an online
35+
;; creative coding festival around Clojure in the first months of
36+
;; 2026. In this meeting we spent some time brainstorming on how that
37+
;; might look and what the scope could be. The remaining time of the
38+
;; session we looked into downloading and reading WAV-files in
39+
;; Clojure.
40+
41+
;; ## Why WAV Files?
42+
;;
43+
;; The notebooks in Think DSP 2 work with WAV files loaded from GitHub
44+
;; as a basis for further processing, so we need a way to load these
45+
;; as well. After obtaining the file, we need to get at the audio data
46+
;; it contains.
47+
48+
;; ## Simplified WAV Format
49+
50+
;; First, let's take a superficial look at what data WAV files
51+
;; contain, before we dive into getting the data. A simple WAV file
52+
;; consists of a header and pure audio data following it. There are
53+
;; several iterations on specifications for the WAV format and the
54+
;; format allows for quite some flexibility in placing different
55+
;; metadata in the file, as well as different encodings.
56+
57+
^:kindly/hide-code
58+
(kind/mermaid
59+
"---
60+
config:
61+
theme: 'forest'
62+
---
63+
64+
block
65+
columns 1
66+
block:wav
67+
columns 5
68+
block:HeaderId
69+
columns 1
70+
HeaderLabel[\"Header\"]
71+
end
72+
73+
block:F1
74+
columns 1
75+
FrameLabel1[\"Frame\"]
76+
end
77+
78+
block:F2
79+
columns 1
80+
FrameLabel2[\"Frame\"]
81+
end
82+
83+
block:F3
84+
columns 1
85+
FrameLabel3[\"Frame\"]
86+
end
87+
88+
block:FN
89+
columns 1
90+
FrameLabelN[\"...\"]
91+
end
92+
end")
93+
94+
95+
;; The WAV (Waveform Audio File Format) file format is a
96+
;; RIFF (Resource Interchange File Format) file which stores data in
97+
;; **chunks**. Each **chunk** consists of a **tag** and **data**. Let's
98+
;; consider a partial example, which corresponds to the way the WAV
99+
;; file we want to read is arranged:
100+
101+
^:kindly/hide-code
102+
(kind/mermaid
103+
"---
104+
config:
105+
theme: 'forest'
106+
---
107+
108+
block
109+
columns 1
110+
block:wav
111+
columns 3
112+
block:HeaderId
113+
columns 1
114+
HeaderLine1[\"RIFF\"]
115+
HeaderLine2[\"WAVE\"]
116+
end
117+
118+
block:HeaderId2
119+
columns 1
120+
HeaderLine3[\"fmt \"]
121+
HeaderLine4[\"1\"]
122+
HeaderLine5[\"44100\"]
123+
HeaderLine6[\"16\"]
124+
end
125+
126+
block:data
127+
columns 1
128+
DataLabel[\"data\"]
129+
ChanF1[\"ch0\"]
130+
ChanF2[\"ch0\"]
131+
ChanF3[\"ch0\"]
132+
ChanF4[\"ch0\"]
133+
ChanFN[\"...\"]
134+
end
135+
end")
136+
137+
;; The header consists of the **tag** `RIFF`, its **chunk** tagged
138+
;; with the specific format `WAVE`, and a **subchunk** `fmt `, which
139+
;; describes the contained audio data. This represents some of the
140+
;; header information in a WAV file with a single 16-bit mono sound
141+
;; channel and 44,100 samples per second.
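
;; As a rough illustration of the layout described above, here is a
;; minimal sketch that picks some fields out of a canonical 44-byte
;; header by hand. It assumes plain PCM with exactly the `RIFF`/`WAVE`,
;; `fmt ` and `data` chunks in that order; real files may carry extra
;; chunks, which this sketch does not handle. The names
;; `parse-canonical-header` and `example-header` are made up for this
;; example, and the header is built in memory rather than read from
;; the tuning fork file.
(import '(java.nio ByteBuffer ByteOrder)
        '(java.nio.charset StandardCharsets))

(defn parse-canonical-header [^bytes header]
  (let [^ByteBuffer bb (doto (ByteBuffer/wrap header)
                         (.order ByteOrder/LITTLE_ENDIAN))
        ;; tags are four ASCII characters at a fixed offset
        tag (fn [offset]
              (String. header (int offset) (int 4) StandardCharsets/US_ASCII))]
    {:riff-tag         (tag 0)
     :wave-tag         (tag 8)
     :fmt-tag          (tag 12)
     :audio-format     (.getShort bb 20) ; 1 means uncompressed PCM
     :channels         (.getShort bb 22)
     :sample-rate      (.getInt bb 24)
     :sample-size-bits (.getShort bb 34)
     :data-tag         (tag 36)
     :data-bytes       (.getInt bb 40)}))

;; A hypothetical header for 16-bit mono PCM at 44,100 Hz with no audio data.
(def example-header
  (let [^ByteBuffer bb (doto (ByteBuffer/allocate 44)
                         (.order ByteOrder/LITTLE_ENDIAN))
        ascii (fn [^String s] (.getBytes s StandardCharsets/US_ASCII))]
    (doto bb
      (.put ^bytes (ascii "RIFF"))
      (.putInt 36)              ; chunk size: 36 + data size (0 here)
      (.put ^bytes (ascii "WAVE"))
      (.put ^bytes (ascii "fmt "))
      (.putInt 16)              ; size of the fmt subchunk for PCM
      (.putShort (short 1))     ; audio format: PCM
      (.putShort (short 1))     ; channels
      (.putInt 44100)           ; sample rate
      (.putInt 88200)           ; byte rate: sample rate * bytes per frame
      (.putShort (short 2))     ; bytes per frame
      (.putShort (short 16))    ; bits per sample
      (.put ^bytes (ascii "data"))
      (.putInt 0))              ; size of the audio data in bytes
    (.array bb)))

(parse-canonical-header example-header)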
142+
143+
;; As we learned in the [first session](https://clojurecivitas.github.io/dsp/intro.html)
144+
;; of the DSP study group:
145+
;; > Sound waves are continuous vibrations in the air. To work with them on a computer,
146+
;; > we need to **sample** them - take measurements at regular intervals. The **sample rate**
147+
;; > tells us how many measurements per second. CD-quality audio uses 44,100 samples per second.
148+
149+
;; These **samples** are stored in the WAV file's `data`-tagged
150+
;; **subchunk**. Since this is mono sound, there is one **frame** with
151+
;; one **channel** per **sample**. For multiple **channels**, each
152+
;; **frame** consists of all channels and their respective **sample**.
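
;; To make the relation between frames and channels concrete, here is
;; a small sketch that splits interleaved samples into one vector per
;; channel. The function name `deinterleave` and the sample values are
;; invented for this example; the tuning fork file itself is mono, so
;; it would not need this step.
(defn deinterleave [samples channels]
  ;; channel `ch` owns every `channels`-th sample, starting at index `ch`
  (mapv (fn [ch] (vec (take-nth channels (drop ch samples))))
        (range channels)))

;; Three stereo frames: channel 0 first, then channel 1, in each frame.
(deinterleave [10 -10 20 -20 30 -30] 2)
;; => [[10 20 30] [-10 -20 -30]]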
153+
154+
;; ## Libraries We're Using
155+
;;
156+
;; - **[Kindly](https://scicloj.github.io/kindly-noted/kindly)** - Visualization protocol that renders our data as interactive HTML elements (through Clay)
158+
;; - **[dtype-next](https://github.com/cnuernber/dtype-next)** - Efficient numerical arrays and vectorized operations (like NumPy for Clojure)
159+
;; - **[Tablecloth](https://scicloj.github.io/tablecloth/)** - DataFrame library for data manipulation and transformation
160+
;; - **[Tableplot](https://scicloj.github.io/tableplot/)** - Declarative plotting library built on Plotly
161+
;; - **[javax.sound.sampled](https://docs.oracle.com/en/java/javase/25/docs/api/java.desktop/javax/sound/sampled/package-summary.html)** - Classes from the Java standard library's sound package, used here to read WAV files.
162+
163+
(require '[scicloj.kindly.v4.kind :as kind]
164+
'[clojure.java.io :as io])
165+
^:kindly/hide-code
166+
(kind/code
167+
"(import '(javax.sound.sampled AudioFileFormat
168+
AudioInputStream
169+
AudioSystem)
170+
'(java.io InputStream)
171+
'(java.nio ByteBuffer
172+
ByteOrder))")
173+
174+
175+
;; ## Downloading a WAV File
176+
(defn copy [uri file]
177+
(with-open [in (io/input-stream uri)
178+
out (io/output-stream file)]
179+
(io/copy in out)))
180+
181+
^:kindly/hide-code
182+
(def tuning-fork-file
183+
"18871__zippi1__sound-bell-440hz.wav")
184+
185+
^:kindly/hide-code
186+
(def tuning-fork-url
187+
(str "https://github.com/AllenDowney/ThinkDSP/raw/master/code/" tuning-fork-file))
188+
193+
^:kindly/hide-code
194+
(def tuning-fork-file-compressed
195+
"18871__zippi1__sound-bell-440hz-compressed.wav")
196+
197+
^:kindly/hide-code
198+
(def tuning-fork-path
199+
(str "src/dsp/" tuning-fork-file))
200+
201+
^:kindly/hide-code
202+
(def tuning-fork-path-compressed
203+
(str "src/dsp/" tuning-fork-file-compressed))
204+
205+
(copy tuning-fork-url tuning-fork-path)
206+
207+
;; ## Playing a WAV File
208+
;;
209+
;; Kindly can embed a player with a URL, but the sample is extremely
210+
;; loud (it is a tuning fork struck in front of a microphone), so we
211+
;; don't embed this player.
212+
^:kindly/hide-code
213+
(kind/code "(kind/audio {:src tuning-fork-url})")
214+
215+
;; Here we use a compressed and loudness-normalized version of the
216+
;; original file, so you can safely listen to it.
217+
(kind/audio {:src tuning-fork-file-compressed})
218+
219+
;; ## Reading Metadata from the WAV File
220+
;;
221+
;; We define a function to collect some metadata from the file.
222+
(defn audio-format [^InputStream is]
223+
(let [file-format (AudioSystem/getAudioFileFormat is)
224+
format (.getFormat file-format)]
225+
{:is-big-endian? (.isBigEndian format)
226+
:channels (.getChannels format)
227+
:sample-rate (.getSampleRate format)
228+
:sample-size-bits (.getSampleSizeInBits format)
229+
:frame-length (.getFrameLength file-format)
230+
:encoding (str (.getEncoding format))}))
231+
232+
(with-open [wav-stream (io/input-stream tuning-fork-path)]
233+
(def wav-format
234+
(audio-format wav-stream)))
235+
236+
wav-format
237+
238+
;; `:is-big-endian?` specifies the byte order of audio data with more
239+
;; than 8 `:sample-size-bits`. `:sample-size-bits` is the number of
240+
;; bits comprising a sample. The `:frame-length` is the total number
241+
;; of frames contained in the audio data.
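
;; A few derived quantities follow directly from these fields. The
;; sketch below computes them from a map shaped like the one returned
;; by `audio-format`. The helper name `describe-format` and the numbers
;; in the example call are hypothetical, not values read from the
;; tuning fork file.
(defn describe-format
  [{:keys [channels sample-rate sample-size-bits frame-length]}]
  (let [bytes-per-frame (* channels (quot sample-size-bits 8))]
    {:bytes-per-frame  bytes-per-frame
     :data-bytes       (* frame-length bytes-per-frame)
     :duration-seconds (/ frame-length (double sample-rate))}))

(describe-format {:channels         1
                  :sample-rate      44100.0
                  :sample-size-bits 16
                  :frame-length     88200})
;; => {:bytes-per-frame 2, :data-bytes 176400, :duration-seconds 2.0}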
242+
243+
;; We don't use much of that information for now, but it'll let us
244+
;; peek at what kind of WAV file we're working with in the future and
245+
;; we can use the information to extend our function for extracting
246+
;; audio data, which we define next.
247+
248+
;; ## Reading Audio Data from the WAV File
249+
;;
250+
;; The bulk of work here is handled by the ``AudioInputStream``, but
251+
;; since it only reads bytes for us, we have to put these together
252+
;; into the correct datatype for each frame manually. For now we just
253+
;; put the data for 16-bit mono WAV files into a short-array.
254+
(defn audio-data [^InputStream is]
255+
(let [{:keys [frame-length]} (audio-format is)
256+
;; `audio-format` leaves the stream at its start (header included), so we
257+
;; let AudioSystem open the audio stream, positioned past the header.
258+
^bytes audio-bytes (with-open [ais (AudioSystem/getAudioInputStream is)]
259+
(AudioInputStream/.readAllBytes ais))
260+
audio-shorts (short-array frame-length)
261+
bb (ByteBuffer/allocate 2)]
262+
(dotimes [i frame-length]
263+
(ByteBuffer/.clear bb)
264+
(.order bb ByteOrder/LITTLE_ENDIAN)
265+
(.put bb ^byte (aget audio-bytes (* 2 i)))
266+
(.put bb ^byte (aget audio-bytes (inc (* 2 i))))
267+
(aset-short audio-shorts i (.getShort bb 0)))
268+
audio-shorts))
269+
270+
(with-open [wav-stream (io/input-stream tuning-fork-path)]
271+
(def wav-shorts
272+
(audio-data wav-stream)))
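
;; The byte-pairing loop in `audio-data` can also be expressed with a
;; view over the whole byte array. This is an alternative sketch, not
;; what the notebook uses: `bytes->shorts` is a name made up here, and
;; it assumes little-endian 16-bit samples, just like the loop above.
(import '(java.nio ByteBuffer ByteOrder))

(defn bytes->shorts [^bytes audio-bytes]
  (let [result (short-array (quot (alength audio-bytes) 2))]
    ;; view the bytes as little-endian shorts and copy them out in bulk
    (-> (ByteBuffer/wrap audio-bytes)
        (.order ByteOrder/LITTLE_ENDIAN)
        (.asShortBuffer)
        (.get result))
    result))

;; Bytes 01 00, FF FF and 00 80, read as little-endian 16-bit values.
(vec (bytes->shorts (byte-array [1 0 -1 -1 0 -128])))
;; => [1 -1 -32768]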
273+
274+
;; The difference between the WAV file bytes and the audio data we
275+
;; read is 44 bytes, which is the size of the default header and
276+
;; container.
277+
(with-open [wav-stream (io/input-stream tuning-fork-path)]
278+
(- (count (.readAllBytes wav-stream))
279+
(* 2 (count wav-shorts))))
280+
281+
;; ## Striking the Fork
282+
;;
283+
;; Now that we have read the data we can reduce its amplitude, so we
284+
;; can listen to it safely.
285+
^kind/audio
286+
{:samples (dfn// wav-shorts 4000000.0)
287+
:sample-rate (:sample-rate wav-format)}
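
;; Dividing by a large constant works, but the scale is easier to
;; reason about when samples are first mapped to the range -1.0 to 1.0
;; and then multiplied by an explicit gain. The sketch below uses plain
;; Clojure sequences instead of dtype-next; `shorts->floats` and the
;; gain values are assumptions made for this example, and dividing by
;; 32768.0 is one common convention for 16-bit audio.
(defn shorts->floats
  ([samples] (shorts->floats samples 1.0))
  ([samples gain]
   ;; 32768 is the magnitude of the most negative 16-bit sample
   (mapv #(* gain (/ % 32768.0)) samples)))

(shorts->floats [0 16384 -32768])
;; => [0.0 0.5 -1.0]

(shorts->floats [16384] 0.5)
;; => [0.25]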
288+
289+
;; In fact, the function `audio-data` above is quite similar to how [Clay](https://github.com/scicloj/clay/blob/main/src/scicloj/clay/v2/item.clj#L420) writes the audio data to a file for us to listen to in the browser, just the reverse of what we did for reading.
290+
291+
;; ## Visualizing Waves
292+
;;
293+
;; Let's take a look at the sound of a tuning fork.
294+
(let [{:keys [frame-length sample-rate]} wav-format]
295+
(-> {:time (dfn// (range frame-length)
296+
sample-rate)
297+
:value wav-shorts}
298+
tc/dataset
299+
(plotly/layer-line {:=x :time
300+
:=y :value})))
301+
302+
;; ## What we learned
303+
;;
304+
;; In the second session and some pairing beyond we prepared for our
305+
;; forthcoming sessions on Think DSP by:
306+
;; - **WAV file format** - Learning about the structure of simple WAV files
307+
;; - **File download** - Downloading files with Java
308+
;; - **WAV file metadata** - Reading metadata of a WAV file
309+
;; - **WAV file audio data** - Reading the bytes in the audio data container and converting them to an appropriate data type
310+
;;
311+
;; ## Next Steps
312+
;;
313+
;; In our next study group meetings, we'll explore the book step by step, and learn more about sounds and signals,
314+
;; harmonics and the Fourier transform, non-periodic signals and spectrograms, noise and filtering, and more.
315+
;;
316+
;; Join us at the [Scicloj DSP Study Group](https://scicloj.github.io/docs/community/groups/dsp-study/)!
317+
;;
318+
;; ---
319+
;;
320+
;; *Again, huge thanks to Allen B. Downey for Think DSP. If you find this resource valuable,
321+
;; consider [supporting his work](https://greenteapress.com/wp/) or sharing it with others.*
