Skip to content

Commit e83fc4d

Browse files
committed
fft comparison wip
1 parent 49dd32e commit e83fc4d

1 file changed

Lines changed: 31 additions & 26 deletions

File tree

src/dsp/fft_comparison.clj

Lines changed: 31 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -22,6 +22,8 @@
2222
[com.github.psambit9791.jdsp.transform FastFourier]
2323
[org.jtransforms.fft DoubleFFT_1D]))
2424

25+
^{:kindly/hide-code true
26+
:kindly/kind :kind/hidden}
2527
(set! *warn-on-reflection* true)
2628

2729
;; # Introduction
@@ -46,8 +48,8 @@
4648
;; **Key features:**
4749
;; - Part of Apache Commons (widely used, stable, maintained)
4850
;; - Supports multiple normalization conventions
49-
;; - 1D FFT, [Hadamard](https://en.wikipedia.org/wiki/Hadamard_transform), and sine/cosine transforms
50-
;; - Requires input length to be a [power of 2](https://en.wikipedia.org/wiki/Power_of_two)
51+
;; - 1D FFT, [DCT](https://en.wikipedia.org/wiki/Discrete_cosine_transform)/[DST](https://en.wikipedia.org/wiki/Discrete_sine_transform), [Hadamard](https://en.wikipedia.org/wiki/Hadamard_transform) transforms
52+
;; - Requires signal length to be a [power of 2](https://en.wikipedia.org/wiki/Power_of_two)
5153
;; - Returns [`Complex[]`](https://commons.apache.org/proper/commons-math/javadocs/api-3.6.1/org/apache/commons/math3/complex/Complex.html) arrays (allocation overhead)
5254

5355
;; ### JDSP
@@ -121,8 +123,7 @@
121123
:=title "Test Signal: 5 Hz + 12 Hz"
122124
:=width 700
123125
:=height 300})
124-
(plotly/layer-line {:=mark-color "steelblue"})
125-
plotly/plot)
126+
(plotly/layer-line {:=mark-color "steelblue"}))
126127

127128
;; ## Visualization Helper
128129

@@ -149,8 +150,7 @@
149150
:=title title
150151
:=width 700
151152
:=height 300})
152-
(plotly/layer-line {:=mark-color color})
153-
plotly/plot)))
153+
(plotly/layer-line {:=mark-color color}))))
154154

155155
;; ## FFT Implementation #1: Apache Commons Math
156156

@@ -314,8 +314,8 @@
314314
(map (fn [{:keys [library size mean-ms lower-q-ms upper-q-ms]}]
315315
{:library library
316316
:size (if (= size 128) "128 (2^7)" "131K (2^17)")
317-
:mean-time (format "%.3f ms" mean-ms)
318-
:range (format "%.3f - %.3f ms" lower-q-ms upper-q-ms)})
317+
:mean-time (format "%.6f ms" mean-ms)
318+
:range (format "%.6f - %.6f ms" lower-q-ms upper-q-ms)})
319319
library-benchmarks))
320320

321321
;; ## Exploring Parallelization: An Experiment
@@ -324,18 +324,22 @@
324324
;;
325325
;; **Disclaimer:** This section documents an exploratory experiment with uncertain conclusions. We're learning as we go, and the results are puzzling!
326326

327-
;; ### How JTransforms Works: Two Algorithms
327+
;; ### How JTransforms Works: Three Algorithm Plans
328328

329-
;; Looking at the [JTransforms source code](https://github.com/wendykierp/JTransforms/blob/master/src/main/java/org/jtransforms/fft/DoubleFFT_1D.java), we discovered it uses **two different FFT algorithms**:
329+
;; Looking at the [JTransforms source code](https://github.com/wendykierp/JTransforms/blob/master/src/main/java/org/jtransforms/fft/DoubleFFT_1D.java), we discovered it uses **three different execution plans**:
330330
;;
331-
;; 1. **[Cooley-Tukey FFT](https://en.wikipedia.org/wiki/Cooley%E2%80%93Tukey_FFT_algorithm)** - Fast, parallelizable, but requires power-of-2 input sizes
332-
;; 2. **[Bluestein's algorithm](https://en.wikipedia.org/wiki/Chirp_Z-transform#Bluestein's_algorithm)** (`bluestein_complex` in the code) - Works with any size, likely slower and possibly single-threaded
331+
;; 1. **SPLIT_RADIX** ([Cooley-Tukey variant](https://en.wikipedia.org/wiki/Cooley%E2%80%93Tukey_FFT_algorithm)) - Used for power-of-2 sizes, supports multithreading
332+
;; 2. **MIXED_RADIX** (Cooley-Tukey variant) - Used for factorable sizes (divisible by 2, 3, 4, 5), primarily sequential
333+
;; 3. **BLUESTEIN** ([Chirp Z-transform](https://en.wikipedia.org/wiki/Chirp_Z-transform#Bluestein's_algorithm)) - Used for arbitrary sizes with large prime factors, supports 1-4 threads
333334
;;
334-
;; When you provide a power-of-2 size, JTransforms uses Cooley-Tukey (optimal). For other sizes, it automatically falls back to Bluestein.
335+
;; The plan is automatically selected in the constructor based on signal size:
336+
;; - Power-of-2 → SPLIT_RADIX (fastest, our benchmarks use this)
337+
;; - Factorable by small primes → MIXED_RADIX
338+
;; - Large prime factors (remainder ≥ 211) → BLUESTEIN
335339
;;
336-
;; This explains why our earlier tests with sizes like 100, 127, 1000 worked—they used Bluestein!
340+
;; This explains why arbitrary sizes like 100, 127, 1000 work—they use MIXED_RADIX or BLUESTEIN!
337341
;;
338-
;; The [JTransforms paper (Wendykier & Grote 2012)](https://www.math.emory.edu/technical-reports/techrep-00127.pdf) discusses the Cooley-Tukey implementation,
342+
;; The [JTransforms paper (Wendykier & Nagy 2008)](https://www.math.emory.edu/technical-reports/techrep-00127.pdf) discusses these implementations,
339343
;; noting that 1D transforms can use at most 2 or 4 threads due to the algorithm's structure.
340344

341345
;; ### Testing Thread Performance
@@ -383,14 +387,14 @@
383387
(plotly/base {:=x :threads
384388
:=y :mean-ms
385389
:=color :size
390+
:=color-type :nominal
386391
:=title "FFT Performance vs Thread Count (fastmath/JTransforms)"
387392
:=x-title "Number of Threads"
388393
:=y-title "Mean Time per FFT (ms)"
389394
:=width 800
390395
:=height 500})
391396
(plotly/layer-point {:=mark-size 10})
392-
(plotly/layer-line {:=mark-opacity 0.6})
393-
plotly/plot)
397+
(plotly/layer-line {:=mark-opacity 0.6}))
394398

395399
;; ### What We Found (Spoiler: It's Confusing!)
396400

@@ -459,35 +463,36 @@
459463
;; - **Heap**: 15,936 MB max
460464
;; - **Clojure**: 1.12.3
461465

462-
;; ## Input Size Requirements
466+
;; ## Signal Size Requirements
463467

464-
;; Different libraries have different requirements for input signal lengths:
468+
;; Different libraries have different requirements for signal lengths:
465469

466470
;; ### Apache Commons Math: Power-of-2 Required
467471
;;
468472
;; Apache Commons Math **strictly requires** signal length to be a [power of 2](https://en.wikipedia.org/wiki/Power_of_two) (128, 256, 512, 1024, etc.). This is because it only implements the classic [Cooley-Tukey algorithm](https://en.wikipedia.org/wiki/Cooley%E2%80%93Tukey_FFT_algorithm).
469473

470474
;; ### JTransforms/fastmath: Flexible (with caveats)
471475
;;
472-
;; Based on our source code investigation (see the threading section above), JTransforms uses **two different algorithms**:
476+
;; Based on our source code investigation (see the threading section above), JTransforms uses **three execution plans**:
473477
;;
474-
;; 1. **Cooley-Tukey** - Fast, parallelizable, requires power-of-2
475-
;; 2. **Bluestein's algorithm** - Works with arbitrary lengths
478+
;; 1. **SPLIT_RADIX** (Cooley-Tukey variant) - Fast, multithreaded, requires power-of-2
479+
;; 2. **MIXED_RADIX** (Cooley-Tukey variant) - For factorable sizes, primarily sequential
480+
;; 3. **BLUESTEIN** (Chirp Z-transform) - For arbitrary sizes with large prime factors, supports 1-4 threads
476481
;;
477-
;; When you provide a non-power-of-2 size, JTransforms automatically falls back to Bluestein's algorithm. This means:
482+
;; When you provide a non-power-of-2 size, JTransforms uses MIXED_RADIX or BLUESTEIN depending on the size's prime factorization. This means:
478483
;; - ✅ It works with any size (we tested 100, 127, 1000 successfully)
479484
;; - ⚠️ Performance may be slower for non-power-of-2 sizes
480-
;; - ⚠️ Parallelization may not work with Bluestein (unclear from our experiments)
485+
;; - ⚠️ MIXED_RADIX is primarily sequential; BLUESTEIN supports limited parallelization
481486
;;
482-
;; **Recommendation:** Use power-of-2 sizes when possible for best performance. If your data doesn't fit, zero-pad to the next power of 2, or use JTransforms/fastmath knowing it will use the Bluestein fallback.
487+
;; **Recommendation:** Use power-of-2 sizes when possible for best performance (triggers SPLIT_RADIX). If your data doesn't fit, zero-pad to the next power of 2, or use JTransforms/fastmath with the understanding that a different plan will be selected.
483488

484489
;; ## Related Functionality
485490

486491
;; Each library offers additional capabilities beyond basic FFT:
487492

488493
;; ### Apache Commons Math
489494
;; - [Inverse FFT](https://en.wikipedia.org/wiki/Fast_Fourier_transform#Inverse_FFT)
490-
;; - Sine/Cosine transforms
495+
;; - [DCT](https://en.wikipedia.org/wiki/Discrete_cosine_transform)/[DST](https://en.wikipedia.org/wiki/Discrete_sine_transform)
491496
;; - [Hadamard transform](https://en.wikipedia.org/wiki/Hadamard_transform)
492497
;; - [Complex number](https://en.wikipedia.org/wiki/Complex_number) arithmetic
493498
;; - Statistical analysis, [linear algebra](https://en.wikipedia.org/wiki/Linear_algebra), [optimization](https://en.wikipedia.org/wiki/Mathematical_optimization)

0 commit comments

Comments
 (0)