|
22 | 22 | [com.github.psambit9791.jdsp.transform FastFourier] |
23 | 23 | [org.jtransforms.fft DoubleFFT_1D])) |
24 | 24 |
|
| 25 | +^{:kindly/hide-code true |
| 26 | + :kindly/kind :kind/hidden} |
25 | 27 | (set! *warn-on-reflection* true) |
26 | 28 |
|
27 | 29 | ;; # Introduction |
|
46 | 48 | ;; **Key features:** |
47 | 49 | ;; - Part of Apache Commons (widely used, stable, maintained) |
48 | 50 | ;; - Supports multiple normalization conventions |
49 | | -;; - 1D FFT, [Hadamard](https://en.wikipedia.org/wiki/Hadamard_transform), and sine/cosine transforms |
50 | | -;; - Requires input length to be a [power of 2](https://en.wikipedia.org/wiki/Power_of_two) |
| 51 | +;; - 1D FFT, [DCT](https://en.wikipedia.org/wiki/Discrete_cosine_transform)/[DST](https://en.wikipedia.org/wiki/Discrete_sine_transform), [Hadamard](https://en.wikipedia.org/wiki/Hadamard_transform) transforms |
| 52 | +;; - Requires signal length to be a [power of 2](https://en.wikipedia.org/wiki/Power_of_two) |
51 | 53 | ;; - Returns [`Complex[]`](https://commons.apache.org/proper/commons-math/javadocs/api-3.6.1/org/apache/commons/math3/complex/Complex.html) arrays (allocation overhead) |
52 | 54 |
|
53 | 55 | ;; ### JDSP |
|
121 | 123 | :=title "Test Signal: 5 Hz + 12 Hz" |
122 | 124 | :=width 700 |
123 | 125 | :=height 300}) |
124 | | - (plotly/layer-line {:=mark-color "steelblue"}) |
125 | | - plotly/plot) |
| 126 | + (plotly/layer-line {:=mark-color "steelblue"})) |
126 | 127 |
|
127 | 128 | ;; ## Visualization Helper |
128 | 129 |
|
|
149 | 150 | :=title title |
150 | 151 | :=width 700 |
151 | 152 | :=height 300}) |
152 | | - (plotly/layer-line {:=mark-color color}) |
153 | | - plotly/plot))) |
| 153 | + (plotly/layer-line {:=mark-color color})))) |
154 | 154 |
|
155 | 155 | ;; ## FFT Implementation #1: Apache Commons Math |
156 | 156 |
|
|
314 | 314 | (map (fn [{:keys [library size mean-ms lower-q-ms upper-q-ms]}] |
315 | 315 | {:library library |
316 | 316 | :size (if (= size 128) "128 (2^7)" "131K (2^17)") |
317 | | - :mean-time (format "%.3f ms" mean-ms) |
318 | | - :range (format "%.3f - %.3f ms" lower-q-ms upper-q-ms)}) |
| 317 | + :mean-time (format "%.6f ms" mean-ms) |
| 318 | + :range (format "%.6f - %.6f ms" lower-q-ms upper-q-ms)}) |
319 | 319 | library-benchmarks)) |
320 | 320 |
|
321 | 321 | ;; ## Exploring Parallelization: An Experiment |
|
324 | 324 | ;; |
325 | 325 | ;; **Disclaimer:** This section documents an exploratory experiment with uncertain conclusions. We're learning as we go, and the results are puzzling! |
326 | 326 |
|
327 | | -;; ### How JTransforms Works: Two Algorithms |
| 327 | +;; ### How JTransforms Works: Three Algorithm Plans |
328 | 328 |
|
329 | | -;; Looking at the [JTransforms source code](https://github.com/wendykierp/JTransforms/blob/master/src/main/java/org/jtransforms/fft/DoubleFFT_1D.java), we discovered it uses **two different FFT algorithms**: |
| 329 | +;; Looking at the [JTransforms source code](https://github.com/wendykierp/JTransforms/blob/master/src/main/java/org/jtransforms/fft/DoubleFFT_1D.java), we discovered it uses **three different execution plans**: |
330 | 330 | ;; |
331 | | -;; 1. **[Cooley-Tukey FFT](https://en.wikipedia.org/wiki/Cooley%E2%80%93Tukey_FFT_algorithm)** - Fast, parallelizable, but requires power-of-2 input sizes |
332 | | -;; 2. **[Bluestein's algorithm](https://en.wikipedia.org/wiki/Chirp_Z-transform#Bluestein's_algorithm)** (`bluestein_complex` in the code) - Works with any size, likely slower and possibly single-threaded |
| 331 | +;; 1. **SPLIT_RADIX** ([Cooley-Tukey variant](https://en.wikipedia.org/wiki/Cooley%E2%80%93Tukey_FFT_algorithm)) - Used for power-of-2 sizes, supports multithreading |
| 332 | +;; 2. **MIXED_RADIX** (Cooley-Tukey variant) - Used for factorable sizes (divisible by 2, 3, 4, 5), primarily sequential |
| 333 | +;; 3. **BLUESTEIN** ([Chirp Z-transform](https://en.wikipedia.org/wiki/Chirp_Z-transform#Bluestein's_algorithm)) - Used for arbitrary sizes with large prime factors, supports 1-4 threads |
333 | 334 | ;; |
334 | | -;; When you provide a power-of-2 size, JTransforms uses Cooley-Tukey (optimal). For other sizes, it automatically falls back to Bluestein. |
| 335 | +;; The plan is automatically selected in the constructor based on signal size: |
| 336 | +;; - Power-of-2 → SPLIT_RADIX (fastest, our benchmarks use this) |
| 337 | +;; - Factorable by small primes → MIXED_RADIX |
| 338 | +;; - Large prime factors (remainder ≥ 211) → BLUESTEIN |
335 | 339 | ;; |
336 | | -;; This explains why our earlier tests with sizes like 100, 127, 1000 worked—they used Bluestein! |
| 340 | +;; This explains why arbitrary sizes like 100, 127, 1000 work—they use MIXED_RADIX or BLUESTEIN! |
337 | 341 | ;; |
338 | | -;; The [JTransforms paper (Wendykier & Grote 2012)](https://www.math.emory.edu/technical-reports/techrep-00127.pdf) discusses the Cooley-Tukey implementation, |
| 342 | +;; The [JTransforms paper (Wendykier & Nagy 2008)](https://www.math.emory.edu/technical-reports/techrep-00127.pdf) discusses these implementations, |
339 | 343 | ;; noting that 1D transforms can use at most 2 or 4 threads due to the algorithm's structure. |
340 | 344 |
|
341 | 345 | ;; ### Testing Thread Performance |
|
383 | 387 | (plotly/base {:=x :threads |
384 | 388 | :=y :mean-ms |
385 | 389 | :=color :size |
| 390 | + :=color-type :nominal |
386 | 391 | :=title "FFT Performance vs Thread Count (fastmath/JTransforms)" |
387 | 392 | :=x-title "Number of Threads" |
388 | 393 | :=y-title "Mean Time per FFT (ms)" |
389 | 394 | :=width 800 |
390 | 395 | :=height 500}) |
391 | 396 | (plotly/layer-point {:=mark-size 10}) |
392 | | - (plotly/layer-line {:=mark-opacity 0.6}) |
393 | | - plotly/plot) |
| 397 | + (plotly/layer-line {:=mark-opacity 0.6})) |
394 | 398 |
|
395 | 399 | ;; ### What We Found (Spoiler: It's Confusing!) |
396 | 400 |
|
|
459 | 463 | ;; - **Heap**: 15,936 MB max |
460 | 464 | ;; - **Clojure**: 1.12.3 |
461 | 465 |
|
462 | | -;; ## Input Size Requirements |
| 466 | +;; ## Signal Size Requirements |
463 | 467 |
|
464 | | -;; Different libraries have different requirements for input signal lengths: |
| 468 | +;; Different libraries have different requirements for signal lengths: |
465 | 469 |
|
466 | 470 | ;; ### Apache Commons Math: Power-of-2 Required |
467 | 471 | ;; |
468 | 472 | ;; Apache Commons Math **strictly requires** signal length to be a [power of 2](https://en.wikipedia.org/wiki/Power_of_two) (128, 256, 512, 1024, etc.). This is because it only implements the classic [Cooley-Tukey algorithm](https://en.wikipedia.org/wiki/Cooley%E2%80%93Tukey_FFT_algorithm). |
469 | 473 |
|
470 | 474 | ;; ### JTransforms/fastmath: Flexible (with caveats) |
471 | 475 | ;; |
472 | | -;; Based on our source code investigation (see the threading section above), JTransforms uses **two different algorithms**: |
| 476 | +;; Based on our source code investigation (see the threading section above), JTransforms uses **three execution plans**: |
473 | 477 | ;; |
474 | | -;; 1. **Cooley-Tukey** - Fast, parallelizable, requires power-of-2 |
475 | | -;; 2. **Bluestein's algorithm** - Works with arbitrary lengths |
| 478 | +;; 1. **SPLIT_RADIX** (Cooley-Tukey variant) - Fast, multithreaded, requires power-of-2 |
| 479 | +;; 2. **MIXED_RADIX** (Cooley-Tukey variant) - For factorable sizes, primarily sequential |
| 480 | +;; 3. **BLUESTEIN** (Chirp Z-transform) - For arbitrary sizes with large prime factors, supports 1-4 threads |
476 | 481 | ;; |
477 | | -;; When you provide a non-power-of-2 size, JTransforms automatically falls back to Bluestein's algorithm. This means: |
| 482 | +;; When you provide a non-power-of-2 size, JTransforms uses MIXED_RADIX or BLUESTEIN depending on the size's prime factorization. This means: |
478 | 483 | ;; - ✅ It works with any size (we tested 100, 127, 1000 successfully) |
479 | 484 | ;; - ⚠️ Performance may be slower for non-power-of-2 sizes |
480 | | -;; - ⚠️ Parallelization may not work with Bluestein (unclear from our experiments) |
| 485 | +;; - ⚠️ MIXED_RADIX is primarily sequential; BLUESTEIN supports limited parallelization |
481 | 486 | ;; |
482 | | -;; **Recommendation:** Use power-of-2 sizes when possible for best performance. If your data doesn't fit, zero-pad to the next power of 2, or use JTransforms/fastmath knowing it will use the Bluestein fallback. |
| 487 | +;; **Recommendation:** Use power-of-2 sizes when possible for best performance (triggers SPLIT_RADIX). If your data doesn't fit, zero-pad to the next power of 2, or use JTransforms/fastmath with the understanding that a different plan will be selected. |
483 | 488 |
|
484 | 489 | ;; ## Related Functionality |
485 | 490 |
|
486 | 491 | ;; Each library offers additional capabilities beyond basic FFT: |
487 | 492 |
|
488 | 493 | ;; ### Apache Commons Math |
489 | 494 | ;; - [Inverse FFT](https://en.wikipedia.org/wiki/Fast_Fourier_transform#Inverse_FFT) |
490 | | -;; - Sine/Cosine transforms |
| 495 | +;; - [DCT](https://en.wikipedia.org/wiki/Discrete_cosine_transform)/[DST](https://en.wikipedia.org/wiki/Discrete_sine_transform) |
491 | 496 | ;; - [Hadamard transform](https://en.wikipedia.org/wiki/Hadamard_transform) |
492 | 497 | ;; - [Complex number](https://en.wikipedia.org/wiki/Complex_number) arithmetic |
493 | 498 | ;; - Statistical analysis, [linear algebra](https://en.wikipedia.org/wiki/Linear_algebra), [optimization](https://en.wikipedia.org/wiki/Mathematical_optimization) |
|
0 commit comments