Commit ca37ffa

tensor images - wip
1 parent 62bdc67 commit ca37ffa

1 file changed

Lines changed: 120 additions & 58 deletions

src/dtype_next/image_analysis.clj

@@ -32,6 +32,18 @@
3232
;; - **Type discipline**: Explicit control over precision and overflow
3333
;;
3434

35+
;; ## About This Tutorial
36+
37+
;; [dtype-next](https://github.com/cnuernber/dtype-next) is a comprehensive library
38+
;; for working with typed arrays, including buffers, functional operations, tensors,
39+
;; and dataset integration. This tutorial focuses on **the tensor API**—multi-dimensional
40+
;; views over typed buffers—because images provide clear visual feedback and natural
41+
;; multi-dimensional structure.
42+
;;
43+
;; The patterns you'll learn (zero-copy views, type discipline, functional composition)
44+
;; transfer directly to other dtype-next use cases: time series analysis, scientific
45+
;; computing, ML data preparation, and any domain requiring efficient numerical arrays.
46+
3547
;; ## What We'll Build
3648

3749
;; - **Image Statistics** — channel means, ranges, distributions, histograms
@@ -87,9 +99,64 @@ original-tensor
8799

88100
;; ---
89101

102+
;; # Working with Tensors
103+
104+
;; Before diving into image analysis, let's understand what tensors are in dtype-next.
105+
;;
106+
;; **Tensors are multi-dimensional views over typed buffers.** The underlying buffer
107+
;; is a contiguous block of typed data (like our uint8 pixels), and the tensor provides
108+
;; convenient multi-dimensional indexing with shape information. This architecture enables
109+
;; zero-copy operations—when we slice or reshape, we create new views without copying data.
110+
;;
111+
;; Let's explore essential tensor operations for transforming and converting data.
112+
113+
;; ## Reshaping
114+
115+
;; Sometimes it's convenient to flatten spatial dimensions into a single axis.
116+
;; For example, reshaping `[H W 3]` → `[H×W 3]` gives us one row per pixel:
117+
118+
(-> original-tensor
119+
(tensor/reshape [(* height width) 3])
120+
dtype/shape)
121+
122+
;; **Key insight**: `tensor/reshape` is a zero-copy view operation—it reinterprets
123+
;; the buffer without copying data.
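;; A minimal sketch of what "zero-copy" means here, assuming the aliases
;; `dtype` (tech.v3.datatype) and `tensor` (tech.v3.tensor). `tiny-image` is a
;; hypothetical 2×2 RGB stand-in for `original-tensor`. If the reshaped view
;; shares the buffer, a write through the original should be visible through
;; the view:

```clojure
(require '[tech.v3.datatype :as dtype]
         '[tech.v3.tensor :as tensor])

;; Hypothetical 2x2 RGB "image" (shape [2 2 3]) backed by a uint8 container.
(def tiny-image
  (tensor/->tensor [[[1 2 3] [4 5 6]]
                    [[7 8 9] [10 11 12]]]
                   :datatype :uint8))

;; Flatten the two spatial axes: one row per pixel, one column per channel.
(def pixel-rows
  (tensor/reshape tiny-image [(* 2 2) 3]))

(dtype/shape pixel-rows)

;; Write the red value of pixel (0, 0) through the ORIGINAL tensor...
(tensor/mset! tiny-image 0 0 0 99)

;; ...and read it back through the reshaped view.
(tensor/mget pixel-rows 0 0)
```

;; Because the view aliases the original buffer, clone first
;; (for example with `dtype/clone`) when an independent copy is needed.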
124+
125+
;; ## Tensors as Datasets
126+
127+
;; Two-dimensional tensors convert naturally to tablecloth datasets, enabling
128+
;; tabular operations and plotting:
129+
130+
(-> original-tensor
131+
(tensor/reshape [(* height width) 3])
132+
ds-tensor/tensor->dataset
133+
(tc/rename-columns [:red :green :blue]))
134+
135+
;; Or more concisely:
136+
137+
(-> original-tensor
138+
(tensor/reshape [(* height width) 3])
139+
tc/dataset
140+
(tc/rename-columns [:red :green :blue]))
141+
142+
;; We can convert back, restoring the original image structure:
143+
144+
(-> original-tensor
145+
(tensor/reshape [(* height width) 3])
146+
tc/dataset
147+
ds-tensor/dataset->tensor
148+
(tensor/reshape [height width 3])
149+
bufimg/tensor->image)
150+
151+
;; This round-trip demonstrates the seamless interop between tensors and datasets,
152+
;; useful for combining spatial operations (tensors) with statistical analysis (datasets).
153+
154+
;; ---
155+
90156
;; # Image Statistics
91157

92-
;; Let's analyze image properties using **reduction operations**.
158+
;; Now that we understand tensor fundamentals, let's analyze image properties
159+
;; using **reduction operations** and **channel slicing**.
93160

94161
;; ## Extracting Color Channels
95162

@@ -161,47 +228,13 @@ original-tensor
161228

162229
(bufimg/tensor->image grayscale)
163230

164-
;; ## Reshaping
165-
166-
;; Sometimes it is convenient to have one flat buffer
167-
;; per colour channel.
168-
169-
(-> original-tensor
170-
(tensor/reshape [(* height width) 3])
171-
dtype/shape)
172-
173-
;; ## Tensors as datasets
174-
175-
;; Tensors with two dimensions can be turned into datasets:
176-
177-
(-> original-tensor
178-
(tensor/reshape [(* height width) 3])
179-
ds-tensor/tensor->dataset
180-
(tc/rename-columns [:red :green :blue]))
181-
182-
;; Or simply:
183-
184-
(-> original-tensor
185-
(tensor/reshape [(* height width) 3])
186-
tc/dataset
187-
(tc/rename-columns [:red :green :blue]))
188-
189-
;; We can also go back the opposite direction:
190-
191-
(-> original-tensor
192-
(tensor/reshape [(* height width) 3])
193-
tc/dataset
194-
ds-tensor/dataset->tensor
195-
(tensor/reshape [height width 3])
196-
bufimg/tensor->image)
197-
198231
;; ## Histograms
199232

200233
;; A [histogram](https://en.wikipedia.org/wiki/Image_histogram) shows the distribution
201234
;; of pixel values. It's essential for understanding image brightness, contrast, and
202235
;; exposure. Peaks indicate common values; spread indicates dynamic range.
203236

204-
;; To draw the histograms, we can use a pivot transformation:
237+
;; **Approach 1**: Overlaid RGB channels using the reshape→dataset pattern from "Tensors as Datasets":
205238

206239
(-> original-tensor
207240
(tensor/reshape [(* height width) 3])
@@ -216,6 +249,8 @@ original-tensor
216249
(plotly/layer-histogram {:=x :green
217250
:=mark-color "green"}))
218251

252+
;; **Approach 2**: Separate histograms using `dtype/as-reader` for direct tensor access:
253+
219254
(->> (assoc channels :gray grayscale)
220255
(map (fn [[k v]]
221256
(-> (tc/dataset {:x (dtype/as-reader v)})
@@ -230,9 +265,13 @@ original-tensor
230265

231266
;; # Spatial Analysis — Edges and Gradients
232267

233-
;; Analyze spatial structure using [gradient](https://en.wikipedia.org/wiki/Image_gradient)
234-
;; operations. Gradients are fundamental to [edge detection](https://en.wikipedia.org/wiki/Edge_detection),
235-
;; which identifies boundaries between regions in an image.
268+
;; We've explored *global* properties like channel means and histograms. Now let's
269+
;; analyze *local* spatial structure by comparing neighboring pixels.
270+
;;
271+
;; We'll use [gradient](https://en.wikipedia.org/wiki/Image_gradient) operations
272+
;; to measure how quickly values change across space. Gradients are fundamental to
273+
;; [edge detection](https://en.wikipedia.org/wiki/Edge_detection), which identifies
274+
;; boundaries between regions in an image.
236275

237276
;; ## Computing Gradients
238277

@@ -311,8 +350,9 @@ edges
311350

312351
;; # Enhancement Pipeline
313352

314-
;; Build composable image enhancement functions. Each transformation is
315-
;; verifiable through numeric properties we can check in the REPL.
353+
;; With analysis tools in place, let's build functions that *improve* images.
354+
;; We'll create composable transformations for white balance and contrast,
355+
;; each verifiable through numeric properties we can check in the REPL.
316356

317357
;; ## Auto White Balance
318358

@@ -422,9 +462,11 @@ edges
422462

423463
;; # Accessibility — Color Blindness Simulation
424464

425-
;; Use matrix transformations to simulate how images appear to people with
426-
;; different types of color vision deficiency. This demonstrates dtype-next's
427-
;; linear algebra capabilities with practical accessibility applications.
465+
;; Beyond enhancement, images need to be *accessible*. Let's simulate how images
466+
;; appear to people with different types of color vision deficiency.
467+
;;
468+
;; This demonstrates dtype-next's linear algebra capabilities (applying 3×3 matrices
469+
;; to RGB channels) with practical accessibility applications.
428470

429471
;; Apply 3×3 transformation matrices to simulate different types of color vision deficiency.
430472

@@ -506,11 +548,14 @@ edges
506548

507549
;; ---
508550

509-
;; # Advanced — Convolution & Filtering
551+
;; # Convolution & Filtering
510552

511-
;; Convolution is the fundamental operation behind image filters, from blur to edge
512-
;; detection. We'll build a reusable convolution engine and apply various kernels,
513-
;; demonstrating `tensor/compute-tensor` for windowed operations and nested iterations.
553+
;; So far we've used simple element-wise operations and direct pixel comparisons.
554+
;; Now let's explore **convolution**, the fundamental operation behind blur, sharpen,
555+
;; and sophisticated edge detection.
556+
;;
557+
;; We'll build a reusable convolution engine using `tensor/compute-tensor` for
558+
;; windowed operations, then apply various kernels to see the dramatic effects.
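;; As a preview of the windowed pattern, here is a minimal sketch of a 3×3 box
;; blur built on `tensor/compute-tensor`. It is an illustration, not the
;; convolution engine developed in this tutorial: `box-blur-3x3` is a
;; hypothetical helper, borders are handled by averaging only the neighbors
;; that exist, and the input is assumed to be a 2D (single-channel) tensor.

```clojure
(require '[tech.v3.datatype :as dtype]
         '[tech.v3.tensor :as tensor])

(defn box-blur-3x3
  "Hypothetical helper: mean of each pixel's 3x3 neighborhood, clipped at the
  image borders. Returns a float64 tensor with the same [h w] shape as `src`."
  [src]
  (let [[h w] (dtype/shape src)]
    (tensor/compute-tensor
     [h w]
     ;; Called with one index per dimension: row y, column x.
     (fn [y x]
       (let [ys (range (max 0 (dec y)) (min h (+ y 2)))
             xs (range (max 0 (dec x)) (min w (+ x 2)))
             window (for [yy ys, xx xs]
                      (double (tensor/mget src yy xx)))]
         (/ (reduce + window) (count window))))
     :float64)))

;; A single bright pixel in the middle of a dark 3x3 patch.
(def spike
  (tensor/->tensor [[0 0 0]
                    [0 9 0]
                    [0 0 0]]
                   :datatype :float64))

(def blurred (box-blur-3x3 spike))

;; The center averages all nine pixels; a corner averages only four.
(tensor/mget blurred 1 1)
(tensor/mget blurred 0 0)
```

;; The per-pixel function is plain Clojure, so this version favors clarity
;; over speed; it is meant to show the shape of the computation only.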
514559

515560
;; ## Understanding Convolution
516561

@@ -705,11 +750,14 @@ gaussian-5x5
705750

706751
;; ---
707752

708-
;; # Reshape & Downsampling
753+
;; # Downsampling & Multi-Scale Processing
709754

710-
;; Explore multi-scale image processing through downsampling and pyramids.
711-
;; We'll demonstrate `tensor/reshape` for zero-copy view transformations and
712-
;; compare different downsampling strategies.
755+
;; Finally, let's explore working with images at multiple scales. Downsampling
756+
;; reduces resolution for faster processing or multi-scale analysis (like detecting
757+
;; features at different sizes).
758+
;;
759+
;; We'll compare downsampling strategies and build image pyramids, demonstrating
760+
;; `tensor/select` with stride patterns and aggregation techniques.
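;; A minimal sketch of the stride pattern, assuming `tensor/select` accepts a
;; stepped `range` per dimension. `grid` is a hypothetical 4×4 single-channel
;; image; taking every second row and column halves each dimension:

```clojure
(require '[tech.v3.datatype :as dtype]
         '[tech.v3.tensor :as tensor])

;; Hypothetical 4x4 single-channel "image" holding the values 0..15.
(def grid
  (tensor/->tensor [[0 1 2 3]
                    [4 5 6 7]
                    [8 9 10 11]
                    [12 13 14 15]]
                   :datatype :int32))

;; Keep rows 0 and 2, and columns 0 and 2.
(def every-other
  (tensor/select grid (range 0 4 2) (range 0 4 2)))

(dtype/shape every-other)

;; Element (1, 1) of the result comes from element (2, 2) of the source.
(tensor/mget every-other 1 1)
```

;; Strided selection simply drops the skipped pixels. Aggregating each block
;; (for example, averaging it) instead trades some extra work for less aliasing.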
713761

714762
;; ## Understanding Reshape
715763

@@ -861,13 +909,27 @@ gaussian-5x5
861909
;; - **Lazy evaluation** — computations deferred until needed
862910
;; - **No mutation** — even `tensor/compute-tensor` builds new structures
863911

864-
;; ## Next Steps
912+
;; ## Beyond Images: dtype-next in Other Domains
913+
914+
;; The tensor patterns we've explored transfer directly to other use cases:
915+
916+
;; **Time series analysis**: 1D or 2D tensors for signals, windowing operations
917+
;; for feature extraction, functional ops for filtering and aggregation.
918+
919+
;; **Scientific computing**: Multi-dimensional numerical arrays, zero-copy slicing
920+
;; for memory efficiency, type discipline for numerical precision.
921+
922+
;; **Machine learning prep**: Batch processing, normalization pipelines, data
923+
;; augmentation—all using the same functional patterns.
924+
925+
;; **Signal processing**: Audio (1D), video (4D: time×height×width×channels),
926+
;; sensor arrays—dtype-next handles arbitrary dimensionality.
865927

866-
;; - **Batch processing**: Apply analysis to multiple images
867-
;; - **More transformations**: Blur, sharpen, rotation
868-
;; - **Dataset integration**: Use `tech.ml.dataset` for tabular results
869-
;; - **Performance**: Profile and optimize hot paths
870-
;; - **Native interop**: Interface with OpenCV, TensorFlow
928+
;; dtype-next also provides:
929+
;; - **Native interop**: Zero-copy integration with native libraries (OpenCV, TensorFlow)
930+
;; - **Dataset tools**: Rich `tech.ml.dataset` integration for tabular workflows
931+
;; - **Performance**: SIMD-optimized operations, parallel processing support
932+
;; - **Flexibility**: Custom buffer implementations, extensible type system
871933

872934
;; ## Resources
873935
