Commit 92a514f

image tensor wip
1 parent e8f8be6 commit 92a514f

1 file changed

Lines changed: 94 additions & 99 deletions

src/dtype_next/image_processing_with_tensors.clj

@@ -123,19 +123,20 @@ original-tensor
123123

124124
(dtype/shape original-tensor)
125125

126-
;; This is `[height width channels]` — our image has 3 channels.
126+
;; This is `[height width channels]` — our image has 3 color channels.
127127

128-
;; **How do we know it's RGB format?** Check the image type:
128+
;; **Channel ordering**: this image's BufferedImage type stores pixels in BGR order:
129129

130130
(bufimg/image-type original-img)
131131

132-
;; `:type-3byte-bgr` indicates BGR byte order (Blue-Green-Red), which is Java's
133-
;; internal BufferedImage format. The question is: does `bufimg/as-ubyte-tensor`
134-
;; preserve BGR order or convert to RGB?
132+
;; `:byte-bgr` confirms BGR byte order. The `bufimg/as-ubyte-tensor` function
133+
;; preserves this ordering, so our tensor channels are:
134+
;; - **Channel 0 = Blue**
135+
;; - **Channel 1 = Green**
136+
;; - **Channel 2 = Red**
135137
;;
136-
;; We'll assume RGB order in this tutorial (which is the common convention), but
137-
;; if colors appear incorrect, you may need to swap channels. You can verify by
138-
;; checking if the red channel actually contains red values in your specific image.
138+
;; This is the opposite of the more common RGB convention. Throughout this tutorial,
139+
;; we'll work with BGR order and be explicit about it in our code.
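;;
;; A minimal plain-Clojure sketch of what this ordering means for a single pixel.
;; `label-bgr-pixel` is a hypothetical helper for illustration, not part of this
;; commit or of dtype-next:

(defn label-bgr-pixel
  "Label a [b g r] pixel vector with channel names."
  [px]
  (zipmap [:blue :green :red] px))

(label-bgr-pixel [10 20 30]) ; => {:blue 10, :green 20, :red 30}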
139140

140141
(def height
141142
(first (dtype/shape original-tensor)))
@@ -201,14 +202,14 @@ original-tensor
201202
(-> original-tensor
202203
(tensor/reshape [(* height width) 3])
203204
ds-tensor/tensor->dataset
204-
(tc/rename-columns [:red :green :blue]))
205+
(tc/rename-columns [:blue :green :red]))
205206

206207
;; Or more concisely (tablecloth auto-converts):
207208

208209
(-> original-tensor
209210
(tensor/reshape [(* height width) 3])
210211
tc/dataset
211-
(tc/rename-columns [:red :green :blue]))
212+
(tc/rename-columns [:blue :green :red]))
212213

213214
;; We can convert back, restoring the original image structure:
214215

@@ -246,9 +247,12 @@ original-tensor
246247
[3 4] ; shape: 3 rows, 4 columns
247248
(fn [row col] ; function receives [row col] indices
248249
(+ (* row 10) col)) ; compute value: row*10 + col
249-
:int32)) ; element type
250+
:int32 ; element type
251+
))
250252

251-
toy-tensor
253+
;; Check an element:
254+
255+
(toy-tensor 2 1)
252256

253257
;; Verify the shape:
254258

@@ -305,8 +309,6 @@ toy-tensor
305309

306310
(def small-tensor (tensor/compute-tensor [2 3] (fn [r c] (+ (* r 3) c)) :int32))
307311

308-
small-tensor
309-
310312
;; Add scalar to every element (broadcasting):
311313

312314
(dfn/+ small-tensor 10)
@@ -343,8 +345,6 @@ small-tensor
343345

344346
(def tiny-uint8 (tensor/compute-tensor [2 2] (fn [_ _] 100) :uint8))
345347

346-
tiny-uint8
347-
348348
;; Element type:
349349

350350
(dtype/elemwise-datatype tiny-uint8)
@@ -389,13 +389,13 @@ flat-tensor
389389
;; Use `tensor/select` to slice out individual channels (zero-copy views):
390390

391391
(defn extract-channels
392-
"Extract R, G, B channels from RGB tensor.
393-
Takes: [H W 3] tensor
394-
Returns: map with :red, :green, :blue tensors (each [H W])"
392+
"Extract B, G, R channels from BGR tensor.
393+
Takes: [H W 3] tensor (BGR order)
394+
Returns: map with :blue, :green, :red tensors (each [H W])"
395395
[img-tensor]
396-
{:red (tensor/select img-tensor :all :all 0)
396+
{:blue (tensor/select img-tensor :all :all 0)
397397
:green (tensor/select img-tensor :all :all 1)
398-
:blue (tensor/select img-tensor :all :all 2)})
398+
:red (tensor/select img-tensor :all :all 2)})
399399

400400
(def channels (extract-channels original-tensor))
401401

@@ -433,17 +433,18 @@ flat-tensor
433433
;; moderately sensitive to red, and least sensitive to blue. The coefficients
434434
;; (0.299, 0.587, 0.114) approximate the [relative luminance](https://en.wikipedia.org/wiki/Relative_luminance)
435435
;; formula from the [ITU-R BT.601](https://en.wikipedia.org/wiki/Rec._601) standard,
436-
;; ensuring grayscale images preserve
437-
;; perceived brightness rather than simple RGB averages.
436+
;; ensuring grayscale images preserve perceived brightness rather than simple
437+
;; equal weighting of color channels.
438438

439439
(defn to-grayscale
440-
"Convert RGB [H W 3] to grayscale [H W].
440+
"Convert BGR [H W 3] to grayscale [H W].
441441
Standard formula: 0.299*R + 0.587*G + 0.114*B
442+
Reads B, G, R from channels 0, 1, 2 of the BGR tensor.
442443
Returns float32 tensor (use dtype/elemwise-cast for uint8)."
443444
[img-tensor]
444-
(let [r (tensor/select img-tensor :all :all 0)
445-
g (tensor/select img-tensor :all :all 1)
446-
b (tensor/select img-tensor :all :all 2)]
445+
(let [b (tensor/select img-tensor :all :all 0) ; Blue is channel 0
446+
g (tensor/select img-tensor :all :all 1) ; Green is channel 1
447+
r (tensor/select img-tensor :all :all 2)] ; Red is channel 2
447448
(dfn/+ (dfn/* r 0.299)
448449
(dfn/* g 0.587)
449450
(dfn/* b 0.114))))
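
;; As a sanity check on the channel order, here is a minimal plain-Clojure sketch
;; of the same weighted sum for one BGR pixel. `bgr-pixel->gray` is a hypothetical
;; helper for illustration, not part of this commit:

(defn bgr-pixel->gray
  "Luminance of a single [b g r] pixel using the BT.601 weights."
  [[b g r]]
  (+ (* 0.299 r) (* 0.587 g) (* 0.114 b)))

(bgr-pixel->gray [255 0 0]) ; pure blue: the darkest of the three primaries
(bgr-pixel->gray [0 0 255]) ; pure red: in between
(bgr-pixel->gray [0 255 0]) ; pure green: the brightest of the three primaries

;; If blue came out brighter than red here, the channels would be swapped.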
@@ -468,12 +469,12 @@ flat-tensor
468469
;; of pixel values. It's essential for understanding image brightness, contrast, and
469470
;; exposure. Peaks indicate common values; spread indicates dynamic range.
470471

471-
;; **Approach 1**: Overlaid RGB channels using the reshape→dataset pattern we just learned:
472+
;; **Approach 1**: Overlaid BGR channels using the reshape→dataset pattern we just learned:
472473

473474
(-> original-tensor
474475
(tensor/reshape [(* height width) 3])
475476
tc/dataset
476-
(tc/rename-columns [:red :green :blue])
477+
(tc/rename-columns [:blue :green :red])
477478
(plotly/base {:=histogram-nbins 30
478479
:=mark-opacity 0.5})
479480
(plotly/layer-histogram {:=x :red
@@ -580,7 +581,7 @@ edges
580581

581582
(defn sharpness-score
582583
"Compute sharpness as mean edge magnitude.
583-
Takes: [H W 3] RGB or [H W] grayscale tensor
584+
Takes: [H W 3] BGR tensor (converted to grayscale internally)
584585
Returns: scalar (higher = sharper)"
585586
[img-tensor]
586587
(let [gray (to-grayscale img-tensor)
@@ -604,13 +605,13 @@ edges
604605
;; ## Auto White Balance
605606

606607
;; [White balance](https://en.wikipedia.org/wiki/Color_balance) adjusts colors to
607-
;; appear neutral under different lighting conditions. We scale RGB channels to have
608+
;; appear neutral under different lighting conditions. We scale BGR channels to have
608609
;; equal means, removing color casts.
609610

610611
(defn auto-white-balance
611-
"Scale RGB channels to have equal means.
612-
Takes: [H W 3] uint8 tensor
613-
Returns: [H W 3] uint8 tensor"
612+
"Scale BGR channels to have equal means.
613+
Takes: [H W 3] uint8 BGR tensor
614+
Returns: [H W 3] uint8 BGR tensor"
614615
[img-tensor]
615616
(let [;; Compute channel means using reduce-axis
616617
;; First reduce: [H W 3] → [W 3] (collapse height, axis 0)
@@ -659,39 +660,29 @@ edges
659660

660661
(defn enhance-contrast
661662
"Increase image contrast by amplifying deviation from mean.
662-
Takes: [H W 3] uint8 tensor, factor (> 1 increases, < 1 decreases)
663-
Returns: [H W 3] uint8 tensor"
663+
Takes: [H W 3] uint8 BGR tensor, factor (> 1 increases, < 1 decreases)
664+
Returns: [H W 3] uint8 BGR tensor"
664665
[img-tensor factor]
665-
(let [r (tensor/select img-tensor :all :all 0)
666-
g (tensor/select img-tensor :all :all 1)
667-
b (tensor/select img-tensor :all :all 2)
668-
669-
;; Compute mean for each channel
670-
r-mean (dfn/mean r)
671-
g-mean (dfn/mean g)
672-
b-mean (dfn/mean b)
673-
674-
;; Apply contrast: mean + factor * (value - mean)
675-
enhance-channel (fn [ch ch-mean]
676-
(dtype/elemwise-cast
677-
(dfn/min 255
678-
(dfn/max 0
679-
(dfn/+ ch-mean
680-
(dfn/* (dfn/- ch ch-mean) factor))))
681-
:uint8))
682-
683-
r-enhanced (enhance-channel r r-mean)
684-
g-enhanced (enhance-channel g g-mean)
685-
b-enhanced (enhance-channel b b-mean)
666+
(let [[h w c] (dtype/shape img-tensor)
667+
668+
;; Process each channel independently
669+
enhanced-channels (mapv (fn [ch]
670+
(let [channel (tensor/select img-tensor :all :all ch)
671+
ch-mean (dfn/mean channel)]
672+
;; Apply contrast: mean + factor * (value - mean)
673+
(dtype/elemwise-cast
674+
(dfn/min 255
675+
(dfn/max 0
676+
(dfn/+ ch-mean
677+
(dfn/* (dfn/- channel ch-mean) factor))))
678+
:uint8)))
679+
(range c))]
686680

687-
[h w _] (dtype/shape img-tensor)]
681+
;; Reassemble channels
688682
(tensor/compute-tensor
689-
[h w 3]
690-
(fn [y x c]
691-
(case c
692-
0 (tensor/mget r-enhanced y x)
693-
1 (tensor/mget g-enhanced y x)
694-
2 (tensor/mget b-enhanced y x)))
683+
[h w c]
684+
(fn [y x ch]
685+
(tensor/mget (nth enhanced-channels ch) y x))
695686
:uint8)))
696687

697688
(def contrasted (enhance-contrast original-tensor 1.5))
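
;; To see what the contrast formula does to individual values, here is a minimal
;; scalar sketch in plain Clojure. `contrast-value` is a hypothetical helper for
;; illustration, not part of this commit; it skips the final cast to uint8:

(defn contrast-value
  "mean + factor * (value - mean), clamped to [0, 255]."
  [v ch-mean factor]
  (-> (+ ch-mean (* factor (- v ch-mean)))
      (max 0)
      (min 255)))

(contrast-value 200 128 1.5) ; 128 + 1.5 * 72 = 236.0
(contrast-value 250 128 1.5) ; 128 + 1.5 * 122 = 311, clamped to 255
(contrast-value 128 128 1.5) ; a value at the mean is unchanged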
@@ -716,7 +707,7 @@ edges
716707
;; appear to people with different types of color vision deficiency.
717708
;;
718709
;; This demonstrates dtype-next's linear algebra capabilities (applying 3×3 matrices
719-
;; to RGB channels) with practical real-world applications.
710+
;; to BGR channels) with practical real-world applications.
720711

721712
;; Apply 3×3 transformation matrices to simulate different types of color vision deficiency.
722713

@@ -726,59 +717,64 @@ edges
726717
;; (color vision deficiency). Different types affect perception of red, green, or blue:
727718

728719
(def color-blindness-matrices
729-
{:protanopia [[0.567 0.433 0.000] ; Red-blind
730-
[0.558 0.442 0.000]
731-
[0.000 0.242 0.758]]
720+
"Color blindness simulation matrices.
721+
Each matrix is 3×3 with columns in BGR order: [B G R]
722+
Matrices adapted from standard RGB formulas, reordered for BGR."
723+
{:protanopia [[0.758 0.242 0.000] ; Red-blind (BGR columns; rows = new B, G, R)
724+
[0.000 0.442 0.558]
725+
[0.000 0.433 0.567]]
732726

733-
:deuteranopia [[0.625 0.375 0.000] ; Green-blind
734-
[0.700 0.300 0.000]
735-
[0.000 0.300 0.700]]
727+
:deuteranopia [[0.700 0.300 0.000] ; Green-blind (BGR columns; rows = new B, G, R)
728+
[0.000 0.300 0.700]
729+
[0.000 0.375 0.625]]
736730

737-
:tritanopia [[0.950 0.050 0.000] ; Blue-blind
738-
[0.000 0.433 0.567]
739-
[0.000 0.475 0.525]]})
731+
:tritanopia [[0.525 0.475 0.000] ; Blue-blind (BGR columns; rows = new B, G, R)
732+
[0.567 0.433 0.000]
733+
[0.000 0.050 0.950]]})
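
;; When starting from a matrix written for RGB (rows and columns both in R, G, B
;; order), both the rows and the columns have to be reversed before it can be used
;; with BGR data. A minimal sketch, assuming one row per output channel;
;; `rgb-matrix->bgr` is a hypothetical helper for illustration, not part of this
;; commit:

(defn rgb-matrix->bgr
  "Reverse the column order within each row, then reverse the row order."
  [m]
  (->> m
       (map (comp vec reverse))
       reverse
       vec))

(rgb-matrix->bgr [[0.567 0.433 0.000]   ; new R from [R G B]
                  [0.558 0.442 0.000]   ; new G
                  [0.000 0.242 0.758]]) ; new B
;; => [[0.758 0.242 0.0] [0.0 0.442 0.558] [0.0 0.433 0.567]]

;; Reversing only the columns would leave the new-red formula in the row that
;; produces blue, which swaps the red and blue outputs.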
740734

741735
;; ## Applying Matrix Transformations
742736

743-
;; Extract RGB channels, apply linear combinations, reassemble:
737+
;; Extract BGR channels, apply linear combinations, reassemble:
744738

745739
(defn apply-color-matrix
746-
"Apply 3×3 transformation matrix to RGB channels.
747-
Takes: [H W 3] tensor, 3×3 matrix [[r0 g0 b0] [r1 g1 b1] [r2 g2 b2]]
748-
Returns: [H W 3] uint8 tensor
749-
Formula: new_r = r0*R + g0*G + b0*B, etc."
740+
"Apply 3×3 transformation matrix to BGR channels.
741+
Takes: [H W 3] BGR tensor, 3×3 matrix [[b0 g0 r0] [b1 g1 r1] [b2 g2 r2]]
742+
Returns: [H W 3] uint8 BGR tensor
743+
Formula: new_b = b0*B + g0*G + r0*R, etc.
744+
745+
Note: Matrix coefficients correspond to BGR order (channel 0=B, 1=G, 2=R)"
750746
[img-tensor matrix]
751-
(let [r (tensor/select img-tensor :all :all 0)
752-
g (tensor/select img-tensor :all :all 1)
753-
b (tensor/select img-tensor :all :all 2)
747+
(let [b (tensor/select img-tensor :all :all 0) ; Blue channel
748+
g (tensor/select img-tensor :all :all 1) ; Green channel
749+
r (tensor/select img-tensor :all :all 2) ; Red channel
754750

755-
[[r0 g0 b0]
756-
[r1 g1 b1]
757-
[r2 g2 b2]] matrix
751+
[[b0 g0 r0]
752+
[b1 g1 r1]
753+
[b2 g2 r2]] matrix
758754

759-
;; Apply transformation
760-
new-r (dfn/+ (dfn/+ (dfn/* r r0) (dfn/* g g0)) (dfn/* b b0))
761-
new-g (dfn/+ (dfn/+ (dfn/* r r1) (dfn/* g g1)) (dfn/* b b1))
762-
new-b (dfn/+ (dfn/+ (dfn/* r r2) (dfn/* g g2)) (dfn/* b b2))
755+
;; Apply transformation (BGR order)
756+
new-b (dfn/+ (dfn/+ (dfn/* b b0) (dfn/* g g0)) (dfn/* r r0))
757+
new-g (dfn/+ (dfn/+ (dfn/* b b1) (dfn/* g g1)) (dfn/* r r1))
758+
new-r (dfn/+ (dfn/+ (dfn/* b b2) (dfn/* g g2)) (dfn/* r r2))
763759

764760
;; Clamp and cast
765761
clamp-cast (fn [ch]
766762
(dtype/elemwise-cast
767763
(dfn/min 255 (dfn/max 0 ch))
768764
:uint8))
769765

770-
new-r-clamped (clamp-cast new-r)
771-
new-g-clamped (clamp-cast new-g)
772766
new-b-clamped (clamp-cast new-b)
767+
new-g-clamped (clamp-cast new-g)
768+
new-r-clamped (clamp-cast new-r)
773769

774770
[h w _] (dtype/shape img-tensor)]
775771
(tensor/compute-tensor
776772
[h w 3]
777773
(fn [y x c]
778774
(case c
779-
0 (tensor/mget new-r-clamped y x)
780-
1 (tensor/mget new-g-clamped y x)
781-
2 (tensor/mget new-b-clamped y x)))
775+
0 (tensor/mget new-b-clamped y x) ; Blue channel 0
776+
1 (tensor/mget new-g-clamped y x) ; Green channel 1
777+
2 (tensor/mget new-r-clamped y x))) ; Red channel 2
782778
:uint8)))
783779

784780
(defn simulate-color-blindness
@@ -858,10 +854,9 @@ kernel-3x3
858854
(tensor/compute-tensor
859855
[h w]
860856
(fn [y x]
861-
;; Note: Using atom here for accumulation within tensor/compute-tensor's
862-
;; position function. While not idiomatic Clojure, it's the clearest way
863-
;; to accumulate over a 2D neighborhood. The mutation is local to this
864-
;; function call and doesn't escape.
857+
;; Accumulate weighted sum over kernel neighborhood.
858+
;; We use an atom for local accumulation since tensor/compute-tensor
859+
;; expects a function that returns a single value per position.
865860
(let [sum (atom 0.0)]
866861
(doseq [ky (range kh)
867862
kx (range kw)]
@@ -946,7 +941,7 @@ gaussian-5x5
946941
;; ## Sharpness comparison
947942

948943
;; We can quantify the sharpness of each filter using our `sharpness-score` function.
949-
;; However, note that `sharpness-score` expects RGB input, so we need to adapt it for
944+
;; However, note that `sharpness-score` expects BGR input, so we need to adapt it for
950945
;; grayscale tensors. For simplicity, we'll compute sharpness inline here:
951946

952947
(-> {:original grayscale
