SDXL CollageNet

A personal fork of Joel Simon's CollageNet adding an interactive img2img workflow on top of Joel's diffusion-time region projection. The original README is preserved verbatim below the divider.

What's new in this fork

Gradio web UI (app.py) — drag-and-drop input image, live sliders, three render modes (Render / Sweep / Party), no Jupyter required.
Img2img + projection — uses an input image to guide SDXL through StableDiffusionXLImg2ImgPipeline, then layers Joel's region projection on top during denoising.
Party mode 🎉 — generates batches of random parameter combinations to surface unexpected aesthetics. Each render is logged to a CSV so you can recreate any specific output.
random_square region method — quadtree subdivision into mixed-size squares; less regular than the existing square method.
Standalone experiment notebook (img2img_experiment.ipynb) — Jupyter sandbox version of the same flow, useful for poking at internals.

Gallery

"A portrait" + 4 texture images + selfie

Texture source images used for the patch bank:

"A game of tetris" + wood patch bank + Bryce Canyon poster

Wood source images used for the patch bank:

Mona Lisa test

Interface

Experiment notebook (img2img_experiment.ipynb):

Gradio app (app.py):

Running the new pieces

After completing the setup in the original README below, install Gradio and launch:

pip install gradio
python app.py

Opens at http://localhost:7860. Set the source images folder path at the top, drag in an input image, dial the sliders, and click Render. Sweep and Party tabs run batches of variations.

Or use the standalone notebook for the same flow in a Jupyter cell:

jupyter lab img2img_experiment.ipynb

The original main.ipynb and render.py from upstream still work as before.

Diffusion-time region-based collage rendering for SDXL.

This project guides SDXL generation with a bank of reference image latents, then reconstructs a sharp collage-style final image from the original source pixels.

The recommended region method is felzenszwalb.

Core Idea

Instead of stylizing only after generation, this project interrupts SDXL during denoising:

Read the current latent.
Segment it into regions.
Sample same-size candidate windows from a reference latent bank.
Score candidates with masked cosine similarity in latent space.
Blend the best-matching region back into the latent.
Continue denoising.

After diffusion, the final assignment map is used to render a sharp pixel collage from the original source images.

Quick Start

Start from the default YAML config:

python render.py --config configs/default.yaml

YAML keys use snake_case. CLI overrides prefer kebab-case, for example:

YAML: felzenszwalb_sigma
CLI: --felzenszwalb-sigma

For convenience, the CLI also accepts underscore-style flags such as --felzenszwalb_sigma.

You can also override the main knobs directly from the command line:

python render.py \
  --prompt "a cute cat" \
  --source-images ./watercolors \
  --output-dir ./outputs/cute-cat

Example with explicit latent input and tuned felzenszwalb settings:

python render.py \
  --config configs/default.yaml \
  --output-dir test \
  --prompt "a cute cat" \
  --source-latents watercolor-latents/ \
  --felzenszwalb-min-size 8 \
  --felzenszwalb-sigma 1.2

Example output:

To render a seed batch:

python render.py \
  --config configs/default.yaml \
  --num-seeds 8 \
  --seed-offset 16 \
  --output-dir ./outputs/seed-batch

This saves:

a_cute_cat__projection_start_frac_0_6__do_rotated_true_seed_0016_ab12cd34.png
a_cute_cat__projection_start_frac_0_6__do_rotated_true_seed_0017_ef56gh78.png
...

By default, only the final displayed image is saved. If generation.output_stem is not set, filenames are auto-generated from the prompt, CLI overrides, seed, and a random suffix so repeated runs do not overwrite older images.

Input Sources

You can provide either:

source.source_latents
- a single .npz latent file or a directory of .npz files
source.source_images
- a single image file or a directory of source images

If you pass source images, render.py will encode them to temporary SDXL VAE latents automatically before building the patch bank.

If you want to precompute latents yourself:

python prepare-patches.py ./images ./latents

Project Layout

render.py
- command-line renderer
- YAML config + CLI overrides
- multi-seed output loop
render_config.py
- structured config dataclasses
- YAML loading
render_runtime.py
- pipeline loading
- patch bank creation
- projector construction
- final image saving
patch_dictionary_core.py
- core matching, region projection, and collage rendering logic
source_latents.py
- reusable source-image-to-latent preparation helpers
prepare-patches.py
- standalone latent preprocessing script
main.ipynb
- notebook demo / hackable playground

Region Methods

Publicly supported methods:

felzenszwalb
- recommended default
threshold
- optional alternative for experimentation
square
- square patch baseline

Config Guide

The YAML config is split into model, source, generation, projection, and output.

The usual SDXL controls like image size, inference steps, and guidance scale work the way you would expect, so the notes below focus on the collage-specific settings.

`source`

source_images
- Path to a source image file or directory.
- These images are encoded to SDXL VAE latents before rendering.
source_latents
- Path to a latent .npz file or a directory of latent files.
- Use this when you want faster repeated renders and do not want to re-encode images each run.
recursive
- If true, searches nested subdirectories when loading source images.
max_width, max_height
- Optional preprocessing caps for source images before VAE encoding.
- Useful when your source images are very large and you want a smaller latent bank.
encode_mode
- Controls how source images are encoded into latents.
- mean is the default and is the most stable choice for building a reusable source bank.

`generation`

seed
- Base seed for rendering.
num_seeds
- Number of consecutive seeds to render in one CLI call.
seed_offset
- Offset added to the base seed before the batch starts.
- Useful for continuing a sweep without changing the main seed.
output_stem
- Optional base filename used for single-image runs and auxiliary outputs.
- If omitted, the renderer saves files like {formatted_prompt}__{override_tokens}_seed_{seed}_{random_id}.png.
pixel_render_scale
- Upscales the final sharp collage render by an integer factor.
- This affects only the saved collage-style output, not the diffusion process itself.

`projection`

region_method
- Chooses how the live latent is divided before matching.
- Recommended: felzenszwalb
- Alternatives: threshold, square
patch_size
- Patch size in latent cells.
- 1 means the dictionary is built from individual latent cells, which is the current default and works well with region projection.
do_rotated
- If true, augments the source patch bank with 90 degree rotations.
- This can improve matching diversity without requiring more source images.
total_patches
- Total number of dictionary patches sampled into the reference bank.
- Larger values improve coverage but increase memory use and lookup cost.
top_k
- Number of nearest dictionary patches mixed together for square-patch matching.
- In the current setup this is usually left at 1.
dictionary_chunk_size
- Chunk size used when searching the reference bank.
- This mainly trades memory for speed during cosine search.
similarity_temperature
- Temperature used when turning top-k similarity scores into soft weights.
- Lower values make the projection behave more like a hard nearest-neighbor choice.
random_seed
- Seed used when sampling patches from the source bank and when sampling region candidates from references.
- This is separate from the diffusion seed.

Projection Schedule

projection_start_frac
- Fraction of the denoising trajectory where collage projection starts.
- Example: 0.7 means projection begins in the final 30 percent of steps.
projection_end_frac
- Fraction of the denoising trajectory where projection stops.
- Usually this is 1.0 so projection continues to the end.
projection_every_n_steps
- Applies projection only every N denoising steps.
- 1 means project at every eligible step.
alpha_start
- Blend strength at the start of the projection window.
alpha_end
- Blend strength at the end of the projection window.
- Higher values force stronger source-image structure into the result.

Region Matching

region_candidate_count
- Number of random same-size candidate windows sampled from the source bank for each region.
- Higher values improve match quality but cost more compute.
region_min_area, region_max_area
- Filters region sizes in latent-cell units.
- These are useful for rejecting tiny fragments or huge regions that are not helpful to match.
region_max_bbox_h, region_max_bbox_w
- Optional hard caps on region bounding-box size in latent cells.
- 0 disables the cap.
debug_every_n_projections
- Saves a region debug image every N projection events when auxiliary outputs are enabled.
preview_every_n_projections
- Controls optional stored latent previews during notebook-style experimentation.
- This is mainly useful for debugging and can usually stay at 0.

`felzenszwalb`

felzenszwalb_scale
- Main region-size knob for the default segmentation method.
- Larger values generally encourage larger merged regions, though the effect depends on the latent structure.
felzenszwalb_sigma
- Smoothing applied before segmentation.
- Higher values produce smoother, less detailed region boundaries.
felzenszwalb_min_size
- Minimum region size enforced by the segmenter.
- Raising this removes small regions and simplifies the collage.

`threshold`

threshold_min_regions, threshold_max_regions
- Target range for the threshold-based segmentation method.
- The code searches for a similarity threshold that lands inside this range if possible.
threshold_connectivity
- Neighborhood connectivity used during region merging.
- 4 is more conservative; 8 allows diagonal connections.
threshold_similarity_low, threshold_similarity_high
- Search bounds for the latent-space similarity threshold.
- These usually do not need frequent changes unless you are tuning the threshold method directly.

`output`

output_dir
- Where rendered images are saved.
save_auxiliary_outputs
- If true, also saves metadata, region assignments, and debug renders.
- If false, saves only the final image.
save_displayed_image
- If true, saves the sharp collage render when available.
- If false, saves the raw VAE-decoded diffusion image instead.

Recommended Starting Settings

These are solid current defaults:

projection:
  region_method: "felzenszwalb"
  patch_size: 1
  total_patches: 20000
  region_candidate_count: 128
  alpha_start: 0.0
  alpha_end: 0.1
  projection_start_frac: 0.7
  projection_end_frac: 1.0
  felzenszwalb_scale: 32.0
  felzenszwalb_sigma: 0.8
  felzenszwalb_min_size: 8

Optional LoRA

The default config does not enable any LoRA.

If you want one, set these fields in model:

lora_repo
lora_weight_name
embedding_filename
embedding_token

If your LoRA also needs textual inversion embeddings, render.py will load them when those fields are provided.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SDXL CollageNet — img2img + party mode (Nathan's fork)

What's new in this fork

Gallery

"A portrait" + 4 texture images + selfie

"A game of tetris" + wood patch bank + Bryce Canyon poster

Mona Lisa test

Interface

Running the new pieces

SDXL CollageNet

Core Idea

Quick Start

Input Sources

Project Layout

Region Methods

Config Guide

`source`

`generation`

`projection`

Projection Schedule

Region Matching

`felzenszwalb`

`threshold`

`output`

Recommended Starting Settings

Optional LoRA

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
assets		assets
configs		configs
gan-upscaled-latents		gan-upscaled-latents
watercolor-latents		watercolor-latents
.gitignore		.gitignore
README.md		README.md
app.py		app.py
experiment_core.py		experiment_core.py
img2img_experiment.ipynb		img2img_experiment.ipynb
main.ipynb		main.ipynb
patch_dictionary_core.py		patch_dictionary_core.py
prepare-patches.py		prepare-patches.py
render.py		render.py
render_config.py		render_config.py
render_runtime.py		render_runtime.py
source_latents.py		source_latents.py

Folders and files

Latest commit

History

Repository files navigation

SDXL CollageNet — img2img + party mode (Nathan's fork)

What's new in this fork

Gallery

"A portrait" + 4 texture images + selfie

"A game of tetris" + wood patch bank + Bryce Canyon poster

Mona Lisa test

Interface

Running the new pieces

SDXL CollageNet

Core Idea

Quick Start

Input Sources

Project Layout

Region Methods

Config Guide

source

generation

projection

Projection Schedule

Region Matching

felzenszwalb

threshold

output

Recommended Starting Settings

Optional LoRA

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`source`

`generation`

`projection`

`felzenszwalb`

`threshold`

`output`

Packages