A personal fork of Joel Simon's CollageNet adding an interactive img2img workflow on top of Joel's diffusion-time region projection. The original README is preserved verbatim below the divider.
- Gradio web UI (
app.py) — drag-and-drop input image, live sliders, three render modes (Render / Sweep / Party), no Jupyter required. - Img2img + projection — uses an input image to guide SDXL through
StableDiffusionXLImg2ImgPipeline, then layers Joel's region projection on top during denoising. - Party mode 🎉 — generates batches of random parameter combinations to surface unexpected aesthetics. Each render is logged to a CSV so you can recreate any specific output.
random_squareregion method — quadtree subdivision into mixed-size squares; less regular than the existingsquaremethod.- Standalone experiment notebook (
img2img_experiment.ipynb) — Jupyter sandbox version of the same flow, useful for poking at internals.
Texture source images used for the patch bank:
Wood source images used for the patch bank:
Experiment notebook (img2img_experiment.ipynb):
Gradio app (app.py):
After completing the setup in the original README below, install Gradio and launch:
pip install gradio
python app.pyOpens at http://localhost:7860. Set the source images folder path at the top, drag in an input image, dial the sliders, and click Render. Sweep and Party tabs run batches of variations.
Or use the standalone notebook for the same flow in a Jupyter cell:
jupyter lab img2img_experiment.ipynbThe original main.ipynb and render.py from upstream still work as before.
Diffusion-time region-based collage rendering for SDXL.
This project guides SDXL generation with a bank of reference image latents, then reconstructs a sharp collage-style final image from the original source pixels.
The recommended region method is felzenszwalb.
Instead of stylizing only after generation, this project interrupts SDXL during denoising:
- Read the current latent.
- Segment it into regions.
- Sample same-size candidate windows from a reference latent bank.
- Score candidates with masked cosine similarity in latent space.
- Blend the best-matching region back into the latent.
- Continue denoising.
After diffusion, the final assignment map is used to render a sharp pixel collage from the original source images.
Start from the default YAML config:
python render.py --config configs/default.yamlYAML keys use snake_case. CLI overrides prefer kebab-case, for example:
- YAML:
felzenszwalb_sigma - CLI:
--felzenszwalb-sigma
For convenience, the CLI also accepts underscore-style flags such as --felzenszwalb_sigma.
You can also override the main knobs directly from the command line:
python render.py \
--prompt "a cute cat" \
--source-images ./watercolors \
--output-dir ./outputs/cute-catExample with explicit latent input and tuned felzenszwalb settings:
python render.py \
--config configs/default.yaml \
--output-dir test \
--prompt "a cute cat" \
--source-latents watercolor-latents/ \
--felzenszwalb-min-size 8 \
--felzenszwalb-sigma 1.2Example output:
To render a seed batch:
python render.py \
--config configs/default.yaml \
--num-seeds 8 \
--seed-offset 16 \
--output-dir ./outputs/seed-batchThis saves:
a_cute_cat__projection_start_frac_0_6__do_rotated_true_seed_0016_ab12cd34.pnga_cute_cat__projection_start_frac_0_6__do_rotated_true_seed_0017_ef56gh78.png- ...
By default, only the final displayed image is saved.
If generation.output_stem is not set, filenames are auto-generated from the prompt, CLI overrides, seed, and a random suffix so repeated runs do not overwrite older images.
You can provide either:
source.source_latents- a single
.npzlatent file or a directory of.npzfiles
- a single
source.source_images- a single image file or a directory of source images
If you pass source images, render.py will encode them to temporary SDXL VAE latents automatically before building the patch bank.
If you want to precompute latents yourself:
python prepare-patches.py ./images ./latentsrender.py- command-line renderer
- YAML config + CLI overrides
- multi-seed output loop
render_config.py- structured config dataclasses
- YAML loading
render_runtime.py- pipeline loading
- patch bank creation
- projector construction
- final image saving
patch_dictionary_core.py- core matching, region projection, and collage rendering logic
source_latents.py- reusable source-image-to-latent preparation helpers
prepare-patches.py- standalone latent preprocessing script
main.ipynb- notebook demo / hackable playground
Publicly supported methods:
felzenszwalb- recommended default
threshold- optional alternative for experimentation
square- square patch baseline
The YAML config is split into model, source, generation, projection, and output.
The usual SDXL controls like image size, inference steps, and guidance scale work the way you would expect, so the notes below focus on the collage-specific settings.
source_images- Path to a source image file or directory.
- These images are encoded to SDXL VAE latents before rendering.
source_latents- Path to a latent
.npzfile or a directory of latent files. - Use this when you want faster repeated renders and do not want to re-encode images each run.
- Path to a latent
recursive- If
true, searches nested subdirectories when loading source images.
- If
max_width,max_height- Optional preprocessing caps for source images before VAE encoding.
- Useful when your source images are very large and you want a smaller latent bank.
encode_mode- Controls how source images are encoded into latents.
meanis the default and is the most stable choice for building a reusable source bank.
seed- Base seed for rendering.
num_seeds- Number of consecutive seeds to render in one CLI call.
seed_offset- Offset added to the base seed before the batch starts.
- Useful for continuing a sweep without changing the main seed.
output_stem- Optional base filename used for single-image runs and auxiliary outputs.
- If omitted, the renderer saves files like
{formatted_prompt}__{override_tokens}_seed_{seed}_{random_id}.png.
pixel_render_scale- Upscales the final sharp collage render by an integer factor.
- This affects only the saved collage-style output, not the diffusion process itself.
region_method- Chooses how the live latent is divided before matching.
- Recommended:
felzenszwalb - Alternatives:
threshold,square
patch_size- Patch size in latent cells.
1means the dictionary is built from individual latent cells, which is the current default and works well with region projection.
do_rotated- If
true, augments the source patch bank with 90 degree rotations. - This can improve matching diversity without requiring more source images.
- If
total_patches- Total number of dictionary patches sampled into the reference bank.
- Larger values improve coverage but increase memory use and lookup cost.
top_k- Number of nearest dictionary patches mixed together for square-patch matching.
- In the current setup this is usually left at
1.
dictionary_chunk_size- Chunk size used when searching the reference bank.
- This mainly trades memory for speed during cosine search.
similarity_temperature- Temperature used when turning top-k similarity scores into soft weights.
- Lower values make the projection behave more like a hard nearest-neighbor choice.
random_seed- Seed used when sampling patches from the source bank and when sampling region candidates from references.
- This is separate from the diffusion seed.
projection_start_frac- Fraction of the denoising trajectory where collage projection starts.
- Example:
0.7means projection begins in the final 30 percent of steps.
projection_end_frac- Fraction of the denoising trajectory where projection stops.
- Usually this is
1.0so projection continues to the end.
projection_every_n_steps- Applies projection only every N denoising steps.
1means project at every eligible step.
alpha_start- Blend strength at the start of the projection window.
alpha_end- Blend strength at the end of the projection window.
- Higher values force stronger source-image structure into the result.
region_candidate_count- Number of random same-size candidate windows sampled from the source bank for each region.
- Higher values improve match quality but cost more compute.
region_min_area,region_max_area- Filters region sizes in latent-cell units.
- These are useful for rejecting tiny fragments or huge regions that are not helpful to match.
region_max_bbox_h,region_max_bbox_w- Optional hard caps on region bounding-box size in latent cells.
0disables the cap.
debug_every_n_projections- Saves a region debug image every N projection events when auxiliary outputs are enabled.
preview_every_n_projections- Controls optional stored latent previews during notebook-style experimentation.
- This is mainly useful for debugging and can usually stay at
0.
felzenszwalb_scale- Main region-size knob for the default segmentation method.
- Larger values generally encourage larger merged regions, though the effect depends on the latent structure.
felzenszwalb_sigma- Smoothing applied before segmentation.
- Higher values produce smoother, less detailed region boundaries.
felzenszwalb_min_size- Minimum region size enforced by the segmenter.
- Raising this removes small regions and simplifies the collage.
threshold_min_regions,threshold_max_regions- Target range for the threshold-based segmentation method.
- The code searches for a similarity threshold that lands inside this range if possible.
threshold_connectivity- Neighborhood connectivity used during region merging.
4is more conservative;8allows diagonal connections.
threshold_similarity_low,threshold_similarity_high- Search bounds for the latent-space similarity threshold.
- These usually do not need frequent changes unless you are tuning the threshold method directly.
output_dir- Where rendered images are saved.
save_auxiliary_outputs- If
true, also saves metadata, region assignments, and debug renders. - If
false, saves only the final image.
- If
save_displayed_image- If
true, saves the sharp collage render when available. - If
false, saves the raw VAE-decoded diffusion image instead.
- If
These are solid current defaults:
projection:
region_method: "felzenszwalb"
patch_size: 1
total_patches: 20000
region_candidate_count: 128
alpha_start: 0.0
alpha_end: 0.1
projection_start_frac: 0.7
projection_end_frac: 1.0
felzenszwalb_scale: 32.0
felzenszwalb_sigma: 0.8
felzenszwalb_min_size: 8The default config does not enable any LoRA.
If you want one, set these fields in model:
lora_repolora_weight_nameembedding_filenameembedding_token
If your LoRA also needs textual inversion embeddings, render.py will load them when those fields are provided.







