Diffusion-time region-based collage rendering for SDXL.
This project guides SDXL generation with a bank of reference image latents, then reconstructs a sharp collage-style final image from the original source pixels.
The recommended region method is felzenszwalb.
Instead of stylizing only after generation, this project interrupts SDXL during denoising:
- Read the current latent.
- Segment it into regions.
- Sample same-size candidate windows from a reference latent bank.
- Score candidates with masked cosine similarity in latent space.
- Blend the best-matching region back into the latent.
- Continue denoising.
After diffusion, the final assignment map is used to render a sharp pixel collage from the original source images.
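The scoring and blending steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the code in patch_dictionary_core.py: the function names `masked_cosine` and `project_region`, the array layouts, and the single-best-candidate blend are all hypothetical.

```python
import numpy as np

def masked_cosine(region, candidates, mask, eps=1e-8):
    """Cosine similarity between one region window and N candidate windows,
    computed only over cells where mask is True.

    region:     (C, H, W) window cut from the live latent
    candidates: (N, C, H, W) same-size windows from the reference bank
    mask:       (H, W) boolean region mask inside the window
    """
    r = region[:, mask].ravel()                               # (C * M,)
    c = candidates[:, :, mask].reshape(len(candidates), -1)   # (N, C * M)
    return (c @ r) / (np.linalg.norm(c, axis=1) * np.linalg.norm(r) + eps)

def project_region(latent, top, left, mask, candidates, alpha):
    """Blend the best-scoring candidate into the latent, inside the mask only."""
    h, w = mask.shape
    window = latent[:, top:top + h, left:left + w]   # a view into latent
    scores = masked_cosine(window, candidates, mask)
    best = int(np.argmax(scores))
    blended = (1.0 - alpha) * window + alpha * candidates[best]
    window[:, mask] = blended[:, mask]               # writes through to latent
    return best, float(scores[best])

# Toy data: a 4-channel latent, 32 random candidates, one planted exact match.
rng = np.random.default_rng(0)
latent = rng.normal(size=(4, 16, 16))
candidates = rng.normal(size=(32, 4, 5, 5))
mask = np.zeros((5, 5), dtype=bool)
mask[1:4, 1:4] = True
candidates[7] = latent[:, 2:7, 3:8]
best, score = project_region(latent, 2, 3, mask, candidates, alpha=0.1)
```

Cells outside the mask are left untouched, so neighboring regions can be projected independently.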
The example below uses conda; any environment manager (venv, mamba, etc.) works.
conda create -n collagenet python=3.12
conda activate collagenet
pip install torch
If torch.cuda.is_available() returns False after install on a machine with an NVIDIA GPU, the CUDA runtime bundled in the default wheel may be newer than your driver supports. Pick a driver-matched build from https://pytorch.org/get-started/locally/. For example, for drivers that support up to CUDA 12.4:
pip install torch --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt
To run main.ipynb, install Jupyter or use a notebook-aware editor such as VS Code:
pip install jupyter
Start from the default YAML config:
python render.py --config configs/default.yaml
YAML keys use snake_case. CLI overrides prefer kebab-case, for example:
- YAML: felzenszwalb_sigma
- CLI: --felzenszwalb-sigma
For convenience, the CLI also accepts underscore-style flags such as --felzenszwalb_sigma.
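One way to accept both spellings is to register the kebab-case flag and an underscore alias for each YAML key. The sketch below uses the standard argparse module; the helper name `add_override` is hypothetical, and render.py may implement this differently.

```python
import argparse

def add_override(parser, yaml_key, **kwargs):
    """Register --kebab-case plus an underscore alias for one YAML key."""
    kebab = "--" + yaml_key.replace("_", "-")
    snake = "--" + yaml_key
    flags = [kebab] if kebab == snake else [kebab, snake]
    # dest keeps the snake_case name so the value maps back to the YAML key.
    parser.add_argument(*flags, dest=yaml_key, **kwargs)

parser = argparse.ArgumentParser()
add_override(parser, "felzenszwalb_sigma", type=float, default=None)
add_override(parser, "prompt", type=str, default=None)

a = parser.parse_args(["--felzenszwalb-sigma", "1.2"])
b = parser.parse_args(["--felzenszwalb_sigma", "1.2"])
```

Both calls store the value under `felzenszwalb_sigma`, so downstream code only ever sees the snake_case name.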
You can also override the main knobs directly from the command line:
python render.py \
--prompt "a cute cat" \
--source-images ./watercolors \
--output-dir ./outputs/cute-cat
Example with explicit latent input and tuned felzenszwalb settings:
python render.py \
--config configs/default.yaml \
--output-dir test \
--prompt "a cute cat" \
--source-latents watercolor-latents/ \
--felzenszwalb-min-size 8 \
--felzenszwalb-sigma 1.2
To render a seed batch:
python render.py \
--config configs/default.yaml \
--num-seeds 8 \
--seed-offset 16 \
--output-dir ./outputs/seed-batch
This saves:
- a_cute_cat__projection_start_frac_0_6__do_rotated_true_seed_0016_ab12cd34.png
- a_cute_cat__projection_start_frac_0_6__do_rotated_true_seed_0017_ef56gh78.png
- ...
By default, only the final displayed image is saved.
If generation.output_stem is not set, filenames are auto-generated from the prompt, CLI overrides, seed, and a random suffix so repeated runs do not overwrite older images.
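The naming scheme can be illustrated with a small sketch. The helpers `slug` and `auto_stem` are hypothetical, and the exact slug rules, the four-digit seed padding, and the 8-character suffix are inferred from the example filenames in this README rather than taken from the renderer's code.

```python
import re
import secrets

def slug(text):
    """Lowercase and collapse every run of non-alphanumerics to one underscore."""
    return re.sub(r"[^a-z0-9]+", "_", str(text).lower()).strip("_")

def auto_stem(prompt, overrides, seed, random_id=None):
    """Build a filename stem from the prompt, CLI overrides, seed, and a suffix."""
    random_id = random_id or secrets.token_hex(4)   # 8 hex characters
    tokens = "__".join(f"{slug(k)}_{slug(v)}" for k, v in overrides.items())
    middle = f"__{tokens}" if tokens else ""
    return f"{slug(prompt)}{middle}_seed_{seed:04d}_{random_id}"

stem = auto_stem(
    "a cute cat",
    {"projection_start_frac": 0.6, "do_rotated": True},
    seed=16,
    random_id="ab12cd34",   # fixed here only to make the result reproducible
)
```

Leaving `random_id` unset draws a fresh suffix on every call, which is what keeps repeated runs from colliding.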
You can provide either:
- source.source_latents: a single .npz latent file or a directory of .npz files
- source.source_images: a single image file or a directory of source images
If you pass source images, render.py will encode them to temporary SDXL VAE latents automatically before building the patch bank.
If you want to precompute latents yourself:
python prepare-patches.py ./images ./latents
render.py
- command-line renderer
- YAML config + CLI overrides
- multi-seed output loop
render_config.py
- structured config dataclasses
- YAML loading
render_runtime.py
- pipeline loading
- patch bank creation
- projector construction
- final image saving
patch_dictionary_core.py
- core matching, region projection, and collage rendering logic
source_latents.py
- reusable source-image-to-latent preparation helpers
prepare-patches.py
- standalone latent preprocessing script
main.ipynb
- notebook demo / hackable playground
Publicly supported methods:
- felzenszwalb: recommended default
- threshold: optional alternative for experimentation
- square: square patch baseline
The YAML config is split into model, source, generation, projection, and output.
The usual SDXL controls like image size, inference steps, and guidance scale work the way you would expect, so the notes below focus on the collage-specific settings.
source_images
- Path to a source image file or directory.
- These images are encoded to SDXL VAE latents before rendering.
source_latents
- Path to a latent .npz file or a directory of latent files.
- Use this when you want faster repeated renders and do not want to re-encode images each run.
recursive
- If true, searches nested subdirectories when loading source images.
max_width, max_height
- Optional preprocessing caps for source images before VAE encoding.
- Useful when your source images are very large and you want a smaller latent bank.
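As a rough illustration of how such caps usually behave, the sketch below shrinks an image to fit inside both limits. It assumes the aspect ratio is preserved and that images are never upscaled; neither is confirmed by this README, and `capped_size` is a hypothetical helper.

```python
def capped_size(width, height, max_width=None, max_height=None):
    """Return (width, height) scaled down to fit the caps, keeping aspect ratio."""
    scale = 1.0   # never upscale
    if max_width:
        scale = min(scale, max_width / width)
    if max_height:
        scale = min(scale, max_height / height)
    return max(1, round(width * scale)), max(1, round(height * scale))

large = capped_size(4000, 3000, max_width=1024, max_height=1024)
small = capped_size(800, 600, max_width=1024, max_height=1024)
```

Since the SDXL VAE downsamples by a factor of 8 per side, halving both caps cuts the number of latent cells per image to roughly a quarter.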
encode_mode
- Controls how source images are encoded into latents.
- mean is the default and is the most stable choice for building a reusable source bank.
seed
- Base seed for rendering.
num_seeds
- Number of consecutive seeds to render in one CLI call.
seed_offset
- Offset added to the base seed before the batch starts.
- Useful for continuing a sweep without changing the main seed.
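Taken together, the three seed settings describe a simple range. A minimal sketch, with `batch_seeds` as a hypothetical helper name:

```python
def batch_seeds(seed, num_seeds, seed_offset=0):
    """Consecutive seeds for one CLI call, starting at seed + seed_offset."""
    start = seed + seed_offset
    return list(range(start, start + num_seeds))

# Matches the seed-batch example: 8 seeds starting at offset 16,
# assuming the base seed in the config is 0.
seeds = batch_seeds(seed=0, num_seeds=8, seed_offset=16)
```

To continue the sweep later, rerun with the offset advanced by the previous batch size (here, `seed_offset=24`).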
output_stem
- Optional base filename used for single-image runs and auxiliary outputs.
- If omitted, the renderer saves files like {formatted_prompt}__{override_tokens}_seed_{seed}_{random_id}.png.
pixel_render_scale
- Upscales the final sharp collage render by an integer factor.
- This affects only the saved collage-style output, not the diffusion process itself.
region_method
- Chooses how the live latent is divided before matching.
- Recommended: felzenszwalb
- Alternatives: threshold, square
patch_size
- Patch size in latent cells.
- 1 means the dictionary is built from individual latent cells, which is the current default and works well with region projection.
do_rotated
- If true, augments the source patch bank with 90 degree rotations.
- This can improve matching diversity without requiring more source images.
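A rotation-augmented bank can be built with `np.rot90`. The sketch below assumes square patches stored as an (N, C, H, W) array and assumes all of 90, 180, and 270 degrees are added, which this README does not spell out; `augment_with_rotations` is a hypothetical name.

```python
import numpy as np

def augment_with_rotations(patches):
    """Stack 0/90/180/270 degree rotations of every patch.

    patches: (N, C, H, W) with H == W. Returns (4 * N, C, H, W).
    """
    if patches.shape[2] != patches.shape[3]:
        raise ValueError("rotation augmentation here assumes square patches")
    # Rotate in the spatial plane only; the channel axis is left alone.
    rotated = [np.rot90(patches, k=k, axes=(2, 3)) for k in range(4)]
    return np.concatenate(rotated, axis=0)

patches = np.arange(3 * 4 * 2 * 2, dtype=float).reshape(3, 4, 2, 2)
bank = augment_with_rotations(patches)
```

The bank grows fourfold, so memory use and lookup cost grow with it.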
total_patches
- Total number of dictionary patches sampled into the reference bank.
- Larger values improve coverage but increase memory use and lookup cost.
top_k
- Number of nearest dictionary patches mixed together for square-patch matching.
- In the current setup this is usually left at 1.
dictionary_chunk_size
- Chunk size used when searching the reference bank.
- This mainly trades memory for speed during cosine search.
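The memory-for-speed trade can be seen in a chunked nearest-neighbor search: only one chunk of similarity scores exists at a time, and the running best is carried across chunks. This is a generic NumPy sketch with a hypothetical function name, not the project's search routine.

```python
import numpy as np

def chunked_best_match(query, bank, chunk_size, eps=1e-8):
    """Index and score of the bank row with the highest cosine similarity.

    query: (D,) flattened patch, bank: (N, D) flattened dictionary patches.
    """
    q = query / (np.linalg.norm(query) + eps)
    best_idx, best_score = -1, -np.inf
    for start in range(0, len(bank), chunk_size):
        chunk = bank[start:start + chunk_size]
        scores = (chunk @ q) / (np.linalg.norm(chunk, axis=1) + eps)
        i = int(np.argmax(scores))
        if scores[i] > best_score:
            best_idx, best_score = start + i, float(scores[i])
    return best_idx, best_score

rng = np.random.default_rng(1)
bank = rng.normal(size=(50, 8))
query = bank[37] * 2.0   # same direction as row 37, different magnitude
small_chunks = chunked_best_match(query, bank, chunk_size=7)
one_chunk = chunked_best_match(query, bank, chunk_size=1000)
```

The result does not depend on the chunk size; smaller chunks lower peak memory at the cost of more loop iterations.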
similarity_temperature
- Temperature used when turning top-k similarity scores into soft weights.
- Lower values make the projection behave more like a hard nearest-neighbor choice.
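A common way to turn top-k scores into weights is a temperature-scaled softmax, sketched below. Whether the project uses exactly this form is an assumption, and `topk_soft_weights` is a hypothetical name.

```python
import numpy as np

def topk_soft_weights(similarities, top_k, temperature):
    """Indices of the top_k scores and their softmax weights at this temperature."""
    idx = np.argsort(similarities)[::-1][:top_k]
    logits = similarities[idx] / temperature
    logits = logits - logits.max()   # subtract the max for numerical stability
    weights = np.exp(logits)
    return idx, weights / weights.sum()

sims = np.array([0.9, 0.1, 0.8, 0.5])
idx_cold, w_cold = topk_soft_weights(sims, top_k=2, temperature=0.01)
idx_warm, w_warm = topk_soft_weights(sims, top_k=2, temperature=10.0)
```

At the low temperature nearly all the weight lands on the single best match; at the high temperature the two matches are mixed almost evenly.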
random_seed
- Seed used when sampling patches from the source bank and when sampling region candidates from references.
- This is separate from the diffusion seed.
projection_start_frac
- Fraction of the denoising trajectory where collage projection starts.
- Example: 0.7 means projection begins in the final 30 percent of steps.
projection_end_frac
- Fraction of the denoising trajectory where projection stops.
- Usually this is 1.0 so projection continues to the end.
projection_every_n_steps
- Applies projection only every N denoising steps.
- 1 means project at every eligible step.
alpha_start
- Blend strength at the start of the projection window.
alpha_end
- Blend strength at the end of the projection window.
- Higher values force stronger source-image structure into the result.
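The window and blend settings combine into a per-step schedule. The sketch below assumes a linear ramp from alpha_start to alpha_end and rounds the fractions to whole step indices; both are assumptions, and `projection_alpha` is a hypothetical helper.

```python
def projection_alpha(step, num_steps, start_frac=0.7, end_frac=1.0,
                     every_n=1, alpha_start=0.0, alpha_end=0.1):
    """Blend strength for this denoising step, or None if projection is skipped."""
    first = int(round(start_frac * num_steps))
    last = int(round(end_frac * num_steps)) - 1   # inclusive last step
    if step < first or step > last:
        return None                                # outside the window
    if (step - first) % every_n != 0:
        return None                                # not an eligible step
    t = 0.0 if last == first else (step - first) / (last - first)
    return alpha_start + t * (alpha_end - alpha_start)

# With 30 steps and the defaults, projection runs on steps 21 through 29.
schedule = [projection_alpha(s, 30) for s in range(30)]
```

Raising `every_n` thins out the projected steps without moving the window itself.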
region_candidate_count
- Number of random same-size candidate windows sampled from the source bank for each region.
- Higher values improve match quality but cost more compute.
region_min_area, region_max_area
- Filters region sizes in latent-cell units.
- These are useful for rejecting tiny fragments or huge regions that are not helpful to match.
region_max_bbox_h, region_max_bbox_w
- Optional hard caps on region bounding-box size in latent cells.
- 0 disables the cap.
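The area and bounding-box filters can be pictured as a single predicate over a region mask. In this sketch `keep_region` is a hypothetical helper, and treating 0 as "no cap" for region_max_area is an assumption (the README states it only for the bounding-box caps).

```python
import numpy as np

def keep_region(mask, min_area=0, max_area=0, max_bbox_h=0, max_bbox_w=0):
    """Decide whether a region passes the size filters.

    mask: (H, W) boolean array marking the region's latent cells.
    """
    area = int(mask.sum())
    if area == 0 or area < min_area:
        return False
    if max_area and area > max_area:
        return False
    rows = np.flatnonzero(mask.any(axis=1))   # rows containing region cells
    cols = np.flatnonzero(mask.any(axis=0))   # columns containing region cells
    bbox_h = rows[-1] - rows[0] + 1
    bbox_w = cols[-1] - cols[0] + 1
    if max_bbox_h and bbox_h > max_bbox_h:
        return False
    if max_bbox_w and bbox_w > max_bbox_w:
        return False
    return True

mask = np.zeros((8, 8), dtype=bool)
mask[2:5, 1:6] = True   # a 3 x 5 block: area 15
```

For this mask, `keep_region(mask, min_area=4)` passes, while `keep_region(mask, max_bbox_w=4)` rejects the region because its bounding box is 5 cells wide.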
debug_every_n_projections
- Saves a region debug image every N projection events when auxiliary outputs are enabled.
preview_every_n_projections
- Controls optional stored latent previews during notebook-style experimentation.
- This is mainly useful for debugging and can usually stay at 0.
felzenszwalb_scale
- Main region-size knob for the default segmentation method.
- Larger values generally encourage larger merged regions, though the effect depends on the latent structure.
felzenszwalb_sigma
- Smoothing applied before segmentation.
- Higher values produce smoother, less detailed region boundaries.
felzenszwalb_min_size
- Minimum region size enforced by the segmenter.
- Raising this removes small regions and simplifies the collage.
threshold_min_regions, threshold_max_regions
- Target range for the threshold-based segmentation method.
- The code searches for a similarity threshold that lands inside this range if possible.
threshold_connectivity
- Neighborhood connectivity used during region merging.
- 4 is more conservative; 8 allows diagonal connections.
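The difference between the two connectivity settings shows up on cells that touch only at a corner. The sketch below is a generic connected-component count on a binary grid, not the project's similarity-based merging; `count_components` is a hypothetical name.

```python
from collections import deque

def count_components(grid, connectivity=4):
    """Count connected groups of truthy cells under 4- or 8-connectivity."""
    if connectivity not in (4, 8):
        raise ValueError("connectivity must be 4 or 8")
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if connectivity == 8:
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]   # diagonal neighbors
    h, w = len(grid), len(grid[0])
    seen = set()
    count = 0
    for r in range(h):
        for c in range(w):
            if not grid[r][c] or (r, c) in seen:
                continue
            count += 1                 # found an unvisited component
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:               # breadth-first flood fill
                y, x = queue.popleft()
                for dy, dx in steps:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and grid[ny][nx] and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
    return count

diagonal = [[1, 0],
            [0, 1]]
```

On `diagonal`, 4-connectivity sees two separate regions while 8-connectivity merges them into one, which is why 8 tends to produce fewer, larger regions.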
threshold_similarity_low, threshold_similarity_high
- Search bounds for the latent-space similarity threshold.
- These usually do not need frequent changes unless you are tuning the threshold method directly.
output_dir
- Where rendered images are saved.
save_auxiliary_outputs
- If true, also saves metadata, region assignments, and debug renders.
- If false, saves only the final image.
save_displayed_image
- If true, saves the sharp collage render when available.
- If false, saves the raw VAE-decoded diffusion image instead.
These are solid current defaults:
projection:
  region_method: "felzenszwalb"
  patch_size: 1
  total_patches: 20000
  region_candidate_count: 128
  alpha_start: 0.0
  alpha_end: 0.1
  projection_start_frac: 0.7
  projection_end_frac: 1.0
  felzenszwalb_scale: 32.0
  felzenszwalb_sigma: 0.8
  felzenszwalb_min_size: 8
The default config does not enable any LoRA.
If you want one, set these fields in model:
- lora_repo
- lora_weight_name
- embedding_filename
- embedding_token
If your LoRA also needs textual inversion embeddings, render.py will load them when those fields are provided.
