## Todo

### v14 todo

- ✅ configurable composition cutoff
- ✅ rename model parameter weights
- ✅ rename model_config parameter to architecture and make it case insensitive
- ✅ add --size parameter that accepts strings (e.g. 256x256, 4k, uhd, 8k, etc)
- ✅ detect if cuda torch missing and give better error message
- ✅ add method to install correct torch version
- ✅ make cli run faster again
- ✅ add tests for cli commands
- ✅ add type checker
- ✅ add interface for loading diffusers weights
- ✅ SDXL support
- ✅ sdxl inpainting
- t2i adapters
- image prompts
- embedding inputs
- save complete metadata to image
- recreate image from metadata
- auto-incorporate https://huggingface.co/madebyollin/sdxl-vae-fp16-fix
- only output the main image unless some flag is set
- ✅ allow selection of output video format
- test on python 3.11
- allow specification of filename format
- chain multiple operations together: imggen => videogen
  - https://github.com/pallets/click/tree/main/examples/imagepipe
- https://huggingface.co/playgroundai/playground-v2-1024px-aesthetic
- make sure terminal output on windows doesn't suck
- add method to show cache size
- add method to clear model cache
- add method to clear cached items not recently used (does diffusers have one?)
- create actual documentation

#### Investigate

- use fancy noise https://github.com/Extraltodeus/noise_latent_perlinpinpin
- use latent upscaler https://github.com/city96/SD-Latent-Upscaler
- use latent interposer https://github.com/city96/SD-Latent-Interposer/tree/main
- https://github.com/madebyollin/taesd
- TextDiffuser-2 https://jingyechen.github.io/textdiffuser2/
- fast diffusion with LCM LoRA https://huggingface.co/latent-consistency/lcm-lora-sdv1-5/tree/main
- 3d diffusion https://huggingface.co/stabilityai/stable-zero123
- magic animate
- consistency decoder
- https://github.com/XPixelGroup/HAT

### Old Todo

- Inference Performance Optimizations
  - ✅ fp16
  - ✅ [Doggettx Sliced attention](https://github.com/CompVis/stable-diffusion/compare/main...Doggettx:stable-diffusion:autocast-improvements#)
  - ✅ xformers support https://www.photoroom.com/tech/stable-diffusion-100-percent-faster-with-memory-efficient-attention/
  - https://github.com/neonsecret/stable-diffusion
  - https://github.com/CompVis/stable-diffusion/pull/177
  - https://github.com/huggingface/diffusers/pull/532/files
  - https://github.com/HazyResearch/flash-attention
  - https://github.com/chavinlo/sda-node
  - https://github.com/AminRezaei0x443/memory-efficient-attention/issues/7
- Development Environment
  - ✅ add tests
  - ✅ set up ci (test/lint/format)
  - ✅ unified pipeline (txt2img & img2img combined)
  - ✅ setup parallel testing
  - ✅ add docs
  - 🚫 remove yaml config
  - 🚫 delete more unused code
  - faster latent logging https://discuss.huggingface.co/t/decoding-latents-to-rgb-without-upscaling/23204/9
- Interface improvements
  - ✅ init-image at command line
  - ✅ prompt expansion
  - ✅ interactive cli
- Image Generation Features
  - ✅ add k-diffusion sampling methods
  - ✅ tiling
  - ✅ generating videos/gifs
  - ✅ controlnet
    - scribbles input
    - segmentation input
    - mlsd input
  - [Attend and Excite](https://attendandexcite.github.io/Attend-and-Excite/)
  - Compositional Visual Generation
    - https://github.com/energy-based-model/Compositional-Visual-Generation-with-Composable-Diffusion-Models-PyTorch
    - https://colab.research.google.com/github/energy-based-model/Compositional-Visual-Generation-with-Composable-Diffusion-Models-PyTorch/blob/main/notebooks/demo.ipynb#scrollTo=wt_j3uXZGFAS
  - ✅ negative prompting (see the guidance sketch below)
    - some syntax to allow it in a text string
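
    Negative prompts are typically implemented through classifier-free guidance: the negative prompt's embedding takes the place of the usual empty/unconditional embedding, and guidance pushes the sample away from it. A minimal sketch of the guidance step, with illustrative names (not imaginAIry's actual API):

    ```python
    import torch

    def apply_guidance(
        noise_neg: torch.Tensor,  # denoiser prediction conditioned on the negative prompt
        noise_pos: torch.Tensor,  # denoiser prediction conditioned on the positive prompt
        guidance_scale: float = 7.5,
    ) -> torch.Tensor:
        # Classifier-free guidance: start from the negative-prompt prediction
        # and step toward the positive-prompt prediction. With an empty
        # negative prompt this reduces to standard CFG.
        return noise_neg + guidance_scale * (noise_pos - noise_neg)
    ```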
  - [paint with words](https://www.reddit.com/r/StableDiffusion/comments/10lzgze/i_figured_out_a_way_to_apply_different_prompts_to/)
    - https://github.com/cloneofsimo/paint-with-words-sd
    - https://multidiffusion.github.io/
  - images as actual prompts instead of just init images
    - not directly possible due to model architecture
    - can it just be integrated into the sampler?
    - requires model fine-tuning since SD1.4 expects 77x768 text encoding input
    - https://twitter.com/Buntworthy/status/1566744186153484288
    - https://github.com/justinpinkney/stable-diffusion
    - https://github.com/LambdaLabsML/lambda-diffusers
    - https://www.reddit.com/r/MachineLearning/comments/x6k5bm/n_stable_diffusion_image_variations_released/
- Image Editing
  - ✅ outpainting
    - https://github.com/parlance-zz/g-diffuser-bot/search?q=noise&type=issues
    - lama cleaner
  - ✅ inpainting
    - https://github.com/Jack000/glid-3-xl-stable
    - https://github.com/andreas128/RePaint
  - ✅ img2img but keeps img stable
    - https://www.reddit.com/r/StableDiffusion/comments/xboy90/a_better_way_of_doing_img2img_by_finding_the/
    - https://gist.github.com/trygvebw/c71334dd127d537a15e9d59790f7f5e1
    - https://github.com/pesser/stable-diffusion/commit/bbb52981460707963e2a62160890d7ecbce00e79
    - https://github.com/SHI-Labs/FcF-Inpainting https://praeclarumjj3.github.io/fcf-inpainting/
  - ✅ text based image masking
    - ✅ ClipSeg https://github.com/timojl/clipseg
    - https://github.com/facebookresearch/detectron2
    - https://x-decoder-vl.github.io/
  - Maskless editing
    - ✅ instruct-pix2pix
  - Attention Control Methods
    - https://github.com/bloc97/CrossAttentionControl
    - https://github.com/ChenWu98/cycle-diffusion
- Image Enhancement
  - Photo Restoration
    - https://github.com/microsoft/Bringing-Old-Photos-Back-to-Life
  - Upscaling
    - ✅ realesrgan
    - ldm
    - https://github.com/lowfuel/progrock-stable
    - [txt2imghd](https://github.com/jquesnelle/txt2imghd/blob/master/txt2imghd.py)
    - latent scaling + reprocessing
    - stability upscaler
    - rivers have wings upscaler
    - stable super-res?
      - todo: try with 1-0-0-0 mask at full image resolution (re-encoding entire image + predicted image at every step)
      - todo: use a gaussian pyramid and only include the "high-detail" level of the pyramid in the next step
    - https://www.reddit.com/r/StableDiffusion/comments/xkjjf9/upscale_to_huge_sizes_and_add_detail_with_sd/
  - ✅ face enhancers
    - ✅ gfpgan https://github.com/TencentARC/GFPGAN
    - ✅ codeformer https://github.com/sczhou/CodeFormer
  - ✅ image describe feature
    - ✅ https://github.com/salesforce/BLIP
    - 🚫 CLIP brute-force prompt reconstruction
      - the accuracy of this approach is too low for me to include it in imaginAIry
      - https://github.com/rmokady/CLIP_prefix_caption
      - https://github.com/pharmapsychotic/clip-interrogator (blip + clip)
      - https://github.com/KaiyangZhou/CoOp
- 🚫 CPU support. While the code does actually work on some CPUs, generation takes so long that I don't think it's worth the effort to support this feature.
- ✅ img2img for plms
- ✅ img2img for kdiff functions
- Other
  - Enhancement pipelines
  - text-to-3d https://dreamfusionpaper.github.io/
    - https://shihmengli.github.io/3D-Photo-Inpainting/
    - https://github.com/thygate/stable-diffusion-webui-depthmap-script/discussions/50
  - Depth estimation
    - what is SOTA for monocular depth estimation? (see the sketch below)
    - https://github.com/compphoto/BoostingMonocularDepth
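
    One plausible answer to the SOTA question (an assumption on my part, not a settled conclusion) is the MiDaS/DPT family. A minimal sketch of relative depth estimation using the intel-isl/MiDaS torch.hub entry points (requires `timm`; the image path is a placeholder):

    ```python
    import cv2
    import torch

    # Load the DPT-Large MiDaS model and its matching input transform from torch.hub.
    midas = torch.hub.load("intel-isl/MiDaS", "DPT_Large")
    midas.eval()
    transform = torch.hub.load("intel-isl/MiDaS", "transforms").dpt_transform

    img = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
    batch = transform(img)  # HWC uint8 -> normalized NCHW float tensor

    with torch.no_grad():
        prediction = midas(batch)
        # Resize the prediction back to the input resolution; the output is
        # relative (inverse) depth, not metric depth.
        depth = torch.nn.functional.interpolate(
            prediction.unsqueeze(1),
            size=img.shape[:2],
            mode="bicubic",
            align_corners=False,
        ).squeeze()
    ```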
  - make a video https://github.com/lucidrains/make-a-video-pytorch
  - animations
    - https://github.com/francislabountyjr/stable-diffusion/blob/main/inferencing_notebook.ipynb
    - https://www.youtube.com/watch?v=E7aAFEhdngI
    - https://github.com/pytti-tools/frame-interpolation
  - guided generation
    - https://colab.research.google.com/drive/1dlgggNa5Mz8sEAGU0wFCHhGLFooW_pf1#scrollTo=UDeXQKbPTdZI
    - https://colab.research.google.com/github/aicrumb/doohickey/blob/main/Doohickey_Diffusion.ipynb#scrollTo=PytCwKXCmPid
    - https://github.com/mlfoundations/open_clip
    - https://github.com/openai/guided-diffusion
  - image variations https://github.com/lstein/stable-diffusion/blob/main/VARIATIONS.md
  - textual inversion
    - https://www.reddit.com/r/StableDiffusion/comments/xbwb5y/how_to_run_textual_inversion_locally_train_your/
    - https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb#scrollTo=50JuJUM8EG1h
    - https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_diffusion_textual_inversion_library_navigator.ipynb
  - https://github.com/Jack000/glid-3-xl-stable
  - fix saturation at high CFG https://www.reddit.com/r/StableDiffusion/comments/xalo78/fixing_excessive_contrastsaturation_resulting/
    - https://www.reddit.com/r/StableDiffusion/comments/xbrrgt/a_rundown_of_twenty_new_methodsoptions_added_to/
  - ✅ deploy to pypi
  - find similar images https://knn5.laion.ai/?back=https%3A%2F%2Fknn5.laion.ai%2F&index=laion5B&useMclip=false
  - https://github.com/vicgalle/stable-diffusion-aesthetic-gradients
- Training
  - Finetuning "dreambooth" style
  - [Textual Inversion](https://arxiv.org/abs/2208.01618)
    - [Fast Textual Inversion](https://github.com/peterwilli/sd-leap-booster)
  - [Low-rank Adaptation for Fast Text-to-Image Diffusion Fine-tuning (LoRA)](https://github.com/cloneofsimo/lora) (see the sketch at the end of this section)
    - https://huggingface.co/spaces/lora-library/Low-rank-Adaptation
- Performance Improvements
  - [ColossalAI](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/diffusion) - almost got it working, but it's not easy enough to install to merit inclusion in imaginAIry. We should check back in on this.
  - Xformers
  - Deepspeed

## Decided against

- Scalecrafter https://yingqinghe.github.io/scalecrafter/
  - doesn't look any better than img2img
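
A sketch for the LoRA item under Training above. The core idea is to freeze a base weight matrix `W` and learn a low-rank update `W + (alpha/r) * B @ A`, so only a tiny fraction of the parameters train. This is a minimal illustrative PyTorch version (class and parameter names are mine, not the cloneofsimo/lora API):

```python
import torch
from torch import nn

class LoRALinear(nn.Module):
    """Wrap a frozen nn.Linear with a trainable low-rank residual."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # base weights stay frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)  # A
        self.up = nn.Linear(rank, base.out_features, bias=False)   # B
        nn.init.normal_(self.down.weight, std=1.0 / rank)
        nn.init.zeros_(self.up.weight)  # update starts as a no-op
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.up(self.down(x))
```

In a Stable Diffusion model these wrappers are typically applied to the UNet cross-attention projection layers (to_q/to_k/to_v/to_out), which is why LoRA adapter files stay only a few megabytes.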