- adds support for [SDXL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)
  - adds sliced encoding/decoding to the refiners SDXL pipeline (a sketch of the idea is below)
  - does not yet support inpainting or controlnets
  - monkeypatches self_attention_guidance to use sliced attention
- adds a bunch of model weight translation utilities and weightmaps
- adds [OpenDalle 1.1](https://huggingface.co/dataautogpt3/OpenDalleV1.1)
- changes the default model to OpenDalle
- fix: better handling of special characters in path inputs on the command line

**todo**

- add tests
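Sliced decoding trades a little speed for a much lower peak-memory footprint by running the VAE over latent tiles rather than the full latent at once. A minimal sketch of the idea, not the pipeline's actual implementation (`vae_decode` and the tile handling here are simplified placeholders):

```python
import torch

def sliced_decode(vae_decode, latents: torch.Tensor, tile_size: int = 64) -> torch.Tensor:
    """Decode latents tile-by-tile so the VAE never sees the full-resolution
    tensor at once. Peak memory drops roughly with the tile area.

    vae_decode: function mapping (B, 4, h, w) latents -> (B, 3, 8h, 8w) pixels
    latents:    full latent tensor of shape (B, 4, H, W)

    Note: non-overlapping tiles can leave faint seams; real implementations
    decode overlapping tiles and blend the edges.
    """
    b, c, h, w = latents.shape
    out = torch.zeros(b, 3, h * 8, w * 8, device=latents.device)
    for y in range(0, h, tile_size):
        for x in range(0, w, tile_size):
            tile = latents[:, :, y : y + tile_size, x : x + tile_size]
            out[:, :, y * 8 : (y + tile.shape[2]) * 8, x * 8 : (x + tile.shape[3]) * 8] = vae_decode(tile)
    return out
```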
# Todo

## v14 todo
- ✅ configurable composition cutoff
- ✅ rename model parameter `weights`
- ✅ rename `model_config` parameter to `architecture` and make it case-insensitive
- ✅ add `--size` parameter that accepts strings (e.g. 256x256, 4k, uhd, 8k, etc.) - see the parser sketch after this list
- ✅ detect if CUDA torch is missing and give a better error message
- ✅ add method to install the correct torch version
- ✅ make the CLI run faster again
- ✅ add tests for CLI commands
- ✅ add type checker
- ✅ add interface for loading diffusers weights
- ✅ SDXL support
- SDXL inpainting
- t2i adapters
- embedding inputs
- only output the main image unless some flag is set
- allow selection of output video format
- chain multiple operations together (imggen => videogen)
- https://huggingface.co/playgroundai/playground-v2-1024px-aesthetic
- make sure terminal output on Windows doesn't suck
- add method to show cache size
- add method to clear model cache
- add method to clear cached items not recently used (does diffusers have one?)
- create actual documentation
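The `--size` parser presumably maps both explicit dimensions and named presets to a (width, height) pair. A minimal sketch under that assumption (the preset table here is illustrative, not imaginAIry's actual alias list):

```python
import re

# Illustrative named-size table; the actual aliases supported may differ.
NAMED_SIZES = {
    "hd": (1280, 720),
    "fhd": (1920, 1080),
    "uhd": (3840, 2160),
    "4k": (3840, 2160),
    "8k": (7680, 4320),
}

def parse_size(value: str) -> tuple[int, int]:
    """Parse a --size string like '512', '256x256', or '4k' into (width, height)."""
    value = value.strip().lower()
    if value in NAMED_SIZES:
        return NAMED_SIZES[value]
    match = re.fullmatch(r"(\d+)\s*[x,]\s*(\d+)", value)
    if match:
        return int(match.group(1)), int(match.group(2))
    if value.isdigit():  # a single number means a square image
        return int(value), int(value)
    raise ValueError(f"Unrecognized size: {value!r}")

assert parse_size("256x256") == (256, 256)
assert parse_size("uhd") == (3840, 2160)
```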
## Investigate
- TextDiffuser-2 https://jingyechen.github.io/textdiffuser2/
- ScaleCrafter https://yingqinghe.github.io/scalecrafter/
- fast diffusion with LCM LoRA https://huggingface.co/latent-consistency/lcm-lora-sdv1-5/tree/main
- 3D diffusion https://huggingface.co/stabilityai/stable-zero123
- MagicAnimate
- Consistency Decoder
- HAT https://github.com/XPixelGroup/HAT
## Old Todo
- Inference Performance Optimizations
  - ✅ fp16
  - ✅ Doggettx sliced attention - see the sketch after this list
  - ✅ xformers support https://www.photoroom.com/tech/stable-diffusion-100-percent-faster-with-memory-efficient-attention/
  - https://github.com/neonsecret/stable-diffusion
  - https://github.com/CompVis/stable-diffusion/pull/177
  - https://github.com/huggingface/diffusers/pull/532/files
  - https://github.com/HazyResearch/flash-attention
  - https://github.com/chavinlo/sda-node
  - https://github.com/AminRezaei0x443/memory-efficient-attention/issues/7
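The Doggettx-style sliced attention noted above computes attention scores for one slice of queries at a time, so the full (seq, seq) attention matrix never exists in memory. A rough sketch of the technique, not the exact patched code:

```python
import torch

def sliced_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                     slice_size: int = 1024) -> torch.Tensor:
    """Compute softmax(q @ k^T * scale) @ v in slices along the query axis.

    q, k, v: tensors of shape (batch*heads, seq_len, dim_head)
    """
    scale = q.shape[-1] ** -0.5
    out = torch.empty_like(q)
    for start in range(0, q.shape[1], slice_size):
        end = min(start + slice_size, q.shape[1])
        # attention scores for just this slice of queries: (b*h, slice, seq_len)
        scores = torch.bmm(q[:, start:end], k.transpose(-1, -2)) * scale
        out[:, start:end] = torch.bmm(scores.softmax(dim=-1), v)
    return out
```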
- Development Environment
  - ✅ add tests
  - ✅ set up CI (test/lint/format)
  - ✅ unified pipeline (txt2img & img2img combined)
  - ✅ set up parallel testing
  - add docs
  - 🚫 remove yaml config
  - 🚫 delete more unused code
  - faster latent logging https://discuss.huggingface.co/t/decoding-latents-to-rgb-without-upscaling/23204/9 - see the sketch after this list
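The linked thread on faster latent logging describes skipping the VAE entirely and projecting latent channels to RGB with a fixed linear map, which makes per-step previews nearly free. A sketch of that trick, using approximate SD 1.x factors quoted in the thread:

```python
import torch

# Approximate SD 1.x latent-channel -> RGB factors, as shared in the
# linked huggingface discussion; values are rough.
LATENT_RGB_FACTORS = torch.tensor([
    #   R       G       B
    [ 0.298,  0.207,  0.208],
    [ 0.187,  0.286,  0.173],
    [-0.158,  0.189,  0.264],
    [-0.184, -0.271, -0.473],
])

def latents_to_rgb_preview(latents: torch.Tensor) -> torch.Tensor:
    """Cheap preview: project (B, 4, H/8, W/8) latents straight to RGB
    without running the VAE decoder. Returns (B, H/8, W/8, 3) in [0, 255].
    """
    rgb = torch.einsum("bchw,cr->bhwr", latents, LATENT_RGB_FACTORS.to(latents))
    # rough normalization from roughly [-1, 1] to displayable pixel values
    return ((rgb + 1) / 2).clamp(0, 1) * 255
```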
- Interface improvements
  - ✅ init-image at command line
  - ✅ prompt expansion
  - ✅ interactive CLI
- Image Generation Features
  - ✅ add k-diffusion sampling methods
  - ✅ tiling
  - ✅ video/gif generation
  - ✅ controlnet
    - scribbles input
    - segmentation input
    - mlsd input
  - Attend and Excite
  - Compositional Visual Generation
  - ✅ negative prompting
    - some syntax to allow it in a text string - see the sketch after this list
  - paint with words
  - https://multidiffusion.github.io/
  - images as actual prompts instead of just init images
    - not directly possible due to the model architecture
    - can it just be integrated into a sampler?
    - requires model fine-tuning since SD 1.4 expects a 77x768 text encoding input
    - https://twitter.com/Buntworthy/status/1566744186153484288
    - https://github.com/justinpinkney/stable-diffusion
    - https://github.com/LambdaLabsML/lambda-diffusers
    - https://www.reddit.com/r/MachineLearning/comments/x6k5bm/n_stable_diffusion_image_variations_released/
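For the inline negative-prompt item above, one possible shape for such a syntax (purely hypothetical; the `[neg: ...]` notation is invented here for illustration) is to embed the negative prompt in the prompt string and split it out before conditioning:

```python
import re

def split_negative(prompt: str) -> tuple[str, str]:
    """Split a hypothetical inline negative-prompt syntax out of a prompt.

    Example (illustrative only): "a city street [neg: crowds, cars]"
    returns ("a city street", "crowds, cars").
    """
    negatives = re.findall(r"\[neg:\s*([^\]]*)\]", prompt)
    positive = re.sub(r"\[neg:\s*[^\]]*\]", "", prompt).strip()
    return positive, ", ".join(n.strip() for n in negatives)

assert split_negative("a city street [neg: crowds, cars]") == ("a city street", "crowds, cars")
```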
- Image Editing
  - ✅ outpainting
  - ✅ inpainting
    - https://github.com/Jack000/glid-3-xl-stable
    - https://github.com/andreas128/RePaint
    - ✅ img2img but keeps img stable
      - https://www.reddit.com/r/StableDiffusion/comments/xboy90/a_better_way_of_doing_img2img_by_finding_the/
      - https://gist.github.com/trygvebw/c71334dd127d537a15e9d59790f7f5e1
    - https://github.com/SHI-Labs/FcF-Inpainting https://praeclarumjj3.github.io/fcf-inpainting/
  - ✅ text-based image masking
  - Maskless editing
    - ✅ instruct-pix2pix
  - Attention Control Methods
- Image Enhancement
  - Photo Restoration - https://github.com/microsoft/Bringing-Old-Photos-Back-to-Life
  - Upscaling
    - ✅ realesrgan
    - ldm
    - https://github.com/lowfuel/progrock-stable
    - txt2imghd
    - latent scaling + reprocessing
    - stability upscaler
    - rivers have wings upscaler
    - stable super-res?
      - todo: try with a 1-0-0-0 mask at full image resolution (re-encoding the entire image + predicted image at every step)
      - todo: use a gaussian pyramid and only include the "high-detail" level of the pyramid in the next step - see the sketch after this list
    - https://www.reddit.com/r/StableDiffusion/comments/xkjjf9/upscale_to_huge_sizes_and_add_detail_with_sd/
  - ✅ face enhancers
    - ✅ gfpgan - https://github.com/TencentARC/GFPGAN
    - ✅ codeformer - https://github.com/sczhou/CodeFormer
  - ✅ image describe feature
    - ✅ https://github.com/salesforce/BLIP
    - 🚫 CLIP brute-force prompt reconstruction
      - the accuracy of this approach is too low for me to include it in imaginAIry
    - https://github.com/rmokady/CLIP_prefix_caption
    - https://github.com/pharmapsychotic/clip-interrogator (BLIP + CLIP)
    - https://github.com/KaiyangZhou/CoOp
  - 🚫 CPU support. While the code does actually work on some CPUs, generation takes so long that I don't think it's worth the effort to support this feature.
  - ✅ img2img for plms
  - ✅ img2img for kdiff functions
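For the gaussian-pyramid idea above: a minimal sketch of isolating the "high-detail" level, assuming OpenCV. Illustrative only, not an existing imaginAIry feature:

```python
import cv2
import numpy as np

def high_detail_layer(image: np.ndarray) -> np.ndarray:
    """Top level of a Laplacian pyramid: the image minus a blurred,
    downsampled-then-upsampled copy of itself. This keeps only the
    fine detail that the todo above suggests carrying into the next step.
    """
    down = cv2.pyrDown(image)
    up = cv2.pyrUp(down, dstsize=(image.shape[1], image.shape[0]))
    return image.astype(np.float32) - up.astype(np.float32)

# Example: add the recovered high-frequency detail back onto a processed image.
# detail = high_detail_layer(original)
# enhanced = np.clip(processed.astype(np.float32) + detail, 0, 255).astype(np.uint8)
```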
- Other
  - Enhancement pipelines
  - text-to-3d https://dreamfusionpaper.github.io/
    - https://shihmengli.github.io/3D-Photo-Inpainting/
    - https://github.com/thygate/stable-diffusion-webui-depthmap-script/discussions/50
    - depth estimation
      - what is SOTA for monocular depth estimation?
      - https://github.com/compphoto/BoostingMonocularDepth
  - make a video https://github.com/lucidrains/make-a-video-pytorch
  - animations
  - guided generation
  - image variations https://github.com/lstein/stable-diffusion/blob/main/VARIATIONS.md
  - textual inversion
    - https://www.reddit.com/r/StableDiffusion/comments/xbwb5y/how_to_run_textual_inversion_locally_train_your/
    - https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb#scrollTo=50JuJUM8EG1h
    - https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_diffusion_textual_inversion_library_navigator.ipynb
  - https://github.com/Jack000/glid-3-xl-stable
  - fix saturation at high CFG https://www.reddit.com/r/StableDiffusion/comments/xalo78/fixing_excessive_contrastsaturation_resulting/
  - https://www.reddit.com/r/StableDiffusion/comments/xbrrgt/a_rundown_of_twenty_new_methodsoptions_added_to/
  - ✅ deploy to pypi
  - find similar images https://knn5.laion.ai/?back=https%3A%2F%2Fknn5.laion.ai%2F&index=laion5B&useMclip=false
  - https://github.com/vicgalle/stable-diffusion-aesthetic-gradients
- Training
  - Finetuning "dreambooth" style
  - Textual Inversion
  - Low-rank Adaptation for Fast Text-to-Image Diffusion Fine-tuning (LoRA)
  - Performance Improvements
    - ColossalAI - almost got it working, but it's not easy enough to install to merit inclusion in imaginAIry. We should check back in on this.
    - Xformers
    - Deepspeed