Mirror of https://github.com/brycedrennan/imaginAIry (synced 2024-10-31 03:20:40 +00:00)
## Todo

### v14 todo

- ✅ configurable composition cutoff
- ✅ rename model parameter weights
- ✅ rename model_config parameter to architecture and make it case insensitive
- ✅ add --size parameter that accepts strings (e.g. 256x256, 4k, uhd, 8k, etc)
- ✅ detect if cuda torch missing and give better error message
- ✅ add method to install correct torch version
- ✅ make cli run faster again
- ✅ add tests for cli commands
- ✅ add type checker
- ✅ add interface for loading diffusers weights
- ✅ SDXL support
- sdxl inpainting
- t2i adapters
- embedding inputs
- only output the main image unless some flag is set
- allow selection of output video format
- chain multiple operations together imggen => videogen
  - https://github.com/pallets/click/tree/main/examples/imagepipe
- https://huggingface.co/playgroundai/playground-v2-1024px-aesthetic
- make sure terminal output on Windows doesn't suck
- add method to show cache size
- add method to clear model cache
- add method to clear cached items not recently used (does diffusers have one?)
- create actual documentation

#### Investigate

- TextDiffuser-2 https://jingyechen.github.io/textdiffuser2/
- ScaleCrafter https://yingqinghe.github.io/scalecrafter/
- Fast diffusion with LCM-LoRA https://huggingface.co/latent-consistency/lcm-lora-sdv1-5/tree/main
- 3D diffusion https://huggingface.co/stabilityai/stable-zero123
- MagicAnimate
- consistency decoder
- https://github.com/XPixelGroup/HAT

### Old Todo

- Inference Performance Optimizations
  - ✅ fp16
  - ✅ [Doggettx Sliced attention](https://github.com/CompVis/stable-diffusion/compare/main...Doggettx:stable-diffusion:autocast-improvements#)
  - ✅ xformers support https://www.photoroom.com/tech/stable-diffusion-100-percent-faster-with-memory-efficient-attention/
  - https://github.com/neonsecret/stable-diffusion
  - https://github.com/CompVis/stable-diffusion/pull/177
  - https://github.com/huggingface/diffusers/pull/532/files
  - https://github.com/HazyResearch/flash-attention
  - https://github.com/chavinlo/sda-node
  - https://github.com/AminRezaei0x443/memory-efficient-attention/issues/7
- Development Environment
  - ✅ add tests
  - ✅ set up ci (test/lint/format)
  - ✅ unified pipeline (txt2img & img2img combined)
  - ✅ set up parallel testing
  - add docs
  - 🚫 remove yaml config
  - 🚫 delete more unused code
  - faster latent logging https://discuss.huggingface.co/t/decoding-latents-to-rgb-without-upscaling/23204/9
- Interface improvements
  - ✅ init-image at command line
  - ✅ prompt expansion
  - ✅ interactive cli
- Image Generation Features
  - ✅ add k-diffusion sampling methods
  - ✅ tiling
  - ✅ generate videos/gifs
  - ✅ controlnet
    - scribbles input
    - segmentation input
    - mlsd input
  - [Attend and Excite](https://attendandexcite.github.io/Attend-and-Excite/)
  - Compositional Visual Generation
    - https://github.com/energy-based-model/Compositional-Visual-Generation-with-Composable-Diffusion-Models-PyTorch
    - https://colab.research.google.com/github/energy-based-model/Compositional-Visual-Generation-with-Composable-Diffusion-Models-PyTorch/blob/main/notebooks/demo.ipynb#scrollTo=wt_j3uXZGFAS
  - ✅ negative prompting
    - some syntax to allow it in a text string
  - [paint with words](https://www.reddit.com/r/StableDiffusion/comments/10lzgze/i_figured_out_a_way_to_apply_different_prompts_to/)
    - https://github.com/cloneofsimo/paint-with-words-sd
    - https://multidiffusion.github.io/
  - images as actual prompts instead of just init images
    - not directly possible due to model architecture
    - can it just be integrated into the sampler?
    - requires model fine-tuning since SD1.4 expects 77x768 text encoding input
    - https://twitter.com/Buntworthy/status/1566744186153484288
    - https://github.com/justinpinkney/stable-diffusion
    - https://github.com/LambdaLabsML/lambda-diffusers
    - https://www.reddit.com/r/MachineLearning/comments/x6k5bm/n_stable_diffusion_image_variations_released/
- Image Editing
  - ✅ outpainting
    - https://github.com/parlance-zz/g-diffuser-bot/search?q=noise&type=issues
    - lama cleaner
  - ✅ inpainting
    - https://github.com/Jack000/glid-3-xl-stable
    - https://github.com/andreas128/RePaint
    - ✅ img2img but keeps img stable
      - https://www.reddit.com/r/StableDiffusion/comments/xboy90/a_better_way_of_doing_img2img_by_finding_the/
      - https://gist.github.com/trygvebw/c71334dd127d537a15e9d59790f7f5e1
      - https://github.com/pesser/stable-diffusion/commit/bbb52981460707963e2a62160890d7ecbce00e79
    - https://github.com/SHI-Labs/FcF-Inpainting https://praeclarumjj3.github.io/fcf-inpainting/
  - ✅ text based image masking
    - ✅ ClipSeg - https://github.com/timojl/clipseg
    - https://github.com/facebookresearch/detectron2
    - https://x-decoder-vl.github.io/
  - Maskless editing
    - ✅ instruct-pix2pix
  - Attention Control Methods
    - https://github.com/bloc97/CrossAttentionControl
    - https://github.com/ChenWu98/cycle-diffusion
- Image Enhancement
  - Photo Restoration - https://github.com/microsoft/Bringing-Old-Photos-Back-to-Life
  - Upscaling
    - ✅ realesrgan
    - ldm
    - https://github.com/lowfuel/progrock-stable
    - [txt2imghd](https://github.com/jquesnelle/txt2imghd/blob/master/txt2imghd.py)
    - latent scaling + reprocessing
    - stability upscaler
    - Rivers Have Wings upscaler
    - stable super-res?
      - todo: try with 1-0-0-0 mask at full image resolution (re-encoding entire image+predicted image at every step)
      - todo: use a gaussian pyramid and only include the "high-detail" level of the pyramid into the next step
      - https://www.reddit.com/r/StableDiffusion/comments/xkjjf9/upscale_to_huge_sizes_and_add_detail_with_sd/
  - ✅ face enhancers
    - ✅ gfpgan - https://github.com/TencentARC/GFPGAN
    - ✅ codeformer - https://github.com/sczhou/CodeFormer
  - ✅ image describe feature
    - ✅ https://github.com/salesforce/BLIP
    - 🚫 CLIP brute-force prompt reconstruction
      - The accuracy of this approach is too low for me to include it in imaginAIry
      - https://github.com/rmokady/CLIP_prefix_caption
      - https://github.com/pharmapsychotic/clip-interrogator (blip + clip)
      - https://github.com/KaiyangZhou/CoOp
- 🚫 CPU support. While the code does actually work on some CPUs, the generation takes so long that I don't think it's worth the effort to support this feature.
- ✅ img2img for plms
- ✅ img2img for kdiff functions
- Other
  - Enhancement pipelines
  - text-to-3d https://dreamfusionpaper.github.io/
    - https://shihmengli.github.io/3D-Photo-Inpainting/
    - https://github.com/thygate/stable-diffusion-webui-depthmap-script/discussions/50
    - Depth estimation
      - what is SOTA for monocular depth estimation?
      - https://github.com/compphoto/BoostingMonocularDepth
  - make a video https://github.com/lucidrains/make-a-video-pytorch
  - animations
    - https://github.com/francislabountyjr/stable-diffusion/blob/main/inferencing_notebook.ipynb
    - https://www.youtube.com/watch?v=E7aAFEhdngI
    - https://github.com/pytti-tools/frame-interpolation
  - guided generation
    - https://colab.research.google.com/drive/1dlgggNa5Mz8sEAGU0wFCHhGLFooW_pf1#scrollTo=UDeXQKbPTdZI
    - https://colab.research.google.com/github/aicrumb/doohickey/blob/main/Doohickey_Diffusion.ipynb#scrollTo=PytCwKXCmPid
    - https://github.com/mlfoundations/open_clip
    - https://github.com/openai/guided-diffusion
  - image variations https://github.com/lstein/stable-diffusion/blob/main/VARIATIONS.md
  - textual inversion
    - https://www.reddit.com/r/StableDiffusion/comments/xbwb5y/how_to_run_textual_inversion_locally_train_your/
    - https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb#scrollTo=50JuJUM8EG1h
    - https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_diffusion_textual_inversion_library_navigator.ipynb
  - https://github.com/Jack000/glid-3-xl-stable
  - fix saturation at high CFG https://www.reddit.com/r/StableDiffusion/comments/xalo78/fixing_excessive_contrastsaturation_resulting/
  - https://www.reddit.com/r/StableDiffusion/comments/xbrrgt/a_rundown_of_twenty_new_methodsoptions_added_to/
  - ✅ deploy to pypi
  - find similar images https://knn5.laion.ai/?back=https%3A%2F%2Fknn5.laion.ai%2F&index=laion5B&useMclip=false
  - https://github.com/vicgalle/stable-diffusion-aesthetic-gradients
- Training
  - Finetuning "dreambooth" style
  - [Textual Inversion](https://arxiv.org/abs/2208.01618)
  - [Fast Textual Inversion](https://github.com/peterwilli/sd-leap-booster)
  - [Low-rank Adaptation for Fast Text-to-Image Diffusion Fine-tuning (LORA)](https://github.com/cloneofsimo/lora)
    - https://huggingface.co/spaces/lora-library/Low-rank-Adaptation
- Performance Improvements
  - [ColossalAI](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/diffusion) - almost got it working but it's not easy enough to install to merit inclusion in imaginairy. We should check back in on this.
  - xformers
  - DeepSpeed