- adds support for [SDXL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)
  - adds sliced encoding/decoding to the refiners SDXL pipeline (a sketch of the idea is below)
  - does not yet support inpainting or controlnets
  - monkeypatches self_attention_guidance to use sliced attention
- adds a bunch of model weight translation utilities and weightmaps
- adds [OpenDalle 1.1](https://huggingface.co/dataautogpt3/OpenDalleV1.1)
- changes the default model to OpenDalle
- fix: better handling of special characters in path inputs on the command line

**todo**

- add tests
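Sliced decoding trades a little speed for a much lower peak-memory footprint by running the VAE over latent tiles rather than the full latent at once. A minimal sketch of the idea, not the pipeline's actual implementation (`vae_decode` and the tile handling here are simplified placeholders):

```python
import torch

def sliced_decode(vae_decode, latents: torch.Tensor, tile_size: int = 64) -> torch.Tensor:
    """Decode latents tile-by-tile so the VAE never sees the full-resolution
    tensor at once. Peak memory drops roughly with the tile area.

    vae_decode: function mapping (B, 4, h, w) latents -> (B, 3, 8h, 8w) pixels
    latents:    full latent tensor of shape (B, 4, H, W)

    Note: non-overlapping tiles can leave faint seams; real implementations
    decode overlapping tiles and blend the edges.
    """
    b, c, h, w = latents.shape
    out = torch.zeros(b, 3, h * 8, w * 8, device=latents.device)
    for y in range(0, h, tile_size):
        for x in range(0, w, tile_size):
            tile = latents[:, :, y : y + tile_size, x : x + tile_size]
            out[:, :, y * 8 : (y + tile.shape[2]) * 8, x * 8 : (x + tile.shape[3]) * 8] = vae_decode(tile)
    return out
```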
# Todo

## v14 todo
- ✅ configurable composition cutoff
- ✅ rename model parameter `weights`
- ✅ rename `model_config` parameter to `architecture` and make it case-insensitive
- ✅ add `--size` parameter that accepts strings (e.g. 256x256, 4k, uhd, 8k, etc.) - see the parser sketch after this list
- ✅ detect if CUDA torch is missing and give a better error message
- ✅ add method to install the correct torch version
- ✅ make the CLI run faster again
- ✅ add tests for CLI commands
- ✅ add type checker
- ✅ add interface for loading diffusers weights
- ✅ SDXL support
- SDXL inpainting
- t2i adapters
- embedding inputs
- only output the main image unless some flag is set
- allow selection of output video format
- chain multiple operations together (imggen => videogen)
- https://huggingface.co/playgroundai/playground-v2-1024px-aesthetic
- make sure terminal output on Windows doesn't suck
- add method to show cache size
- add method to clear model cache
- add method to clear cached items not recently used (does diffusers have one?)
- create actual documentation
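The `--size` parser presumably maps both explicit dimensions and named presets to a (width, height) pair. A minimal sketch under that assumption (the preset table here is illustrative, not imaginAIry's actual alias list):

```python
import re

# Illustrative named-size table; the actual aliases supported may differ.
NAMED_SIZES = {
    "hd": (1280, 720),
    "fhd": (1920, 1080),
    "uhd": (3840, 2160),
    "4k": (3840, 2160),
    "8k": (7680, 4320),
}

def parse_size(value: str) -> tuple[int, int]:
    """Parse a --size string like '512', '256x256', or '4k' into (width, height)."""
    value = value.strip().lower()
    if value in NAMED_SIZES:
        return NAMED_SIZES[value]
    match = re.fullmatch(r"(\d+)\s*[x,]\s*(\d+)", value)
    if match:
        return int(match.group(1)), int(match.group(2))
    if value.isdigit():  # a single number means a square image
        return int(value), int(value)
    raise ValueError(f"Unrecognized size: {value!r}")

assert parse_size("256x256") == (256, 256)
assert parse_size("uhd") == (3840, 2160)
```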
## Investigate
- TextDiffuser-2 https://jingyechen.github.io/textdiffuser2/
- ScaleCrafter https://yingqinghe.github.io/scalecrafter/
- fast diffusion with LCM LoRA https://huggingface.co/latent-consistency/lcm-lora-sdv1-5/tree/main
- 3D diffusion https://huggingface.co/stabilityai/stable-zero123
- MagicAnimate
- Consistency Decoder
- HAT https://github.com/XPixelGroup/HAT
## Old Todo
- Inference Performance Optimizations
  - ✅ fp16
  - ✅ Doggettx sliced attention - see the sketch after this list
  - ✅ xformers support https://www.photoroom.com/tech/stable-diffusion-100-percent-faster-with-memory-efficient-attention/
  - https://github.com/neonsecret/stable-diffusion
  - https://github.com/CompVis/stable-diffusion/pull/177
  - https://github.com/huggingface/diffusers/pull/532/files
  - https://github.com/HazyResearch/flash-attention
  - https://github.com/chavinlo/sda-node
  - https://github.com/AminRezaei0x443/memory-efficient-attention/issues/7
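The Doggettx-style sliced attention noted above computes attention scores for one slice of queries at a time, so the full (seq, seq) attention matrix never exists in memory. A rough sketch of the technique, not the exact patched code:

```python
import torch

def sliced_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                     slice_size: int = 1024) -> torch.Tensor:
    """Compute softmax(q @ k^T * scale) @ v in slices along the query axis.

    q, k, v: tensors of shape (batch*heads, seq_len, dim_head)
    """
    scale = q.shape[-1] ** -0.5
    out = torch.empty_like(q)
    for start in range(0, q.shape[1], slice_size):
        end = min(start + slice_size, q.shape[1])
        # attention scores for just this slice of queries: (b*h, slice, seq_len)
        scores = torch.bmm(q[:, start:end], k.transpose(-1, -2)) * scale
        out[:, start:end] = torch.bmm(scores.softmax(dim=-1), v)
    return out
```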
- Development Environment
  - ✅ add tests
  - ✅ set up CI (test/lint/format)
  - ✅ unified pipeline (txt2img & img2img combined)
  - ✅ set up parallel testing
  - add docs
  - 🚫 remove yaml config
  - 🚫 delete more unused code
  - faster latent logging https://discuss.huggingface.co/t/decoding-latents-to-rgb-without-upscaling/23204/9 - see the sketch after this list
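The linked thread on faster latent logging describes skipping the VAE entirely and projecting latent channels to RGB with a fixed linear map, which makes per-step previews nearly free. A sketch of that trick, using approximate SD 1.x factors quoted in the thread:

```python
import torch

# Approximate SD 1.x latent-channel -> RGB factors, as shared in the
# linked huggingface discussion; values are rough.
LATENT_RGB_FACTORS = torch.tensor([
    #   R       G       B
    [ 0.298,  0.207,  0.208],
    [ 0.187,  0.286,  0.173],
    [-0.158,  0.189,  0.264],
    [-0.184, -0.271, -0.473],
])

def latents_to_rgb_preview(latents: torch.Tensor) -> torch.Tensor:
    """Cheap preview: project (B, 4, H/8, W/8) latents straight to RGB
    without running the VAE decoder. Returns (B, H/8, W/8, 3) in [0, 255].
    """
    rgb = torch.einsum("bchw,cr->bhwr", latents, LATENT_RGB_FACTORS.to(latents))
    # rough normalization from roughly [-1, 1] to displayable pixel values
    return ((rgb + 1) / 2).clamp(0, 1) * 255
```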
- Interface improvements
  - ✅ init-image at command line
  - ✅ prompt expansion
  - ✅ interactive CLI
- Image Generation Features
  - ✅ add k-diffusion sampling methods
  - ✅ tiling
  - ✅ video/gif generation
  - ✅ controlnet
    - scribbles input
    - segmentation input
    - mlsd input
  - Attend and Excite
  - Compositional Visual Generation
  - ✅ negative prompting
    - some syntax to allow it in a text string - see the sketch after this list
  - paint with words
  - https://multidiffusion.github.io/
  - images as actual prompts instead of just init images
    - not directly possible due to the model architecture
    - can it just be integrated into a sampler?
    - requires model fine-tuning since SD 1.4 expects a 77x768 text encoding input
    - https://twitter.com/Buntworthy/status/1566744186153484288
    - https://github.com/justinpinkney/stable-diffusion
    - https://github.com/LambdaLabsML/lambda-diffusers
    - https://www.reddit.com/r/MachineLearning/comments/x6k5bm/n_stable_diffusion_image_variations_released/
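For the inline negative-prompt item above, one possible shape for such a syntax (purely hypothetical; the `[neg: ...]` notation is invented here for illustration) is to embed the negative prompt in the prompt string and split it out before conditioning:

```python
import re

def split_negative(prompt: str) -> tuple[str, str]:
    """Split a hypothetical inline negative-prompt syntax out of a prompt.

    Example (illustrative only): "a city street [neg: crowds, cars]"
    returns ("a city street", "crowds, cars").
    """
    negatives = re.findall(r"\[neg:\s*([^\]]*)\]", prompt)
    positive = re.sub(r"\[neg:\s*[^\]]*\]", "", prompt).strip()
    return positive, ", ".join(n.strip() for n in negatives)

assert split_negative("a city street [neg: crowds, cars]") == ("a city street", "crowds, cars")
```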
- Image Editing
  - ✅ outpainting
  - ✅ inpainting
    - https://github.com/Jack000/glid-3-xl-stable
    - https://github.com/andreas128/RePaint
    - ✅ img2img but keeps img stable
      - https://www.reddit.com/r/StableDiffusion/comments/xboy90/a_better_way_of_doing_img2img_by_finding_the/
      - https://gist.github.com/trygvebw/c71334dd127d537a15e9d59790f7f5e1
    - https://github.com/SHI-Labs/FcF-Inpainting https://praeclarumjj3.github.io/fcf-inpainting/
  - ✅ text-based image masking
  - Maskless editing
    - ✅ instruct-pix2pix
  - Attention Control Methods
- Image Enhancement
  - Photo Restoration - https://github.com/microsoft/Bringing-Old-Photos-Back-to-Life
  - Upscaling
    - ✅ realesrgan
    - ldm
    - https://github.com/lowfuel/progrock-stable
    - txt2imghd
    - latent scaling + reprocessing
    - stability upscaler
    - rivers have wings upscaler
    - stable super-res?
      - todo: try with a 1-0-0-0 mask at full image resolution (re-encoding the entire image + predicted image at every step)
      - todo: use a gaussian pyramid and only include the "high-detail" level of the pyramid in the next step - see the sketch after this list
    - https://www.reddit.com/r/StableDiffusion/comments/xkjjf9/upscale_to_huge_sizes_and_add_detail_with_sd/
  - ✅ face enhancers
    - ✅ gfpgan - https://github.com/TencentARC/GFPGAN
    - ✅ codeformer - https://github.com/sczhou/CodeFormer
  - ✅ image describe feature
    - ✅ https://github.com/salesforce/BLIP
    - 🚫 CLIP brute-force prompt reconstruction
      - the accuracy of this approach is too low for me to include it in imaginAIry
    - https://github.com/rmokady/CLIP_prefix_caption
    - https://github.com/pharmapsychotic/clip-interrogator (BLIP + CLIP)
    - https://github.com/KaiyangZhou/CoOp
  - 🚫 CPU support. While the code does actually work on some CPUs, generation takes so long that I don't think it's worth the effort to support this feature.
  - ✅ img2img for plms
  - ✅ img2img for kdiff functions
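For the gaussian-pyramid idea above: a minimal sketch of isolating the "high-detail" level, assuming OpenCV. Illustrative only, not an existing imaginAIry feature:

```python
import cv2
import numpy as np

def high_detail_layer(image: np.ndarray) -> np.ndarray:
    """Top level of a Laplacian pyramid: the image minus a blurred,
    downsampled-then-upsampled copy of itself. This keeps only the
    fine detail that the todo above suggests carrying into the next step.
    """
    down = cv2.pyrDown(image)
    up = cv2.pyrUp(down, dstsize=(image.shape[1], image.shape[0]))
    return image.astype(np.float32) - up.astype(np.float32)

# Example: add the recovered high-frequency detail back onto a processed image.
# detail = high_detail_layer(original)
# enhanced = np.clip(processed.astype(np.float32) + detail, 0, 255).astype(np.uint8)
```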
- Other
  - Enhancement pipelines
  - text-to-3d https://dreamfusionpaper.github.io/
    - https://shihmengli.github.io/3D-Photo-Inpainting/
    - https://github.com/thygate/stable-diffusion-webui-depthmap-script/discussions/50
    - depth estimation
      - what is SOTA for monocular depth estimation?
      - https://github.com/compphoto/BoostingMonocularDepth
  - make a video https://github.com/lucidrains/make-a-video-pytorch
  - animations
  - guided generation
  - image variations https://github.com/lstein/stable-diffusion/blob/main/VARIATIONS.md
  - textual inversion
    - https://www.reddit.com/r/StableDiffusion/comments/xbwb5y/how_to_run_textual_inversion_locally_train_your/
    - https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb#scrollTo=50JuJUM8EG1h
    - https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_diffusion_textual_inversion_library_navigator.ipynb
  - https://github.com/Jack000/glid-3-xl-stable
  - fix saturation at high CFG https://www.reddit.com/r/StableDiffusion/comments/xalo78/fixing_excessive_contrastsaturation_resulting/
  - https://www.reddit.com/r/StableDiffusion/comments/xbrrgt/a_rundown_of_twenty_new_methodsoptions_added_to/
  - ✅ deploy to pypi
  - find similar images https://knn5.laion.ai/?back=https%3A%2F%2Fknn5.laion.ai%2F&index=laion5B&useMclip=false
  - https://github.com/vicgalle/stable-diffusion-aesthetic-gradients
- Training
  - Finetuning "dreambooth" style
  - Textual Inversion
  - Low-rank Adaptation for Fast Text-to-Image Diffusion Fine-tuning (LoRA)
  - Performance Improvements
    - ColossalAI - almost got it working, but it's not easy enough to install to merit inclusion in imaginAIry. We should check back in on this.
    - Xformers
    - Deepspeed