imaginAIry/todo.md at 6fcf0da331a5376f70daf4d922191e665580be47

mirror of https://github.com/brycedrennan/imaginAIry synced 2024-10-31 03:20:40 +00:00

Bryce 6fcf0da331 test: try local runner

2023-12-03 09:13:01 -08:00

8.3 KiB

Todo

v14 todo

- configurable composition cutoff
- rename `model` parameter to weights
- rename `model_config` parameter to `architecture` and make it case-insensitive
- add `--size` parameter that accepts strings (e.g. 256x256, 4k, uhd, 8k, etc.)
- detect if CUDA-enabled torch is missing and give a better error message
- add method to install correct torch version
- add composition cutoff parameter
- allow selection of output video format
- chain multiple operations together (imggen => videogen)
- make sure terminal output on Windows doesn't suck
- add Karras schedule to refiners
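The snippet below is a minimal sketch of how `--size` strings might be parsed. The helper name `parse_size` is hypothetical. The pixel values for `4k`, `uhd` and `8k` are assumptions: they are the standard 16:9 UHD resolutions, not imaginAIry's actual presets.

```python
import re

# Hypothetical preset table (assumption): standard 16:9 UHD resolutions.
NAMED_SIZES = {
    "4k": (3840, 2160),
    "uhd": (3840, 2160),
    "8k": (7680, 4320),
}


def parse_size(value: str) -> tuple[int, int]:
    """Parse a size string such as '256x256', '512', '4k' or 'UHD' into (width, height)."""
    text = value.strip().lower()
    if text in NAMED_SIZES:
        return NAMED_SIZES[text]
    match = re.fullmatch(r"(\d+)x(\d+)", text)
    if match:
        return int(match.group(1)), int(match.group(2))
    if text.isdigit():
        # A single number is treated as a square image.
        side = int(text)
        return side, side
    raise ValueError(f"Unrecognized size: {value!r}")
```

Matching is case-insensitive, so `UHD` and `uhd` resolve to the same preset.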
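The Karras schedule (Karras et al., 2022) spaces noise levels evenly in σ^(1/ρ) between `sigma_max` and `sigma_min`, using ρ = 7 in the paper, and then appends a final 0. Here is a framework-agnostic sketch. `karras_sigmas` is a hypothetical helper, and how refiners' schedulers would consume these values is not shown.

```python
def karras_sigmas(n: int, sigma_min: float, sigma_max: float, rho: float = 7.0) -> list[float]:
    """Return n noise levels from sigma_max down to sigma_min, followed by 0.0."""
    if n < 2:
        raise ValueError("need at least two steps")
    max_inv = sigma_max ** (1.0 / rho)
    min_inv = sigma_min ** (1.0 / rho)
    # Interpolate linearly in sigma^(1/rho) space, then raise back to the rho power.
    sigmas = [(max_inv + i / (n - 1) * (min_inv - max_inv)) ** rho for i in range(n)]
    return sigmas + [0.0]
```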

Old Todo

Inference Performance Optimizations
Development Environment
- ✅ add tests
- ✅ set up ci (test/lint/format)
- ✅ unified pipeline (txt2img & img2img combined)
- ✅ setup parallel testing
- add docs
- 🚫 remove yaml config
- 🚫 delete more unused code
- faster latent logging https://discuss.huggingface.co/t/decoding-latents-to-rgb-without-upscaling/23204/9
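Re: faster latent logging. The linked discussion appears to be about skipping the VAE decode for previews and instead projecting latent channels straight to RGB with a small linear map. The sketch below assumes a few things:

- latents use a (C, H, W) layout;
- the projected values fall roughly in [-1, 1];
- the caller supplies the `factors` matrix, since no coefficients are assumed here.

`latents_to_rgb_preview` is a hypothetical name.

```python
import numpy as np


def latents_to_rgb_preview(latents: np.ndarray, factors: np.ndarray) -> np.ndarray:
    """Approximate a VAE decode by linearly mixing latent channels into RGB.

    latents: shape (C, H, W), e.g. C=4 for SD 1.x latents.
    factors: shape (C, 3), supplied by the caller.
    Returns a uint8 image of shape (H, W, 3) at latent resolution (no upscaling).
    """
    rgb = np.einsum("chw,cr->hwr", latents, factors)
    # Assumption: projected values are roughly in [-1, 1]; map them to [0, 255].
    rgb = (rgb + 1.0) * 127.5
    return np.clip(rgb, 0, 255).astype(np.uint8)
```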
Interface improvements
- ✅ init-image at command line
- ✅ prompt expansion
- ✅ interactive cli
Image Generation Features
- ✅ add k-diffusion sampling methods
- ✅ tiling
- ✅ generation videos/gifs
- ✅ controlnet
  - scribbles input
  - segmentation input
  - mlsd input
- Attend and Excite
- Compositional Visual Generation
  - https://github.com/energy-based-model/Compositional-Visual-Generation-with-Composable-Diffusion-Models-PyTorch
  - https://colab.research.google.com/github/energy-based-model/Compositional-Visual-Generation-with-Composable-Diffusion-Models-PyTorch/blob/main/notebooks/demo.ipynb#scrollTo=wt_j3uXZGFAS
- ✅ negative prompting
  - some syntax to allow it in a text string
- paint with words
  - https://github.com/cloneofsimo/paint-with-words-sd
- https://multidiffusion.github.io/
- images as actual prompts instead of just init images.
  - not directly possible due to model architecture.
  - can it just be integrated into the sampler?
  - requires model fine-tuning since SD 1.4 expects a 77x768 text-encoding input
  - https://twitter.com/Buntworthy/status/1566744186153484288
  - https://github.com/justinpinkney/stable-diffusion
  - https://github.com/LambdaLabsML/lambda-diffusers
  - https://www.reddit.com/r/MachineLearning/comments/x6k5bm/n_stable_diffusion_image_variations_released/
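Re: Compositional Visual Generation and negative prompting. Composable Diffusion combines per-prompt noise predictions as ε̂ = ε(x_t) + Σ w_i (ε(x_t | c_i) − ε(x_t)). With a single prompt this reduces to ordinary classifier-free guidance, and giving a prompt a negative weight pushes the sample away from it.

Below is a numpy sketch. `composed_noise` is a hypothetical helper, and the model calls that produce the predictions are out of scope. Note that the usual SD "negative prompt" works differently: it replaces the unconditional embedding with the negative prompt's embedding.

```python
import numpy as np


def composed_noise(uncond: np.ndarray, conds: list[np.ndarray], weights: list[float]) -> np.ndarray:
    """Combine noise predictions: uncond + sum_i w_i * (cond_i - uncond)."""
    if len(conds) != len(weights):
        raise ValueError("need exactly one weight per conditional prediction")
    guided = np.array(uncond, dtype=np.float64)
    for eps_c, w in zip(conds, weights):
        # A negative w steers away from that prompt instead of toward it.
        guided += w * (eps_c - uncond)
    return guided
```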
Image Editing
- ✅ outpainting
  - https://github.com/parlance-zz/g-diffuser-bot/search?q=noise&type=issues
  - lama cleaner
- ✅ inpainting
- ✅ text based image masking
- Maskless editing
  - ✅ instruct-pix2pix
- Attention Control Methods
  - https://github.com/bloc97/CrossAttentionControl
  - https://github.com/ChenWu98/cycle-diffusion
Image Enhancement
- Photo Restoration - https://github.com/microsoft/Bringing-Old-Photos-Back-to-Life
- Upscaling
  - ✅ realesrgan
  - ldm
  - https://github.com/lowfuel/progrock-stable
  - txt2imghd
  - latent scaling + reprocessing
  - stability upscaler
  - rivers have wings upscaler
  - stable super-res?
    - todo: try with 1-0-0-0 mask at full image resolution (re-encoding the entire image + predicted image at every step)
    - todo: use a Gaussian pyramid and only include the "high-detail" level of the pyramid into the next step
    - https://www.reddit.com/r/StableDiffusion/comments/xkjjf9/upscale_to_huge_sizes_and_add_detail_with_sd/
- ✅ face enhancers
  - ✅ gfpgan - https://github.com/TencentARC/GFPGAN
  - ✅ codeformer - https://github.com/sczhou/CodeFormer
- ✅ image describe feature
  - ✅ https://github.com/salesforce/BLIP
  - 🚫 CLIP brute-force prompt reconstruction
    - The accuracy of this approach is too low for me to include it in imaginAIry
    - https://github.com/rmokady/CLIP_prefix_caption
    - https://github.com/pharmapsychotic/clip-interrogator (blip + clip)
  - https://github.com/KaiyangZhou/CoOp
- 🚫 CPU support. While the code does actually work on some CPUs, the generation takes so long that I don't think it's worth the effort to support this feature
- ✅ img2img for plms
- ✅ img2img for kdiff functions
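Re: the Gaussian pyramid idea for stable super-res. The "high-detail" level is the finest Laplacian band: the image minus a blurred, downsampled and re-expanded copy of itself. Here is a numpy sketch that extracts that band from a single-channel 2D array. The 5-tap binomial kernel and the helper names are assumptions, and feeding the band into the next diffusion step is not shown.

```python
import numpy as np

# 5-tap binomial approximation of a Gaussian (assumption); it sums to 1.
KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0


def blur(img: np.ndarray) -> np.ndarray:
    """Separable 5-tap blur with reflected borders; output has the same shape as img."""
    padded = np.pad(img, 2, mode="reflect")
    rows = np.apply_along_axis(np.convolve, 1, padded, KERNEL, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, KERNEL, mode="valid")


def downsample(img: np.ndarray) -> np.ndarray:
    """One Gaussian-pyramid step down: blur, then keep every other pixel."""
    return blur(img)[::2, ::2]


def upsample(small: np.ndarray, shape: tuple[int, int]) -> np.ndarray:
    """Expand back to `shape` by pixel repetition followed by a blur."""
    up = np.repeat(np.repeat(small, 2, axis=0), 2, axis=1)[: shape[0], : shape[1]]
    return blur(up)


def high_detail(img: np.ndarray) -> np.ndarray:
    """Finest Laplacian band: detail the next coarser pyramid level cannot represent."""
    img = np.asarray(img, dtype=np.float64)
    return img - upsample(downsample(img), img.shape)
```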
Other
- Enhancement pipelines
- text-to-3D https://dreamfusionpaper.github.io/
  - https://shihmengli.github.io/3D-Photo-Inpainting/
  - https://github.com/thygate/stable-diffusion-webui-depthmap-script/discussions/50
  - Depth estimation
    - what is SOTA for monocular depth estimation?
    - https://github.com/compphoto/BoostingMonocularDepth
- make a video https://github.com/lucidrains/make-a-video-pytorch
- animations
- guided generation
- image variations https://github.com/lstein/stable-diffusion/blob/main/VARIATIONS.md
- textual inversion
- fix saturation at high CFG https://www.reddit.com/r/StableDiffusion/comments/xalo78/fixing_excessive_contrastsaturation_resulting/
- https://www.reddit.com/r/StableDiffusion/comments/xbrrgt/a_rundown_of_twenty_new_methodsoptions_added_to/
- ✅ deploy to pypi
- find similar images https://knn5.laion.ai/?back=https%3A%2F%2Fknn5.laion.ai%2F&index=laion5B&useMclip=false
- https://github.com/vicgalle/stable-diffusion-aesthetic-gradients
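Re: fixing saturation at high CFG. One known remedy, not necessarily the one in the linked thread, is Imagen's dynamic thresholding (Saharia et al., 2022). Each predicted clean sample x0 is clipped to s, its own p-th percentile of absolute values (never less than 1), and then divided by s.

Here is a numpy sketch. The default percentile of 99.5 is a commonly used value and is assumed here. Imagen applies this in pixel space, so using it on SD latents is an untested assumption.

```python
import numpy as np


def dynamic_threshold(x0: np.ndarray, percentile: float = 99.5) -> np.ndarray:
    """Imagen-style dynamic thresholding of predicted clean samples x0, shape (batch, ...)."""
    flat = np.abs(x0).reshape(x0.shape[0], -1)
    s = np.percentile(flat, percentile, axis=1)
    s = np.maximum(s, 1.0)  # samples already within [-1, 1] are left unscaled
    s = s.reshape((-1,) + (1,) * (x0.ndim - 1))  # broadcast per sample
    return np.clip(x0, -s, s) / s
```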
Training
- Fine-tuning "DreamBooth" style
- Textual Inversion
  - Fast Textual Inversion
- Low-rank Adaptation for Fast Text-to-Image Diffusion Fine-tuning (LoRA)
  - https://huggingface.co/spaces/lora-library/Low-rank-Adaptation
- Performance Improvements
  - ColossalAI - almost got it working, but it's not easy enough to install to merit inclusion in imaginAIry. We should check back in on this.
  - xFormers
  - DeepSpeed
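Re: LoRA. The core idea is to freeze a weight matrix W and learn a low-rank update, so the layer computes W x + (α/r)·B A x. A has shape (r, in) and B has shape (out, r). B starts at zero, so training begins from the unmodified model.

Here is a numpy sketch of the forward pass and of merging the update back into W. The class name and the 0.01 init scale for A are assumptions, and no training loop is shown.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer plus a low-rank update: W x + (alpha / r) * B A x."""

    def __init__(self, weight: np.ndarray, rank: int, alpha: float, seed: int = 0):
        out_features, in_features = weight.shape
        rng = np.random.default_rng(seed)
        self.weight = weight  # frozen base weight, shape (out, in)
        # A gets a small random Gaussian init (scale is an assumption); B starts at zero.
        self.lora_a = rng.normal(0.0, 0.01, size=(rank, in_features))
        self.lora_b = np.zeros((out_features, rank))
        self.scale = alpha / rank

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x has shape (..., in_features)
        return x @ self.weight.T + self.scale * (x @ self.lora_a.T) @ self.lora_b.T

    def merged_weight(self) -> np.ndarray:
        """Fold the low-rank update into a single matrix for inference."""
        return self.weight + self.scale * self.lora_b @ self.lora_a
```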
