imaginAIry/todo.md at 2372a71e6c12cb77a137dbb13217cdfb0612b794

mirror of https://github.com/brycedrennan/imaginAIry synced 2024-10-31 03:20:40 +00:00

Bryce 7880ee1389 feature: update midas (depth maps)

2023-12-18 13:01:56 -08:00

Todo

v14 todo

- ✅ configurable composition cutoff
- ✅ rename model parameter to weights
- ✅ rename model_config parameter to architecture and make it case-insensitive
- ✅ add --size parameter that accepts strings (e.g. 256x256, 4k, uhd, 8k, etc.)
- ✅ detect if CUDA torch is missing and give a better error message
- ✅ add method to install correct torch version
- ✅ make CLI run faster again
- ✅ add tests for CLI commands
- ✅ add type checker
- only output the main image unless some flag is set
- allow selection of output video format
- chain multiple operations together (e.g. imggen => videogen)
  - https://github.com/pallets/click/tree/main/examples/imagepipe
- add interface for loading diffusers weights
  - https://huggingface.co/playgroundai/playground-v2-1024px-aesthetic
- make sure terminal output on Windows doesn't suck
- add method to show cache size
- add method to clear model cache
- add method to clear cached items not recently used (does diffusers have one?)
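A parser for the strings accepted by the --size item could look roughly like the sketch below. The function name parse_size and the preset table are hypothetical illustrations, not imaginAIry's actual implementation. The preset values are the common video resolutions (4K/UHD = 3840x2160, 8K = 7680x4320). Treating a bare integer as a square size is also an assumption.

```python
import re

# Hypothetical preset table; imaginAIry's real preset names/values may differ.
# Values are the common consumer-video resolutions as (width, height).
SIZE_PRESETS = {
    "4k": (3840, 2160),
    "uhd": (3840, 2160),
    "8k": (7680, 4320),
}

_WXH = re.compile(r"^(\d+)\s*x\s*(\d+)$")


def parse_size(value: str) -> tuple[int, int]:
    """Parse '256x256', '512' (assumed square) or a named preset into (width, height)."""
    key = value.strip().lower()
    if key in SIZE_PRESETS:
        return SIZE_PRESETS[key]
    match = _WXH.match(key)
    if match:
        width, height = int(match.group(1)), int(match.group(2))
    elif key.isdigit():
        # Assumption: a bare integer means a square image.
        width = height = int(key)
    else:
        raise ValueError(f"unrecognized size: {value!r}")
    if width <= 0 or height <= 0:
        raise ValueError(f"size must be positive: {value!r}")
    return width, height
```

For example, parse_size("256x256") gives (256, 256) and parse_size("4K") gives (3840, 2160). Matching is case-insensitive, so "4K" and "4k" resolve to the same preset.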
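This covers the three cache items (show size, clear, prune items not recently used). For the Hugging Face hub cache that diffusers downloads into, huggingface_hub already provides scan_cache_dir() and the huggingface-cli scan-cache / delete-cache commands. Those are preferable there, because deleting blob files directly leaves dangling snapshot symlinks. The sketch below targets a generic cache directory and uses only the standard library. The function names are hypothetical. "Last used" is approximated as the later of the file's access and modification times, which is an assumption: access times are unreliable on noatime/relatime mounts.

```python
import os
import shutil
import time


def iter_cache_files(cache_dir):
    """Yield regular files under cache_dir, skipping symlinks.

    Skipping symlinks avoids double counting caches whose snapshot
    folders symlink into a shared blob store.
    """
    for root, _dirs, files in os.walk(cache_dir):  # followlinks=False by default
        for name in files:
            path = os.path.join(root, name)
            if not os.path.islink(path):
                yield path


def cache_size_bytes(cache_dir):
    """Show cache size: total bytes of regular files under cache_dir."""
    return sum(os.path.getsize(path) for path in iter_cache_files(cache_dir))


def prune_cache(cache_dir, max_age_days, now=None):
    """Delete files not touched within max_age_days and return bytes freed.

    "Touched" means the later of access time and modification time
    (an approximation of "recently used"). Empty folders are left in place.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 24 * 60 * 60
    freed = 0
    for path in list(iter_cache_files(cache_dir)):
        stat = os.stat(path)
        if max(stat.st_atime, stat.st_mtime) < cutoff:
            freed += stat.st_size
            os.remove(path)
    return freed


def clear_cache(cache_dir):
    """Clear the cache by removing the whole directory tree."""
    shutil.rmtree(cache_dir, ignore_errors=True)
```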

Old Todo

Inference Performance Optimizations
Development Environment
- ✅ add tests
- ✅ set up ci (test/lint/format)
- ✅ unified pipeline (txt2img & img2img combined)
- ✅ setup parallel testing
- add docs
- 🚫 remove yaml config
- 🚫 delete more unused code
- faster latent logging https://discuss.huggingface.co/t/decoding-latents-to-rgb-without-upscaling/23204/9
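The faster latent logging idea in the linked discussion skips the VAE decoder. Instead, it approximates each RGB pixel as a linear combination of the 4 latent channels, giving a preview at latent resolution (1/8 of the output size). A minimal sketch follows. The fitted 4x3 coefficient matrix from that thread is deliberately not reproduced here, so coeffs is a required argument. The function name and the assumption that the linear output lies roughly in [-1, 1] are illustrative. A real implementation would use a tensor einsum rather than Python loops.

```python
def latents_to_rgb_preview(latents, coeffs, bias=(0.0, 0.0, 0.0)):
    """Approximate an RGB preview from latents with a per-pixel linear map.

    latents: nested lists shaped [channels][height][width]
    coeffs:  nested lists shaped [channels][3] (latent channel -> R, G, B)
    Returns [height][width] rows of (r, g, b) ints in 0..255.
    """
    channels = len(latents)
    if len(coeffs) != channels:
        raise ValueError("coeffs must have one row per latent channel")
    height, width = len(latents[0]), len(latents[0][0])
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            pixel = []
            for c_out in range(3):
                value = bias[c_out] + sum(
                    latents[c][y][x] * coeffs[c][c_out] for c in range(channels)
                )
                # Assumption: value is roughly in [-1, 1]; map to [0, 255] and clamp.
                pixel.append(max(0, min(255, round((value + 1) * 127.5))))
            row.append(tuple(pixel))
        image.append(row)
    return image
```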
Interface improvements
- ✅ init-image at command line
- ✅ prompt expansion
- ✅ interactive cli
Image Generation Features
- ✅ add k-diffusion sampling methods
- ✅ tiling
- ✅ generation videos/gifs
- ✅ controlnet
  - scribbles input
  - segmentation input
  - mlsd input
- Attend and Excite
- Compositional Visual Generation
  - https://github.com/energy-based-model/Compositional-Visual-Generation-with-Composable-Diffusion-Models-PyTorch
  - https://colab.research.google.com/github/energy-based-model/Compositional-Visual-Generation-with-Composable-Diffusion-Models-PyTorch/blob/main/notebooks/demo.ipynb#scrollTo=wt_j3uXZGFAS
- ✅ negative prompting
  - some syntax to allow it in a text string
- paint with words
  - https://github.com/cloneofsimo/paint-with-words-sd
- https://multidiffusion.github.io/
- images as actual prompts instead of just init images.
  - not directly possible due to model architecture.
  - can it just be integrated into sampler?
  - requires model fine-tuning since SD1.4 expects 77x768 text encoding input
  - https://twitter.com/Buntworthy/status/1566744186153484288
  - https://github.com/justinpinkney/stable-diffusion
  - https://github.com/LambdaLabsML/lambda-diffusers
  - https://www.reddit.com/r/MachineLearning/comments/x6k5bm/n_stable_diffusion_image_variations_released/
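Negative prompting and compositional generation both combine several noise predictions at each sampling step. Notation chosen for this sketch:

- $\epsilon_\theta(x_t \mid c)$ is the model's noise prediction given conditioning $c$.
- $\varnothing$ is the empty prompt.

The composable-diffusion formulation from the linked repository combines concepts $c_1, \dots, c_n$ with guidance weights $w_i$, as in the first equation. A NOT concept enters with its term subtracted (equivalently, a negative weight), and with $n = 1$ the equation reduces to ordinary classifier-free guidance. The usual negative-prompt approach is the second equation: classifier-free guidance with scale $s$, with the negative prompt $c_{\text{neg}}$ substituted for the empty prompt.

```latex
\hat{\epsilon}(x_t) = \epsilon_\theta(x_t \mid \varnothing)
  + \sum_{i=1}^{n} w_i \left( \epsilon_\theta(x_t \mid c_i) - \epsilon_\theta(x_t \mid \varnothing) \right)

\hat{\epsilon}(x_t) = \epsilon_\theta(x_t \mid c_{\text{neg}})
  + s \left( \epsilon_\theta(x_t \mid c) - \epsilon_\theta(x_t \mid c_{\text{neg}}) \right)
```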
Image Editing
- ✅ outpainting
  - https://github.com/parlance-zz/g-diffuser-bot/search?q=noise&type=issues
  - lama cleaner
- ✅ inpainting
- ✅ text based image masking
- Maskless editing
  - ✅ instruct-pix2pix
- Attention Control Methods
  - https://github.com/bloc97/CrossAttentionControl
  - https://github.com/ChenWu98/cycle-diffusion
Image Enhancement
- Photo Restoration - https://github.com/microsoft/Bringing-Old-Photos-Back-to-Life
- Upscaling
  - ✅ realesrgan
  - ldm
  - https://github.com/lowfuel/progrock-stable
  - txt2imghd
  - latent scaling + reprocessing
  - stability upscaler
  - Rivers Have Wings upscaler
  - stable super-res?
    - todo: try with a 1-0-0-0 mask at full image resolution (re-encoding the entire image + predicted image at every step)
    - todo: use a gaussian pyramid and only include the "high-detail" level of the pyramid into the next step
    - https://www.reddit.com/r/StableDiffusion/comments/xkjjf9/upscale_to_huge_sizes_and_add_detail_with_sd/
- ✅ face enhancers
  - ✅ gfpgan - https://github.com/TencentARC/GFPGAN
  - ✅ codeformer - https://github.com/sczhou/CodeFormer
- ✅ image describe feature
  - ✅ https://github.com/salesforce/BLIP
  - 🚫 CLIP brute-force prompt reconstruction
    - The accuracy of this approach is too low for me to include it in imaginAIry
    - https://github.com/rmokady/CLIP_prefix_caption
    - https://github.com/pharmapsychotic/clip-interrogator (blip + clip)
  - https://github.com/KaiyangZhou/CoOp
- 🚫 CPU support. While the code does actually work on some CPUs, the generation takes so long that I don't think it's worth the effort to support this feature
- ✅ img2img for plms
- ✅ img2img for kdiff functions
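The "high-detail" level in the Gaussian-pyramid upscaling idea corresponds to the first band of a Laplacian pyramid: the image minus a blurred copy of itself. The sketch below computes that band on a grayscale image stored as nested lists, using a 3x3 binomial (approximately Gaussian) kernel with edge replication. It then adds the band from a refined prediction onto a base image. It deliberately skips the downsampling a full pyramid would do between levels. The function names and the strength parameter are hypothetical.

```python
def binomial_blur(img):
    """3x3 binomial blur (kernel [1, 2, 1] outer [1, 2, 1], sum 16) with edge replication."""
    kernel = (1, 2, 1)
    h, w = len(img), len(img[0])

    def px(y, x):
        # Replicate edge pixels for out-of-bounds coordinates.
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    out = []
    for y in range(h):
        row = []
        for x in range(w):
            acc = 0.0
            for dy, ky in zip((-1, 0, 1), kernel):
                for dx, kx in zip((-1, 0, 1), kernel):
                    acc += ky * kx * px(y + dy, x + dx)
            row.append(acc / 16.0)
        out.append(row)
    return out


def high_detail_band(img):
    """First Laplacian-pyramid band: the detail removed by one blur step."""
    blurred = binomial_blur(img)
    return [[img[y][x] - blurred[y][x] for x in range(len(img[0]))] for y in range(len(img))]


def inject_detail(base, refined, strength=1.0):
    """Add only the high-detail band of `refined` onto `base` (same shape)."""
    band = high_detail_band(refined)
    return [
        [base[y][x] + strength * band[y][x] for x in range(len(base[0]))]
        for y in range(len(base))
    ]
```

Passing only the high-frequency band forward leaves the low-frequency structure of the base image untouched, which is the point of the pyramid idea.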
Other
- Enhancement pipelines
- text-to-3d https://dreamfusionpaper.github.io/
  - https://shihmengli.github.io/3D-Photo-Inpainting/
  - https://github.com/thygate/stable-diffusion-webui-depthmap-script/discussions/50
  - Depth estimation
    - what is SOTA for monocular depth estimation?
    - https://github.com/compphoto/BoostingMonocularDepth
- make a video https://github.com/lucidrains/make-a-video-pytorch
- animations
- guided generation
- image variations https://github.com/lstein/stable-diffusion/blob/main/VARIATIONS.md
- textual inversion
- fix saturation at high CFG https://www.reddit.com/r/StableDiffusion/comments/xalo78/fixing_excessive_contrastsaturation_resulting/
- https://www.reddit.com/r/StableDiffusion/comments/xbrrgt/a_rundown_of_twenty_new_methodsoptions_added_to/
- ✅ deploy to pypi
- find similar images https://knn5.laion.ai/?back=https%3A%2F%2Fknn5.laion.ai%2F&index=laion5B&useMclip=false
- https://github.com/vicgalle/stable-diffusion-aesthetic-gradients
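One known remedy for oversaturation at high CFG is Imagen-style dynamic thresholding, a swapped-in technique that may differ from the fix in the linked Reddit thread. At each sampling step, the method takes the predicted clean sample x̂0 and sets s to a high percentile of |x̂0|, floored at 1. It then clips x̂0 to [-s, s] and divides by s. The sketch below treats one sample as a flat list and uses a nearest-rank percentile. The function name and the 0.995 default are assumptions.

```python
import math


def dynamic_threshold(x0, percentile=0.995):
    """Dynamic thresholding of one predicted clean sample given as a flat list."""
    if not 0.0 < percentile <= 1.0:
        raise ValueError("percentile must be in (0, 1]")
    if not x0:
        return []
    magnitudes = sorted(abs(v) for v in x0)
    # Nearest-rank percentile of |x0|.
    rank = max(1, math.ceil(percentile * len(magnitudes)))
    # Floor s at 1 so already in-range samples are left unchanged.
    s = max(magnitudes[rank - 1], 1.0)
    return [max(-s, min(s, v)) / s for v in x0]
```

With values already inside [-1, 1] the sample passes through unchanged. Larger outliers are clipped, and the rest of the sample is rescaled instead of saturating at the edge of the range.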
Training
- Finetuning "dreambooth" style
- Textual Inversion
  - Fast Textual Inversion
- Low-rank Adaptation for Fast Text-to-Image Diffusion Fine-tuning (LORA)
  - https://huggingface.co/spaces/lora-library/Low-rank-Adaptation
Performance Improvements
- ColossalAI - almost got it working, but it's not easy enough to install to merit inclusion in imaginAIry. We should check back in on this.
- Xformers
- Deepspeed
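For the LORA item under Training, the core idea is to freeze a pretrained weight matrix W0 (d x k) and learn a low-rank update B A instead. Here B is d x r, A is r x k, and the update is scaled by alpha / r, as in the LoRA paper and common implementations. For inference, the update can be merged into the base weights once. A minimal sketch with plain nested lists follows. The function names are hypothetical, and real code would operate on framework tensors.

```python
def matmul(a, b):
    """Multiply matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def merge_lora(w0, lora_b, lora_a, alpha):
    """Return W0 + (alpha / r) * B @ A for frozen W0 (d x k), B (d x r), A (r x k)."""
    rank = len(lora_a)
    if any(len(row) != rank for row in lora_b):
        raise ValueError("B must have r columns where A has r rows")
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w0[i][j] + scale * delta[i][j] for j in range(len(w0[0]))]
        for i in range(len(w0))
    ]
```

Training only B and A stores r * (d + k) parameters per layer instead of d * k, which is why LoRA fine-tunes are small and quick to train.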
