feature: add art-scene, desktop-background, interior-style, painting-style phraselists

fix: file globbing works in the interactive shell
feature: make compilation animations simple slide shows
pull/288/head
Bryce 1 year ago committed by Bryce Drennan
parent de2a72f718
commit eb26d5a7c5

@ -31,65 +31,60 @@ AI imagined images. Pythonic generation of stable diffusion images.
### Image Structure Control [by ControlNet](https://github.com/lllyasviel/ControlNet)
Generate images guided by body poses, depth maps, canny edges, hed boundaries, or normal maps.
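The same controls can be driven from the Python API. Below is a minimal sketch of the openpose example, assuming `ImaginePrompt` accepts `control_image`/`control_mode` arguments that mirror the CLI flags:
```python
# Hedged sketch: Python equivalent of the openpose CLI example below.
# Assumption: ImaginePrompt mirrors the --control-image/--control-mode flags.
from imaginairy import ImaginePrompt, LazyLoadingImage, imagine

prompt = ImaginePrompt(
    "photo of a polar bear",
    control_image=LazyLoadingImage(filepath="assets/indiana.jpg"),
    control_mode="openpose",
)
for result in imagine([prompt]):
    result.save("polar-bear-openpose.jpg")
```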
**Openpose Control**
```bash
imagine --control-image assets/indiana.jpg --control-mode openpose --caption-text openpose "photo of a polar bear"
```
<p float="left">
<img src="assets/indiana.jpg" height="256">
<img src="assets/indiana-pose.jpg" height="256">
<img src="assets/indiana-pose-polar-bear.jpg" height="256">
</p>
**Canny Edge Control**
```bash
imagine --control-image assets/lena.png --control-mode canny "photo of a woman with a hat looking at the camera"
```
<p float="left">
<img src="assets/lena.png" height="256">
<img src="assets/lena-canny.jpg" height="256">
<img src="assets/lena-canny-generated.jpg" height="256">
</p>
**HED Boundary Control**
```bash
imagine --control-image dog.jpg --control-mode hed "photo of a dalmatian"
```
<p float="left">
<img src="assets/000032_337692011_PLMS40_PS7.5_a_photo_of_a_dog.jpg" height="256">
<img src="assets/dog-hed-boundary.jpg" height="256">
<img src="assets/dog-hed-boundary-dalmation.jpg" height="256">
</p>
**Depth Map Control**
```bash
imagine --control-image fancy-living.jpg --control-mode depth "a modern living room"
```
<p float="left">
<img src="assets/fancy-living.jpg" height="256">
<img src="assets/fancy-living-depth.jpg" height="256">
<img src="assets/fancy-living-depth-generated.jpg" height="256">
</p>
**Normal Map Control**
```bash
imagine --control-image bird.jpg --control-mode normal "a bird"
```
<p float="left">
<img src="assets/013986_1_kdpmpp2m59_PS7.5_a_bluejay_[generated].jpg" height="256">
<img src="assets/bird-normal.jpg" height="256">
@ -208,6 +203,8 @@ When writing strength modifiers keep in mind that pixel values are between 0 and
### Upscaling [by RealESRGAN](https://github.com/xinntao/Real-ESRGAN)
```bash
>> imagine "colorful smoke" --steps 40 --upscale
# upscale an existing image
>> aimg upscale my-image.jpg
```
<img src="https://github.com/brycedrennan/imaginAIry/raw/master/assets/000206_856637805_PLMS40_PS7.5_colorful_smoke.jpg" height="128"> ➡️
<img src="https://github.com/brycedrennan/imaginAIry/raw/master/assets/000206_856637805_PLMS40_PS7.5_colorful_smoke_upscaled.jpg" height="256">
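Upscaling can also be done programmatically. A hedged sketch using the same helper the `aimg upscale` command imports (see the CLI diff further down), assuming it accepts and returns a PIL image:
```python
# Hedged sketch: programmatic upscaling via the helper `aimg upscale` uses.
# Assumption: upscale_image accepts a PIL image and returns the upscaled one.
from PIL import Image

from imaginairy.enhancers.upscale_realesrgan import upscale_image

img = Image.open("my-image.jpg")
upscaled = upscale_image(img)
upscaled.save("my-image.upscaled.jpg")
```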
@ -276,12 +273,12 @@ You can use `{}` to randomly pull values from lists. A list of values separated
pull from a phrase list of the same name. Folders containing .txt phraselist files may be specified via
`--prompt_library_path`. The option may be specified multiple times. Built-in categories:
3d-term, adj-architecture, adj-beauty, adj-detailed, adj-emotion, adj-general, adj-horror, animal, art-movement, art-scene,
art-site, artist, artist-botanical, artist-surreal, aspect-ratio, bird, body-of-water, body-pose, camera-brand,
camera-model, color, cosmic-galaxy, cosmic-nebula, cosmic-star, cosmic-term, desktop-background, dinosaur, eyecolor, f-stop,
fantasy-creature, fantasy-setting, fish, flower, focal-length, food, fruit, games, gen-modifier, hair, hd,
interior-style, iso-stop, landscape-type, national-park, nationality, neg-weight, noun-beauty, noun-fantasy, noun-general,
noun-horror, occupation, painting-style, photo-term, pop-culture, pop-location, punk-style, quantity, rpg-item, scenario-desc,
skin-color, spaceship, style, tree-species, trippy, world-heritage-site
Examples:
@ -305,6 +302,26 @@ You can use `{}` to randomly pull values from lists. A list of values separated
a bowl full of gold bars sitting on a table
```
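For intuition, here is an illustrative sketch (not imaginAIry's actual implementation) of the `{_category_}` expansion described above; the `{a|b|c}` literal-list form is omitted:
```python
# Illustrative sketch of `{_category_}` expansion; NOT the library's code.
import random
import re

# in the real library these come from .txt phraselist files
PHRASELISTS = {"animal": ["dog", "cat", "otter"]}

def expand(template: str) -> str:
    def replace(match: re.Match) -> str:
        return random.choice(PHRASELISTS[match.group(1)])

    return re.sub(r"\{_([a-z0-9-]+)_\}", replace, template)

print(expand("a photo of a {_animal_}"))  # e.g. "a photo of a cat"
```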
### Example Use Cases
```bash
>> aimg
# Generate endless 8k art
🤖🧠> imagine -w 1920 -h 1080 --upscale "{_art-scene_}. {_painting-style_} by {_artist_}" -r 1000 --steps 30 --model sd21v
# generate endless desktop backgrounds
🤖🧠> imagine --tile "{_desktop-background_}" -r 100
# convert a folder of images to pencil sketches
🤖🧠> edit other/images/*.jpg -p "make it a pencil sketch"
# upscale a folder of images
🤖🧠> upscale my-images/*.jpg
# generate kitchen remodel ideas
🤖🧠> imagine --control-image kitchen.jpg -w 1024 -h 1024 "{_interior-style_} kitchen" --control-mode depth -r 100 --init-image-strength 0.01 --upscale --steps 35 --caption-text "{prompt}"
```
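The shell examples above can be approximated from Python. A hedged sketch of the repeated-generation idea (phraselist expansion is a CLI feature, so the prompt is written out by hand here):
```python
# Hedged sketch: batch generation roughly mirroring the shell examples above.
# Assumption: ImaginePrompt accepts width/height/steps/upscale keyword args.
from imaginairy import ImaginePrompt, imagine

prompts = [
    ImaginePrompt(
        "an autumn forest. oil painting",
        width=1920,
        height=1080,
        steps=30,
        upscale=True,
    )
    for _ in range(5)
]
for i, result in enumerate(imagine(prompts)):
    result.save(f"art_{i:03d}.jpg")
```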
### Additional Features
- Generate images either in code or from command line.
- It just works. Proper requirements are installed. Model weights are automatically downloaded. No huggingface account needed.
@ -381,6 +398,12 @@ docker run -it --gpus all -v $HOME/.cache/huggingface:/root/.cache/huggingface -
## ChangeLog
- docs: add some example use cases
- feature: add art-scene, desktop-background, interior-style, painting-style phraselists
- fix: compilation animations create normal slideshows instead of "bounces"
- fix: file globbing works in the interactive shell
**11.0.0**
- all these changes together mean the same seed/sampler combination will no longer be guaranteed to produce the same image (thus the version bump)
- fix: image composition didn't work very well. Works well now but probably very slow on non-cuda platforms
@ -655,175 +678,4 @@ would be uncorrelated to the rest of the surrounding image. It created terrible
- a GUI. this is a python library
- exploratory features that don't work well
## Online Stable Diffusion Services
- https://stablecog.com/

@ -0,0 +1,34 @@
## Notable Stable Diffusion Implementations
- https://github.com/ahrm/UnstableFusion
- https://github.com/AUTOMATIC1111/stable-diffusion-webui
- https://github.com/blueturtleai/gimp-stable-diffusion
- https://github.com/hafriedlander/stable-diffusion-grpcserver
- https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/stable_diffusion
- https://github.com/lkwq007/stablediffusion-infinity
- https://github.com/lstein/stable-diffusion
- https://github.com/parlance-zz/g-diffuser-lib
- https://github.com/hafriedlander/idea2art
## Image Generation Algorithms
| date | name | group | use type | FID | params |
|------|------|-------|----------|-----|--------|
| 2023-02-22 | [Composer](https://damo-vilab.github.io/composer-page/) [[paper]](https://arxiv.org/pdf/2302.09778.pdf) [[code]](https://github.com/damo-vilab/composer) | Alibaba | Private | 9.2 | 3.4B |
| 2022-11-24 | Stable Diffusion 2 | StabilityAI | Open source | | |
| 2022-09-28 | DALL-E 2 | OpenAI | API | | |
| 2022-08-22 | Stable Diffusion | StabilityAI | Open source | | |
| - | Latent Diffusion | | | | |
## Further Reading
- https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md
- [Prompt Engineering Handbook](https://openart.ai/promptbook)
- Differences between samplers
- https://www.reddit.com/r/StableDiffusion/comments/xbeyw3/can_anyone_offer_a_little_guidance_on_the/
- https://www.reddit.com/r/bigsleep/comments/xb5cat/wiskkeys_lists_of_texttoimage_systems_and_related/
- https://huggingface.co/blog/annotated-diffusion
- https://github.com/jessevig/bertviz
- https://www.youtube.com/watch?v=5pIQFQZsNe8
- https://jalammar.github.io/illustrated-transformer/
- https://huggingface.co/blog/assets/78_annotated-diffusion/unet_architecture.jpg

@ -0,0 +1,141 @@
## Todo
- Inference Performance Optimizations
- ✅ fp16
- ✅ [Doggettx Sliced attention](https://github.com/CompVis/stable-diffusion/compare/main...Doggettx:stable-diffusion:autocast-improvements#)
- ✅ xformers support https://www.photoroom.com/tech/stable-diffusion-100-percent-faster-with-memory-efficient-attention/
- https://github.com/neonsecret/stable-diffusion
- https://github.com/CompVis/stable-diffusion/pull/177
- https://github.com/huggingface/diffusers/pull/532/files
- https://github.com/HazyResearch/flash-attention
- https://github.com/chavinlo/sda-node
- https://github.com/AminRezaei0x443/memory-efficient-attention/issues/7
- Development Environment
- ✅ add tests
- ✅ set up ci (test/lint/format)
- ✅ unified pipeline (txt2img & img2img combined)
- ✅ setup parallel testing
- add docs
- 🚫 remove yaml config
- 🚫 delete more unused code
- faster latent logging https://discuss.huggingface.co/t/decoding-latents-to-rgb-without-upscaling/23204/9
- Interface improvements
- ✅ init-image at command line
- ✅ prompt expansion
- ✅ interactive cli
- Image Generation Features
- ✅ add k-diffusion sampling methods
- ✅ tiling
- ✅ generation videos/gifs
- ✅ controlnet
- scribbles input
- segmentation input
- mlsd input
- [Attend and Excite](https://attendandexcite.github.io/Attend-and-Excite/)
- Compositional Visual Generation
- https://github.com/energy-based-model/Compositional-Visual-Generation-with-Composable-Diffusion-Models-PyTorch
- https://colab.research.google.com/github/energy-based-model/Compositional-Visual-Generation-with-Composable-Diffusion-Models-PyTorch/blob/main/notebooks/demo.ipynb#scrollTo=wt_j3uXZGFAS
- ✅ negative prompting
- some syntax to allow it in a text string
- [paint with words](https://www.reddit.com/r/StableDiffusion/comments/10lzgze/i_figured_out_a_way_to_apply_different_prompts_to/)
- https://github.com/cloneofsimo/paint-with-words-sd
- https://multidiffusion.github.io/
- images as actual prompts instead of just init images.
- not directly possible due to model architecture.
- can it just be integrated into sampler?
- requires model fine-tuning since SD1.4 expects 77x768 text encoding input
- https://twitter.com/Buntworthy/status/1566744186153484288
- https://github.com/justinpinkney/stable-diffusion
- https://github.com/LambdaLabsML/lambda-diffusers
- https://www.reddit.com/r/MachineLearning/comments/x6k5bm/n_stable_diffusion_image_variations_released/
- Image Editing
- ✅ outpainting
- https://github.com/parlance-zz/g-diffuser-bot/search?q=noise&type=issues
- lama cleaner
- ✅ inpainting
- https://github.com/Jack000/glid-3-xl-stable
- https://github.com/andreas128/RePaint
- ✅ img2img but keeps img stable
- https://www.reddit.com/r/StableDiffusion/comments/xboy90/a_better_way_of_doing_img2img_by_finding_the/
- https://gist.github.com/trygvebw/c71334dd127d537a15e9d59790f7f5e1
- https://github.com/pesser/stable-diffusion/commit/bbb52981460707963e2a62160890d7ecbce00e79
- https://github.com/SHI-Labs/FcF-Inpainting https://praeclarumjj3.github.io/fcf-inpainting/
- ✅ text based image masking
- ✅ ClipSeg - https://github.com/timojl/clipseg
- https://github.com/facebookresearch/detectron2
- https://x-decoder-vl.github.io/
- Maskless editing
- ✅ instruct-pix2pix
- Attention Control Methods
- https://github.com/bloc97/CrossAttentionControl
- https://github.com/ChenWu98/cycle-diffusion
- Image Enhancement
- Photo Restoration - https://github.com/microsoft/Bringing-Old-Photos-Back-to-Life
- Upscaling
- ✅ realesrgan
- ldm
- https://github.com/lowfuel/progrock-stable
- [txt2imghd](https://github.com/jquesnelle/txt2imghd/blob/master/txt2imghd.py)
- latent scaling + reprocessing
- stability upscaler
- rivers have wings upscaler
- stable super-res?
- todo: try with 1-0-0-0 mask at full image resolution (re-encoding the entire image + predicted image at every step)
- todo: use a gaussian pyramid and only include the "high-detail" level of the pyramid into the next step
- https://www.reddit.com/r/StableDiffusion/comments/xkjjf9/upscale_to_huge_sizes_and_add_detail_with_sd/
- ✅ face enhancers
- ✅ gfpgan - https://github.com/TencentARC/GFPGAN
- ✅ codeformer - https://github.com/sczhou/CodeFormer
- ✅ image describe feature
- ✅ https://github.com/salesforce/BLIP
- 🚫 CLIP brute-force prompt reconstruction
- The accuracy of this approach is too low for me to include it in imaginAIry
- https://github.com/rmokady/CLIP_prefix_caption
- https://github.com/pharmapsychotic/clip-interrogator (blip + clip)
- https://github.com/KaiyangZhou/CoOp
- 🚫 CPU support. While the code does actually work on some CPUs, the generation takes so long that I don't think it's
worth the effort to support this feature
- ✅ img2img for plms
- ✅ img2img for kdiff functions
- Other
- Enhancement pipelines
- text-to-3d https://dreamfusionpaper.github.io/
- https://shihmengli.github.io/3D-Photo-Inpainting/
- https://github.com/thygate/stable-diffusion-webui-depthmap-script/discussions/50
- Depth estimation
- what is SOTA for monocular depth estimation?
- https://github.com/compphoto/BoostingMonocularDepth
- make a video https://github.com/lucidrains/make-a-video-pytorch
- animations
- https://github.com/francislabountyjr/stable-diffusion/blob/main/inferencing_notebook.ipynb
- https://www.youtube.com/watch?v=E7aAFEhdngI
- https://github.com/pytti-tools/frame-interpolation
- guided generation
- https://colab.research.google.com/drive/1dlgggNa5Mz8sEAGU0wFCHhGLFooW_pf1#scrollTo=UDeXQKbPTdZI
- https://colab.research.google.com/github/aicrumb/doohickey/blob/main/Doohickey_Diffusion.ipynb#scrollTo=PytCwKXCmPid
- https://github.com/mlfoundations/open_clip
- https://github.com/openai/guided-diffusion
- image variations https://github.com/lstein/stable-diffusion/blob/main/VARIATIONS.md
- textual inversion
- https://www.reddit.com/r/StableDiffusion/comments/xbwb5y/how_to_run_textual_inversion_locally_train_your/
- https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb#scrollTo=50JuJUM8EG1h
- https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_diffusion_textual_inversion_library_navigator.ipynb
- https://github.com/Jack000/glid-3-xl-stable
- fix saturation at high CFG https://www.reddit.com/r/StableDiffusion/comments/xalo78/fixing_excessive_contrastsaturation_resulting/
- https://www.reddit.com/r/StableDiffusion/comments/xbrrgt/a_rundown_of_twenty_new_methodsoptions_added_to/
- ✅ deploy to pypi
- find similar images https://knn5.laion.ai/?back=https%3A%2F%2Fknn5.laion.ai%2F&index=laion5B&useMclip=false
- https://github.com/vicgalle/stable-diffusion-aesthetic-gradients
- Training
- Finetuning "dreambooth" style
- [Textual Inversion](https://arxiv.org/abs/2208.01618)
- [Fast Textual Inversion](https://github.com/peterwilli/sd-leap-booster)
- [Low-rank Adaptation for Fast Text-to-Image Diffusion Fine-tuning (LORA)](https://github.com/cloneofsimo/lora)
- https://huggingface.co/spaces/lora-library/Low-rank-Adaptation
- Performance Improvements
- [ColossalAI](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/diffusion) - almost got it working but it's not easy enough to install to merit inclusion in imaginairy. We should check back in on this.
- Xformers
- Deepspeed

@ -53,6 +53,23 @@ def make_bounce_animation(
    make_animation(imgs=frames, outpath=outpath, frame_duration_ms=durations)


def make_slideshow_animation(
    imgs,
    outpath,
    image_pause_ms=1000,
):
    # convert from latents
    converted_frames = []
    for frame in imgs:
        if isinstance(frame, torch.Tensor):
            frame = model_latents_to_pillow_imgs(frame)[0]
        converted_frames.append(frame)

    durations = [image_pause_ms] * len(converted_frames)
    make_animation(imgs=converted_frames, outpath=outpath, frame_duration_ms=durations)


def make_animation(imgs, outpath, frame_duration_ms=100, captions=None):
    imgs = imgpaths_to_imgs(imgs)
    ext = os.path.splitext(outpath)[1].lower().strip(".")
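A quick usage sketch for the new function: since `make_animation` converts frames with `imgpaths_to_imgs`, plain filepaths should work alongside tensors and PIL images.
```python
# Usage sketch for make_slideshow_animation; filepaths are converted to
# images inside make_animation via imgpaths_to_imgs.
from imaginairy.animations import make_slideshow_animation

make_slideshow_animation(
    imgs=["frame-000.jpg", "frame-001.jpg", "frame-002.jpg"],
    outpath="compilation.gif",
    image_pause_ms=1000,
)
```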

@ -571,7 +571,8 @@ def _generate_single_image(
        )

        if prompt.caption_text:
            caption_text = prompt.caption_text.format(prompt=prompt.prompt_text)
            add_caption_to_image(gen_img, caption_text)

        result = ImagineResult(
            img=gen_img,
@ -86,6 +86,11 @@ def _imagine_cmd(
        init_images = [init_image]
    else:
        init_images = init_image

    from imaginairy.utils import glob_expand_paths

    init_images = glob_expand_paths(init_images)

    total_image_count = len(prompt_texts) * max(len(init_images), 1) * repeats
    logger.info(
        f"Received {len(prompt_texts)} prompt(s) and {len(init_images)} input image(s). Will repeat the generations {repeats} times to create {total_image_count} images."
@ -197,13 +202,12 @@ def _imagine_cmd(
        comp_imgs = [LazyLoadingImage(filepath=f) for f in filenames]
        comp_imgs.reverse()

        from imaginairy.animations import make_slideshow_animation

        make_slideshow_animation(
            outpath=new_filename,
            imgs=comp_imgs,
            image_pause_ms=1000,
        )
        logger.info(f"[compilation] saved to: {new_filename}")

@ -32,9 +32,10 @@ def upscale_cmd(image_filepaths, outdir, fix_faces, fix_faces_fidelity):
    from imaginairy import LazyLoadingImage
    from imaginairy.enhancers.face_restoration_codeformer import enhance_faces
    from imaginairy.enhancers.upscale_realesrgan import upscale_image
    from imaginairy.utils import glob_expand_paths

    os.makedirs(outdir, exist_ok=True)

    image_filepaths = glob_expand_paths(image_filepaths)
    for p in tqdm(image_filepaths):
        savepath = os.path.join(outdir, os.path.basename(p))
        if p.startswith("http"):

File diff suppressed because it is too large

@ -0,0 +1,24 @@
blueberries
fruit salad
gold coins
leaves
autumn leaves
wood grain
tree bark
sunflowers
strawberries
abstract art
ferns
jellybeans
floral pattern
the galaxy
pebbles
cobblestone
stacks of books
paint splatters
a forest
blue sky
glitter
diamonds
grass

@ -0,0 +1,44 @@
18th century european
traditional european
maximalist
minimalist
modern
traditional
contemporary
eclectic
new traditional
transitional
midcentury modern
art deco
scandinavian
farmhouse
modern farmhouse
bohemian
industrial
scandifornian
french country
shabby chic
desert modern
english countryside chic
modern coastal
new mediterranean
naturalist
southwestern
coastal
rustic
modern rustic
Art Nouveau
gothic
victorian
indian
moroccan
japanese
spanish
maverick
hollywood regency
urban modern
asian zen
1950s
futuristic
Neoclassical ornate golden palace
messy and outdated

@ -0,0 +1,13 @@
abstract painting
acrylic painting
cave painting
fresco painting
impressionist painting
mural painting
oil painting
post-impressionism painting
realist painting
still life painting
surrealist painting
tempera painting
watercolor painting

@ -200,3 +200,12 @@ def shrink_list(items, max_size):
    for i, item in enumerate(items):
        new_items[int(i / removal_ratio)] = item

    return [items[0]] + list(new_items.values())


def glob_expand_paths(paths):
    import glob

    expanded_paths = []
    for p in paths:
        expanded_paths.extend(glob.glob(p))
    return expanded_paths
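Usage sketch for the new helper. Note that `glob.glob` returns an empty list for patterns (or literal paths) that match nothing, so those entries are silently dropped rather than raising an error.
```python
# Usage sketch for glob_expand_paths. Non-matching patterns are silently
# dropped (glob.glob returns an empty list for them).
from imaginairy.utils import glob_expand_paths

paths = glob_expand_paths(["my-images/*.jpg", "other/photo.png"])
for p in paths:
    print(p)
```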
