feature: add art-scene, desktop-background, interior-style, painting-style phraselists

fix: file globbing works in the interactive shell
feature: make compilation animations simple slide shows
pull/288/head
Bryce 1 year ago committed by Bryce Drennan
parent de2a72f718
commit eb26d5a7c5

@ -31,65 +31,60 @@ AI imagined images. Pythonic generation of stable diffusion images.
### Image Structure Control [by ControlNet](https://github.com/lllyasviel/ControlNet)
Generate images guided by body poses, depth maps, canny edges, hed boundaries, or normal maps.
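The same controls can be driven from the Python API. Below is a minimal sketch of the openpose example, assuming `ImaginePrompt` accepts `control_image`/`control_mode` arguments that mirror the CLI flags:
```python
# Hedged sketch: Python equivalent of the openpose CLI example below.
# Assumption: ImaginePrompt mirrors the --control-image/--control-mode flags.
from imaginairy import ImaginePrompt, LazyLoadingImage, imagine

prompt = ImaginePrompt(
    "photo of a polar bear",
    control_image=LazyLoadingImage(filepath="assets/indiana.jpg"),
    control_mode="openpose",
)
for result in imagine([prompt]):
    result.save("polar-bear-openpose.jpg")
```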
**Openpose Control**
```bash
imagine --control-image assets/indiana.jpg --control-mode openpose --caption-text openpose "photo of a polar bear"
```
<p float="left">
<img src="assets/indiana.jpg" height="256">
<img src="assets/indiana-pose.jpg" height="256">
<img src="assets/indiana-pose-polar-bear.jpg" height="256">
</p>
**Canny Edge Control**
```bash
imagine --control-image assets/lena.png --control-mode canny "photo of a woman with a hat looking at the camera"
```
<p float="left">
<img src="assets/lena.png" height="256">
<img src="assets/lena-canny.jpg" height="256">
<img src="assets/lena-canny-generated.jpg" height="256">
</p>
**HED Boundary Control**
```bash
imagine --control-image dog.jpg --control-mode hed "photo of a dalmatian"
```
<p float="left">
<img src="assets/000032_337692011_PLMS40_PS7.5_a_photo_of_a_dog.jpg" height="256">
<img src="assets/dog-hed-boundary.jpg" height="256">
<img src="assets/dog-hed-boundary-dalmation.jpg" height="256">
</p>
**Depth Map Control**
```bash
imagine --control-image fancy-living.jpg --control-mode depth "a modern living room"
```
<p float="left">
<img src="assets/fancy-living.jpg" height="256">
<img src="assets/fancy-living-depth.jpg" height="256">
<img src="assets/fancy-living-depth-generated.jpg" height="256">
</p>
**Normal Map Control**
```bash
imagine --control-image bird.jpg --control-mode normal "a bird"
```
<p float="left">
<img src="assets/013986_1_kdpmpp2m59_PS7.5_a_bluejay_[generated].jpg" height="256">
<img src="assets/bird-normal.jpg" height="256">
@ -208,6 +203,8 @@ When writing strength modifiers keep in mind that pixel values are between 0 and
### Upscaling [by RealESRGAN](https://github.com/xinntao/Real-ESRGAN)
```bash
>> imagine "colorful smoke" --steps 40 --upscale
# upscale an existing image
>> aimg upscale my-image.jpg
```
<img src="https://github.com/brycedrennan/imaginAIry/raw/master/assets/000206_856637805_PLMS40_PS7.5_colorful_smoke.jpg" height="128"> ➡️
<img src="https://github.com/brycedrennan/imaginAIry/raw/master/assets/000206_856637805_PLMS40_PS7.5_colorful_smoke_upscaled.jpg" height="256">
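Upscaling can also be done programmatically. A hedged sketch using the same helper the `aimg upscale` command imports (see the CLI diff further down), assuming it accepts and returns a PIL image:
```python
# Hedged sketch: programmatic upscaling via the helper `aimg upscale` uses.
# Assumption: upscale_image accepts a PIL image and returns the upscaled one.
from PIL import Image

from imaginairy.enhancers.upscale_realesrgan import upscale_image

img = Image.open("my-image.jpg")
upscaled = upscale_image(img)
upscaled.save("my-image.upscaled.jpg")
```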
@ -276,12 +273,12 @@ You can use `{}` to randomly pull values from lists. A list of values separated
pull from a phrase list of the same name. Folders containing .txt phraselist files may be specified via
`--prompt_library_path`. The option may be specified multiple times. Built-in categories:
3d-term, adj-architecture, adj-beauty, adj-detailed, adj-emotion, adj-general, adj-horror, animal, art-movement, art-scene,
art-site, artist, artist-botanical, artist-surreal, aspect-ratio, bird, body-of-water, body-pose, camera-brand,
camera-model, color, cosmic-galaxy, cosmic-nebula, cosmic-star, cosmic-term, desktop-background, dinosaur, eyecolor, f-stop,
fantasy-creature, fantasy-setting, fish, flower, focal-length, food, fruit, games, gen-modifier, hair, hd,
interior-style, iso-stop, landscape-type, national-park, nationality, neg-weight, noun-beauty, noun-fantasy, noun-general,
noun-horror, occupation, painting-style, photo-term, pop-culture, pop-location, punk-style, quantity, rpg-item, scenario-desc,
skin-color, spaceship, style, tree-species, trippy, world-heritage-site
Examples:
@ -305,6 +302,26 @@ You can use `{}` to randomly pull values from lists. A list of values separated
a bowl full of gold bars sitting on a table
```
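For intuition, here is an illustrative sketch (not imaginAIry's actual implementation) of the `{_category_}` expansion described above; the `{a|b|c}` literal-list form is omitted:
```python
# Illustrative sketch of `{_category_}` expansion; NOT the library's code.
import random
import re

# in the real library these come from .txt phraselist files
PHRASELISTS = {"animal": ["dog", "cat", "otter"]}

def expand(template: str) -> str:
    def replace(match: re.Match) -> str:
        return random.choice(PHRASELISTS[match.group(1)])

    return re.sub(r"\{_([a-z0-9-]+)_\}", replace, template)

print(expand("a photo of a {_animal_}"))  # e.g. "a photo of a cat"
```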
### Example Use Cases
```bash
>> aimg
# Generate endless 8k art
🤖🧠> imagine -w 1920 -h 1080 --upscale "{_art-scene_}. {_painting-style_} by {_artist_}" -r 1000 --steps 30 --model sd21v
# generate endless desktop backgrounds
🤖🧠> imagine --tile "{_desktop-background_}" -r 100
# convert a folder of images to pencil sketches
🤖🧠> edit other/images/*.jpg -p "make it a pencil sketch"
# upscale a folder of images
🤖🧠> upscale my-images/*.jpg
# generate kitchen remodel ideas
🤖🧠> imagine --control-image kitchen.jpg -w 1024 -h 1024 "{_interior-style_} kitchen" --control-mode depth -r 100 --init-image-strength 0.01 --upscale --steps 35 --caption-text "{prompt}"
```
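The shell examples above can be approximated from Python. A hedged sketch of the repeated-generation idea (phraselist expansion is a CLI feature, so the prompt is written out by hand here):
```python
# Hedged sketch: batch generation roughly mirroring the shell examples above.
# Assumption: ImaginePrompt accepts width/height/steps/upscale keyword args.
from imaginairy import ImaginePrompt, imagine

prompts = [
    ImaginePrompt(
        "an autumn forest. oil painting",
        width=1920,
        height=1080,
        steps=30,
        upscale=True,
    )
    for _ in range(5)
]
for i, result in enumerate(imagine(prompts)):
    result.save(f"art_{i:03d}.jpg")
```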
### Additional Features
- Generate images either in code or from command line.
- It just works. Proper requirements are installed. Model weights are automatically downloaded. No huggingface account needed.
@ -381,6 +398,12 @@ docker run -it --gpus all -v $HOME/.cache/huggingface:/root/.cache/huggingface -
## ChangeLog
- docs: add some example use cases
- feature: add art-scene, desktop-background, interior-style, painting-style phraselists
- fix: compilation animations create normal slideshows instead of "bounces"
- fix: file globbing works in the interactive shell
**11.0.0**
- all these changes together mean the same seed/sampler combination will no longer be guaranteed to produce the same image (thus the version bump)
- fix: image composition didn't work very well. Works well now but probably very slow on non-cuda platforms
@ -655,175 +678,4 @@ would be uncorrelated to the rest of the surrounding image. It created terrible
- a GUI. this is a python library
- exploratory features that don't work well
## Online Stable Diffusion Services
- https://stablecog.com/

@ -0,0 +1,34 @@
## Notable Stable Diffusion Implementations
- https://github.com/ahrm/UnstableFusion
- https://github.com/AUTOMATIC1111/stable-diffusion-webui
- https://github.com/blueturtleai/gimp-stable-diffusion
- https://github.com/hafriedlander/stable-diffusion-grpcserver
- https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/stable_diffusion
- https://github.com/lkwq007/stablediffusion-infinity
- https://github.com/lstein/stable-diffusion
- https://github.com/parlance-zz/g-diffuser-lib
- https://github.com/hafriedlander/idea2art
## Image Generation Algorithms
| date | name | group | use type | FID | params |
|------|------|-------|----------|-----|--------|
| 2023-02-22 | [Composer](https://damo-vilab.github.io/composer-page/) [[paper]](https://arxiv.org/pdf/2302.09778.pdf) [[code]](https://github.com/damo-vilab/composer) | Alibaba | Private | 9.2 | 3.4B |
| 2022-11-24 | Stable Diffusion 2 | StabilityAI | Open source | | |
| 2022-09-28 | DALL-E 2 | OpenAI | API | | |
| 2022-08-22 | Stable Diffusion | StabilityAI | Open source | | |
| - | Latent Diffusion | | | | |
## Further Reading
- https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md
- [Prompt Engineering Handbook](https://openart.ai/promptbook)
- Differences between samplers
- https://www.reddit.com/r/StableDiffusion/comments/xbeyw3/can_anyone_offer_a_little_guidance_on_the/
- https://www.reddit.com/r/bigsleep/comments/xb5cat/wiskkeys_lists_of_texttoimage_systems_and_related/
- https://huggingface.co/blog/annotated-diffusion
- https://github.com/jessevig/bertviz
- https://www.youtube.com/watch?v=5pIQFQZsNe8
- https://jalammar.github.io/illustrated-transformer/
- https://huggingface.co/blog/assets/78_annotated-diffusion/unet_architecture.jpg

@ -0,0 +1,141 @@
## Todo
- Inference Performance Optimizations
- ✅ fp16
- ✅ [Doggettx Sliced attention](https://github.com/CompVis/stable-diffusion/compare/main...Doggettx:stable-diffusion:autocast-improvements#)
- ✅ xformers support https://www.photoroom.com/tech/stable-diffusion-100-percent-faster-with-memory-efficient-attention/
- https://github.com/neonsecret/stable-diffusion
- https://github.com/CompVis/stable-diffusion/pull/177
- https://github.com/huggingface/diffusers/pull/532/files
- https://github.com/HazyResearch/flash-attention
- https://github.com/chavinlo/sda-node
- https://github.com/AminRezaei0x443/memory-efficient-attention/issues/7
- Development Environment
- ✅ add tests
- ✅ set up ci (test/lint/format)
- ✅ unified pipeline (txt2img & img2img combined)
- ✅ setup parallel testing
- add docs
- 🚫 remove yaml config
- 🚫 delete more unused code
- faster latent logging https://discuss.huggingface.co/t/decoding-latents-to-rgb-without-upscaling/23204/9
- Interface improvements
- ✅ init-image at command line
- ✅ prompt expansion
- ✅ interactive cli
- Image Generation Features
- ✅ add k-diffusion sampling methods
- ✅ tiling
- ✅ generation videos/gifs
- ✅ controlnet
- scribbles input
- segmentation input
- mlsd input
- [Attend and Excite](https://attendandexcite.github.io/Attend-and-Excite/)
- Compositional Visual Generation
- https://github.com/energy-based-model/Compositional-Visual-Generation-with-Composable-Diffusion-Models-PyTorch
- https://colab.research.google.com/github/energy-based-model/Compositional-Visual-Generation-with-Composable-Diffusion-Models-PyTorch/blob/main/notebooks/demo.ipynb#scrollTo=wt_j3uXZGFAS
- ✅ negative prompting
- some syntax to allow it in a text string
- [paint with words](https://www.reddit.com/r/StableDiffusion/comments/10lzgze/i_figured_out_a_way_to_apply_different_prompts_to/)
- https://github.com/cloneofsimo/paint-with-words-sd
- https://multidiffusion.github.io/
- images as actual prompts instead of just init images.
- not directly possible due to model architecture.
- can it just be integrated into sampler?
- requires model fine-tuning since SD1.4 expects 77x768 text encoding input
- https://twitter.com/Buntworthy/status/1566744186153484288
- https://github.com/justinpinkney/stable-diffusion
- https://github.com/LambdaLabsML/lambda-diffusers
- https://www.reddit.com/r/MachineLearning/comments/x6k5bm/n_stable_diffusion_image_variations_released/
- Image Editing
- ✅ outpainting
- https://github.com/parlance-zz/g-diffuser-bot/search?q=noise&type=issues
- lama cleaner
- ✅ inpainting
- https://github.com/Jack000/glid-3-xl-stable
- https://github.com/andreas128/RePaint
- ✅ img2img but keeps img stable
- https://www.reddit.com/r/StableDiffusion/comments/xboy90/a_better_way_of_doing_img2img_by_finding_the/
- https://gist.github.com/trygvebw/c71334dd127d537a15e9d59790f7f5e1
- https://github.com/pesser/stable-diffusion/commit/bbb52981460707963e2a62160890d7ecbce00e79
- https://github.com/SHI-Labs/FcF-Inpainting https://praeclarumjj3.github.io/fcf-inpainting/
- ✅ text based image masking
- ✅ ClipSeg - https://github.com/timojl/clipseg
- https://github.com/facebookresearch/detectron2
- https://x-decoder-vl.github.io/
- Maskless editing
- ✅ instruct-pix2pix
- Attention Control Methods
- https://github.com/bloc97/CrossAttentionControl
- https://github.com/ChenWu98/cycle-diffusion
- Image Enhancement
- Photo Restoration - https://github.com/microsoft/Bringing-Old-Photos-Back-to-Life
- Upscaling
- ✅ realesrgan
- ldm
- https://github.com/lowfuel/progrock-stable
- [txt2imghd](https://github.com/jquesnelle/txt2imghd/blob/master/txt2imghd.py)
- latent scaling + reprocessing
- stability upscaler
- rivers have wings upscaler
- stable super-res?
- todo: try with 1-0-0-0 mask at full image resolution (re-encoding the entire image + predicted image at every step)
- todo: use a gaussian pyramid and only include the "high-detail" level of the pyramid into the next step
- https://www.reddit.com/r/StableDiffusion/comments/xkjjf9/upscale_to_huge_sizes_and_add_detail_with_sd/
- ✅ face enhancers
- ✅ gfpgan - https://github.com/TencentARC/GFPGAN
- ✅ codeformer - https://github.com/sczhou/CodeFormer
- ✅ image describe feature
- ✅ https://github.com/salesforce/BLIP
- 🚫 CLIP brute-force prompt reconstruction
- The accuracy of this approach is too low for me to include it in imaginAIry
- https://github.com/rmokady/CLIP_prefix_caption
- https://github.com/pharmapsychotic/clip-interrogator (blip + clip)
- https://github.com/KaiyangZhou/CoOp
- 🚫 CPU support. While the code does actually work on some CPUs, the generation takes so long that I don't think it's
worth the effort to support this feature
- ✅ img2img for plms
- ✅ img2img for kdiff functions
- Other
- Enhancement pipelines
- text-to-3d https://dreamfusionpaper.github.io/
- https://shihmengli.github.io/3D-Photo-Inpainting/
- https://github.com/thygate/stable-diffusion-webui-depthmap-script/discussions/50
- Depth estimation
- what is SOTA for monocular depth estimation?
- https://github.com/compphoto/BoostingMonocularDepth
- make a video https://github.com/lucidrains/make-a-video-pytorch
- animations
- https://github.com/francislabountyjr/stable-diffusion/blob/main/inferencing_notebook.ipynb
- https://www.youtube.com/watch?v=E7aAFEhdngI
- https://github.com/pytti-tools/frame-interpolation
- guided generation
- https://colab.research.google.com/drive/1dlgggNa5Mz8sEAGU0wFCHhGLFooW_pf1#scrollTo=UDeXQKbPTdZI
- https://colab.research.google.com/github/aicrumb/doohickey/blob/main/Doohickey_Diffusion.ipynb#scrollTo=PytCwKXCmPid
- https://github.com/mlfoundations/open_clip
- https://github.com/openai/guided-diffusion
- image variations https://github.com/lstein/stable-diffusion/blob/main/VARIATIONS.md
- textual inversion
- https://www.reddit.com/r/StableDiffusion/comments/xbwb5y/how_to_run_textual_inversion_locally_train_your/
- https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb#scrollTo=50JuJUM8EG1h
- https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_diffusion_textual_inversion_library_navigator.ipynb
- https://github.com/Jack000/glid-3-xl-stable
- fix saturation at high CFG https://www.reddit.com/r/StableDiffusion/comments/xalo78/fixing_excessive_contrastsaturation_resulting/
- https://www.reddit.com/r/StableDiffusion/comments/xbrrgt/a_rundown_of_twenty_new_methodsoptions_added_to/
- ✅ deploy to pypi
- find similar images https://knn5.laion.ai/?back=https%3A%2F%2Fknn5.laion.ai%2F&index=laion5B&useMclip=false
- https://github.com/vicgalle/stable-diffusion-aesthetic-gradients
- Training
- Finetuning "dreambooth" style
- [Textual Inversion](https://arxiv.org/abs/2208.01618)
- [Fast Textual Inversion](https://github.com/peterwilli/sd-leap-booster)
- [Low-rank Adaptation for Fast Text-to-Image Diffusion Fine-tuning (LORA)](https://github.com/cloneofsimo/lora)
- https://huggingface.co/spaces/lora-library/Low-rank-Adaptation
- Performance Improvements
- [ColossalAI](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/diffusion) - almost got it working but it's not easy enough to install to merit inclusion in imaginairy. We should check back in on this.
- Xformers
- Deepspeed

@ -53,6 +53,23 @@ def make_bounce_animation(
    make_animation(imgs=frames, outpath=outpath, frame_duration_ms=durations)


def make_slideshow_animation(
    imgs,
    outpath,
    image_pause_ms=1000,
):
    # convert from latents
    converted_frames = []
    for frame in imgs:
        if isinstance(frame, torch.Tensor):
            frame = model_latents_to_pillow_imgs(frame)[0]
        converted_frames.append(frame)

    durations = [image_pause_ms] * len(converted_frames)
    make_animation(imgs=converted_frames, outpath=outpath, frame_duration_ms=durations)


def make_animation(imgs, outpath, frame_duration_ms=100, captions=None):
    imgs = imgpaths_to_imgs(imgs)
    ext = os.path.splitext(outpath)[1].lower().strip(".")
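A quick usage sketch for the new function: since `make_animation` converts frames with `imgpaths_to_imgs`, plain filepaths should work alongside tensors and PIL images.
```python
# Usage sketch for make_slideshow_animation; filepaths are converted to
# images inside make_animation via imgpaths_to_imgs.
from imaginairy.animations import make_slideshow_animation

make_slideshow_animation(
    imgs=["frame-000.jpg", "frame-001.jpg", "frame-002.jpg"],
    outpath="compilation.gif",
    image_pause_ms=1000,
)
```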

@ -571,7 +571,8 @@ def _generate_single_image(
        )

        if prompt.caption_text:
            caption_text = prompt.caption_text.format(prompt=prompt.prompt_text)
            add_caption_to_image(gen_img, caption_text)

        result = ImagineResult(
            img=gen_img,
@ -86,6 +86,11 @@ def _imagine_cmd(
        init_images = [init_image]
    else:
        init_images = init_image

    from imaginairy.utils import glob_expand_paths

    init_images = glob_expand_paths(init_images)

    total_image_count = len(prompt_texts) * max(len(init_images), 1) * repeats
    logger.info(
        f"Received {len(prompt_texts)} prompt(s) and {len(init_images)} input image(s). Will repeat the generations {repeats} times to create {total_image_count} images."
@ -197,13 +202,12 @@ def _imagine_cmd(
        comp_imgs = [LazyLoadingImage(filepath=f) for f in filenames]
        comp_imgs.reverse()

        from imaginairy.animations import make_slideshow_animation

        make_slideshow_animation(
            outpath=new_filename,
            imgs=comp_imgs,
            image_pause_ms=1000,
        )
        logger.info(f"[compilation] saved to: {new_filename}")

@ -32,9 +32,10 @@ def upscale_cmd(image_filepaths, outdir, fix_faces, fix_faces_fidelity):
    from imaginairy import LazyLoadingImage
    from imaginairy.enhancers.face_restoration_codeformer import enhance_faces
    from imaginairy.enhancers.upscale_realesrgan import upscale_image
    from imaginairy.utils import glob_expand_paths

    os.makedirs(outdir, exist_ok=True)

    image_filepaths = glob_expand_paths(image_filepaths)
    for p in tqdm(image_filepaths):
        savepath = os.path.join(outdir, os.path.basename(p))
        if p.startswith("http"):

File diff suppressed because it is too large

@ -0,0 +1,24 @@
blueberries
fruit salad
gold coins
leaves
autumn leaves
wood grain
tree bark
sunflowers
strawberries
abstract art
ferns
jellybeans
floral pattern
the galaxy
pebbles
cobblestone
stacks of books
paint splatters
a forest
blue sky
glitter
diamonds
grass

@ -0,0 +1,44 @@
18th century european
traditional european
maximalist
minimalist
modern
traditional
contemporary
eclectic
new traditional
transitional
midcentury modern
art deco
scandinavian
farmhouse
modern farmhouse
bohemian
industrial
scandifornian
french country
shabby chic
desert modern
english countryside chic
modern coastal
new mediterranean
naturalist
southwestern
coastal
rustic
modern rustic
Art Nouveau
gothic
victorian
indian
moroccan
japanese
spanish
maverick
hollywood regency
urban modern
asian zen
1950s
futuristic
Neoclassical ornate golden palace
messy and outdated

@ -0,0 +1,13 @@
abstract painting
acrylic painting
cave painting
fresco painting
impressionist painting
mural painting
oil painting
post-impressionism painting
realist painting
still life painting
surrealist painting
tempera painting
watercolor painting

@ -200,3 +200,12 @@ def shrink_list(items, max_size):
    for i, item in enumerate(items):
        new_items[int(i / removal_ratio)] = item

    return [items[0]] + list(new_items.values())


def glob_expand_paths(paths):
    import glob

    expanded_paths = []
    for p in paths:
        expanded_paths.extend(glob.glob(p))
    return expanded_paths
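Usage sketch for the new helper. Note that `glob.glob` returns an empty list for patterns (or literal paths) that match nothing, so those entries are silently dropped rather than raising an error.
```python
# Usage sketch for glob_expand_paths. Non-matching patterns are silently
# dropped (glob.glob returns an empty list for them).
from imaginairy.utils import glob_expand_paths

paths = glob_expand_paths(["my-images/*.jpg", "other/photo.png"])
for p in paths:
    print(p)
```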
