AI imagined images. Pythonic generation of Stable Diffusion images.
"just works" on Linux and macOS(M1) (and maybe windows?).
## Examples
```bash
# on macOS, make sure rust is installed first
>> pip install imaginairy
>> imagine "a scenic landscape" "a photo of a dog" "photo of a fruit bowl" "portrait photo of a freckled woman"
# Stable Diffusion 2.1
>> imagine --model SD-2.1 "a forest"
# Make generation gif
>> imagine "a scenic landscape" "a photo of a dog" "photo of a fruit bowl" "portrait photo of a freckled woman" "a bluejay"
# Make an animation showing the generation process
>> imagine --gif "a flower"
```
<p float="left">
<img src="assets/000019_786355545_PLMS50_PS7.5_a_scenic_landscape.jpg" height="256">
<img src="assets/000032_337692011_PLMS40_PS7.5_a_photo_of_a_dog.jpg" height="256">
<img src="assets/000056_293284644_PLMS40_PS7.5_photo_of_a_bowl_of_fruit.jpg" height="256">
<img src="assets/000078_260972468_PLMS40_PS7.5_portrait_photo_of_a_freckled_woman.jpg" height="256">
<img src="assets/013986_1_kdpmpp2m59_PS7.5_a_bluejay_[generated].jpg" height="256">
<img src="assets/009719_942389026_kdpmpp2m15_PS7.5_a_flower.gif" height="256">
</p>
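The output filenames above follow a consistent pattern: a running index, the seed, the sampler name and step count, the prompt strength after `PS`, and a slugified prompt. A small sketch of parsing that convention; the field layout is inferred from the example filenames shown here, not from imaginairy's source, and `parse_output_name` is a hypothetical helper:

```python
import re

# Inferred pattern: {index}_{seed}_{sampler}{steps}_PS{strength}_{prompt_slug}.jpg
FILENAME_RE = re.compile(
    r"(?P<index>\d+)_(?P<seed>\d+)_(?P<sampler>\w+?)(?P<steps>\d+)"
    r"_PS(?P<strength>[\d.]+)_(?P<slug>.+)\.jpg"
)

def parse_output_name(name: str) -> dict:
    """Split an imaginairy-style output filename into its fields."""
    m = FILENAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized filename: {name}")
    d = m.groupdict()
    return {
        "index": int(d["index"]),
        "seed": int(d["seed"]),
        "sampler": d["sampler"],
        "steps": int(d["steps"]),
        "prompt_strength": float(d["strength"]),
        "prompt": d["slug"].replace("_", " "),
    }

info = parse_output_name("000019_786355545_PLMS50_PS7.5_a_scenic_landscape.jpg")
```

The lazy `\w+?` on the sampler name lets the trailing digits bind to the step count, so names like `kdpmpp2m59` still split into sampler `kdpmpp2m` and 59 steps.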
## Features
- Instruction based image edits (InstructPix2Pix)
- Control image generation structure (ControlNet)
- Seamless tiled images
- Text-based masking (clipseg)
- Face enhancement (CodeFormer)
- Upscaling
- Outpainting
- Prompt expansion
- Image captioning
- Different generation models
### Image Structure Control [by ControlNet](https://github.com/lllyasviel/ControlNet)
Generate images guided by body poses, depth maps, canny edges, hed boundaries, or normal maps.
<details>
<summary>Openpose Control</summary>

```bash
imagine --control-image assets/indiana.jpg --control-mode openpose --caption-text openpose "photo of a polar bear"
```
</details>
<p float="left">
<img src="assets/indiana.jpg" height="256">
<img src="assets/indiana-pose.jpg" height="256">
<img src="assets/indiana-pose-polar-bear.jpg" height="256">
</p>
<details>
<summary>Canny Edge Control</summary>

```bash
imagine --control-image assets/lena.png --control-mode canny --caption-text canny "photo of a woman with a hat looking at the camera"
```
</details>
<p float="left">
<img src="assets/lena.png" height="256">
<img src="assets/lena-canny.jpg" height="256">
<img src="assets/lena-canny-generated.jpg" height="256">
</p>
< img src = "https://raw.githubusercontent.com/brycedrennan/imaginAIry/master/assets/000019_786355545_PLMS50_PS7.5_a_scenic_landscape.jpg" height = "256" > < img src = "https://raw.githubusercontent.com/brycedrennan/imaginAIry/master/assets/000032_337692011_PLMS40_PS7.5_a_photo_of_a_dog.jpg" height = "256" > < br >
< img src = "https://raw.githubusercontent.com/brycedrennan/imaginAIry/master/assets/000056_293284644_PLMS40_PS7.5_photo_of_a_bowl_of_fruit.jpg" height = "256" > < img src = "https://raw.githubusercontent.com/brycedrennan/imaginAIry/master/assets/000078_260972468_PLMS40_PS7.5_portrait_photo_of_a_freckled_woman.jpg" height = "256" > < br >
< img src = "assets/009719_942389026_kdpmpp2m15_PS7.5_a_flower.gif" height = "256" >
<details>
<summary>HED Boundary Control</summary>

```bash
imagine --control-image dog.jpg --control-mode hed "photo of a dalmation"
```
</details>
<p float="left">
<img src="assets/000032_337692011_PLMS40_PS7.5_a_photo_of_a_dog.jpg" height="256">
<img src="assets/dog-hed-boundary.jpg" height="256">
<img src="assets/dog-hed-boundary-dalmation.jpg" height="256">
</p>
<details>
<summary>Depth Map Control</summary>

```bash
imagine --control-image fancy-living.jpg --control-mode depth "a modern living room"
```
</details>
<p float="left">
<img src="assets/fancy-living.jpg" height="256">
<img src="assets/fancy-living-depth.jpg" height="256">
<img src="assets/fancy-living-depth-generated.jpg" height="256">
</p>
<details>
<summary>Normal Map Control</summary>

```bash
imagine --control-image bird.jpg --control-mode normal "a bird"
```
</details>
<p float="left">
<img src="assets/013986_1_kdpmpp2m59_PS7.5_a_bluejay_[generated].jpg" height="256">
<img src="assets/bird-normal.jpg" height="256">
<img src="assets/bird-normal-generated.jpg" height="256">
</p>
### Instruction based image edits [by InstructPix2Pix](https://github.com/timothybrooks/instruct-pix2pix)
Just tell imaginairy how to edit the image and it will do it for you!
<p float="left">
<img src="assets/scenic_landscape_winter.jpg" height="256">
<img src="assets/dog_red.jpg" height="256">
<img src="assets/bowl_of_fruit_strawberries.jpg" height="256">
<img src="assets/freckled_woman_cyborg.jpg" height="256">
<img src="assets/014214_51293814_kdpmpp2m30_PS10.0_img2img-1.0_make_the_bird_wear_a_cowboy_hat_[generated].jpg" height="256">
<img src="assets/flower-make-the-flower-out-of-paper-origami.gif" height="256">
<img src="assets/girl-pearl-clown-compare.gif" height="256">
<img src="assets/mona-lisa-headshot-anim.gif" height="256">
<img src="assets/make-it-night-time.gif" height="256">
</p>
<details>
<summary>Click to see shell commands</summary>

Use prompt strength to control how strong the edit is. For extra control you can combine with prompt-based masking.
```bash
# enter imaginairy shell
>> aimg
🤖🧠> edit scenic_landscape.jpg -p "make it winter" --prompt-strength 20
🤖🧠> edit scenic_landscape.jpg -p "make it winter" --steps 30 --arg-schedule "prompt_strength[2:25:0.5]" --compilation-anim gif
🤖🧠> edit dog.jpg -p "make the dog red" --prompt-strength 5
🤖🧠> edit bowl_of_fruit.jpg -p "replace the fruit with strawberries"
🤖🧠> edit freckled_woman.jpg -p "make her a cyborg" --prompt-strength 13
🤖🧠> edit bluebird.jpg -p "make the bird wear a cowboy hat" --prompt-strength 10
🤖🧠> edit flower.jpg -p "make the flower out of paper origami" --arg-schedule prompt-strength[1:11:0.3] --steps 25 --compilation-anim gif
# create a comparison gif
🤖🧠> edit pearl_girl.jpg -p "make her wear clown makeup" --compare-gif
# create an animation showing the edit with increasing prompt strengths
🤖🧠> edit mona-lisa.jpg -p "make it a color professional photo headshot" --negative-prompt "old, ugly, blurry" --arg-schedule "prompt-strength[2:8:0.5]" --compilation-anim gif
🤖🧠> edit gg-bridge.jpg -p "make it night time" --steps 30 --arg-schedule prompt-strength[1:15:1] --compilation-anim gif
```
</details>
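The `--arg-schedule "prompt-strength[2:25:0.5]"` syntax sweeps an argument across a `start:end:step` range, generating one image per value. A minimal sketch of how such a schedule might expand; this is illustrative only, and imaginairy's actual parser may differ:

```python
def expand_schedule(spec: str) -> tuple[str, list[float]]:
    """Expand 'name[start:end:step]' into (name, [values])."""
    name, _, rest = spec.partition("[")
    start_s, end_s, step_s = rest.rstrip("]").split(":")
    start, end, step = float(start_s), float(end_s), float(step_s)
    values = []
    v = start
    # Walk from start to end inclusive, guarding float drift with a tolerance.
    while v <= end + 1e-9:
        values.append(round(v, 6))
        v += step
    return name, values

name, values = expand_schedule("prompt-strength[1:15:1]")
```

With `--compilation-anim gif`, each value's output becomes one frame of the resulting animation.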
< img src = "assets/scenic_landscape_winter.jpg" height = "256" > < img src = "assets/dog_red.jpg" height = "256" > < br >
< img src = "assets/bowl_of_fruit_strawberries.jpg" height = "256" > < img src = "assets/freckled_woman_cyborg.jpg" height = "256" > < br >
< img src = "assets/girl-pearl-clown-compare.gif" height = "256" > < img src = "assets/mona-lisa-headshot-anim.gif" height = "256" > < br >
### Quick Image Edit Demo
Want to just quickly have some fun? Try `edit-demo` to apply some pre-defined edits.
```bash
>> aimg edit-demo pearl_girl.jpg
>> aimg edit-demo mona-lisa.jpg
>> aimg edit-demo luke.jpg
>> aimg edit-demo spock.jpg
```
< img src = "assets/girl_with_a_pearl_earring_suprise.gif" height = "256" > < img src = "assets/mona-lisa-suprise.gif" height = "256" > < br >
< img src = "assets/luke-suprise.gif" height = "256" > < img src = "assets/spock-suprise.gif" height = "256" > < br >
< img src = "assets/gg-bridge-suprise.gif" height = "256" > < img src = "assets/shire-suprise.gif" height = "256" > < br >
<p float="left">
<img src="assets/girl_with_a_pearl_earring_suprise.gif" height="256">
<img src="assets/mona-lisa-suprise.gif" height="256">
<img src="assets/luke-suprise.gif" height="256">
<img src="assets/spock-suprise.gif" height="256">
<img src="assets/gg-bridge-suprise.gif" height="256">
<img src="assets/shire-suprise.gif" height="256">
</p>
### Prompt Based Masking [by clipseg](https://github.com/timojl/clipseg)
Use depth maps for amazing "translations" of existing images.
```bash
>> imagine --model SD-2.0-depth --init-image girl_with_a_pearl_earring_large.jpg --init-image-strength 0.05 "professional headshot photo of a woman with a pearl earring" -r 4 -w 1024 -h 1024 --steps 50
```
< img src = "https://raw.githubusercontent.com/brycedrennan/imaginAIry/master/tests/data/girl_with_a_pearl_earring.jpg" height = "256" > ➡️
< img src = "assets/pearl_depth_1.jpg" height = "512" >
< img src = "assets/pearl_depth_2.jpg" height = "512" >
< img src = "assets/pearl_depth_3.jpg" height = "512" >
<p float="left">
<img src="tests/data/girl_with_a_pearl_earring.jpg" width="256"> ➡️
<img src="assets/pearl_depth_1.jpg" width="256">
<img src="assets/pearl_depth_2.jpg" width="256">
</p>
### Outpainting
Given a starting image, one can generate its "surroundings".
Example:
`imagine --init-image pearl-earring.jpg --init-image-strength 0 --outpaint all250,up0,down600 "woman standing"`
< img src = "https://raw.githubusercontent.com/brycedrennan/imaginAIry/master/tests/data/girl_with_a_pearl_earring.jpg" height = "256" > ➡️
< img src = "https://raw.githubusercontent.com/brycedrennan/imaginAIry/master/tests/expected_output/test_outpainting_outpaint_.png" height = "256" >
### Work with different generation models
<p float="left">
<img src="assets/fairytale-treehouse-sd14.jpg" height="256">
<img src="assets/fairytale-treehouse-sd15.jpg" height="256">
<img src="assets/fairytale-treehouse-sd20.jpg" height="256">
<img src="assets/fairytale-treehouse-sd21.jpg" height="256">
<img src="assets/fairytale-treehouse-openjourney-v1.jpg" height="256">
<img src="assets/fairytale-treehouse-openjourney-v2.jpg" height="256">
</p>
<details>
<summary>Click to see shell command</summary>

```bash
imagine "valley, fairytale treehouse village covered, , matte painting, highly detailed, dynamic lighting, cinematic, realism, realistic, photo real, sunset, detailed, high contrast, denoised, centered, michael whelan" --steps 60 --seed 1 --arg-schedule model[sd14,sd15,sd20,sd21,openjourney-v1,openjourney-v2] --arg-schedule "caption-text[sd14,sd15,sd20,sd21,openjourney-v1,openjourney-v2]"
```
</details>
### Prompt Expansion
You can use `{}` to randomly pull values from lists. A list of values separated by `|` and enclosed in `{}` will be randomly drawn from. For example, an expanded prompt might read:

```
a bowl full of gold bars sitting on a table
```
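The steps above can be sketched as a simple substitution: each `{a|b|c}` group is replaced with one randomly chosen option. This is an illustrative re-implementation of the idea, with a hypothetical `expand_prompt` helper, not imaginairy's actual expansion code:

```python
import random
import re

def expand_prompt(template: str, rng: random.Random) -> str:
    """Replace each {a|b|c} group with one randomly chosen option."""
    return re.sub(
        r"\{([^{}]+)\}",
        lambda m: rng.choice(m.group(1).split("|")),
        template,
    )

rng = random.Random(0)
prompt = expand_prompt(
    "a bowl full of {gold bars|apples|oranges} sitting on a table", rng
)
```

Passing an explicit `random.Random` seed makes the expansion reproducible, mirroring how a fixed `--seed` makes generations reproducible.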
### Additional Features
- Generate images either in code or from command line.
- It just works. Proper requirements are installed. Model weights are automatically downloaded. No huggingface account needed.
(if you have the right hardware... and aren't on windows)
- No more distorted faces!
- Noisy logs are gone (which was surprisingly hard to accomplish)
- WeightedPrompts let you smash together separate prompts (cat-dog)
- Tile Mode creates tileable images
- Prompt metadata saved into image file metadata
- Edit images by describing the part you want edited (see example above)
- Have AI generate captions for images `aimg describe <filename-or-url>`
- Interactive prompt: just run `aimg`
- 🎉 Fine-tune your own image model, similar to DreamBooth. Read instructions on the ["Concept Training"](docs/concept-training.md) page
## How To
- ✅ add k-diffusion sampling methods
- ✅ tiling
- ✅ generation videos/gifs
- ✅ controlnet
  - scribbles input
  - segmentation input
  - mlsd input
- [Attend and Excite](https://attendandexcite.github.io/Attend-and-Excite/)
- Compositional Visual Generation
  - https://github.com/energy-based-model/Compositional-Visual-Generation-with-Composable-Diffusion-Models-PyTorch