AI imagined images. Pythonic generation of Stable Diffusion images.
"just works" on Linux and macOS(M1) (and maybe windows?).
## Examples
```bash
# on macOS, make sure rust is installed first
>> pip install imaginairy
>> imagine "a scenic landscape" "a photo of a dog" "photo of a fruit bowl" "portrait photo of a freckled woman"
# Stable Diffusion 2.1
>> imagine --model SD-2.1 "a forest"
# Make generation gif
>> imagine "a scenic landscape" "a photo of a dog" "photo of a fruit bowl" "portrait photo of a freckled woman" "a bluejay"
# Make an animation showing the generation process
>> imagine --gif "a flower"
```
<p float="left">
<img src="assets/000019_786355545_PLMS50_PS7.5_a_scenic_landscape.jpg" height="256">
<img src="assets/000032_337692011_PLMS40_PS7.5_a_photo_of_a_dog.jpg" height="256">
<img src="assets/000056_293284644_PLMS40_PS7.5_photo_of_a_bowl_of_fruit.jpg" height="256">
<img src="assets/000078_260972468_PLMS40_PS7.5_portrait_photo_of_a_freckled_woman.jpg" height="256">
<img src="assets/013986_1_kdpmpp2m59_PS7.5_a_bluejay_[generated].jpg" height="256">
<img src="assets/009719_942389026_kdpmpp2m15_PS7.5_a_flower.gif" height="256">
</p>
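The output filenames above follow a consistent pattern: a running index, the seed, the sampler name and step count, the prompt strength after `PS`, and a slugified prompt. A small sketch of parsing that convention; the field layout is inferred from the example filenames shown here, not from imaginairy's source, and `parse_output_name` is a hypothetical helper:

```python
import re

# Inferred pattern: {index}_{seed}_{sampler}{steps}_PS{strength}_{prompt_slug}.jpg
FILENAME_RE = re.compile(
    r"(?P<index>\d+)_(?P<seed>\d+)_(?P<sampler>\w+?)(?P<steps>\d+)"
    r"_PS(?P<strength>[\d.]+)_(?P<slug>.+)\.jpg"
)

def parse_output_name(name: str) -> dict:
    """Split an imaginairy-style output filename into its fields."""
    m = FILENAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized filename: {name}")
    d = m.groupdict()
    return {
        "index": int(d["index"]),
        "seed": int(d["seed"]),
        "sampler": d["sampler"],
        "steps": int(d["steps"]),
        "prompt_strength": float(d["strength"]),
        "prompt": d["slug"].replace("_", " "),
    }

info = parse_output_name("000019_786355545_PLMS50_PS7.5_a_scenic_landscape.jpg")
```

The lazy `\w+?` on the sampler name lets the trailing digits bind to the step count, so names like `kdpmpp2m59` still split into sampler `kdpmpp2m` and 59 steps.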
## Features
- Instruction based image edits (InstructPix2Pix)
- Control image generation structure (ControlNet)
- Seamless tiled images
- Text-based masking (clipseg)
- Face enhancement (CodeFormer)
- Upscaling
- Outpainting
- Prompt expansion
- Image captioning
- Different generation models
### Image Structure Control [by ControlNet](https://github.com/lllyasviel/ControlNet)
Generate images guided by body poses, depth maps, canny edges, hed boundaries, or normal maps.
<details>
<summary>Openpose Control</summary>

```bash
imagine --control-image assets/indiana.jpg --control-mode openpose --caption-text openpose "photo of a polar bear"
```
</details>
<p float="left">
<img src="assets/indiana.jpg" height="256">
<img src="assets/indiana-pose.jpg" height="256">
<img src="assets/indiana-pose-polar-bear.jpg" height="256">
</p>
<details>
<summary>Canny Edge Control</summary>

```bash
imagine --control-image assets/lena.png --control-mode canny --caption-text canny "photo of a woman with a hat looking at the camera"
```
</details>
<p float="left">
<img src="assets/lena.png" height="256">
<img src="assets/lena-canny.jpg" height="256">
<img src="assets/lena-canny-generated.jpg" height="256">
</p>
< img src = "https://raw.githubusercontent.com/brycedrennan/imaginAIry/master/assets/000019_786355545_PLMS50_PS7.5_a_scenic_landscape.jpg" height = "256" > < img src = "https://raw.githubusercontent.com/brycedrennan/imaginAIry/master/assets/000032_337692011_PLMS40_PS7.5_a_photo_of_a_dog.jpg" height = "256" > < br >
< img src = "https://raw.githubusercontent.com/brycedrennan/imaginAIry/master/assets/000056_293284644_PLMS40_PS7.5_photo_of_a_bowl_of_fruit.jpg" height = "256" > < img src = "https://raw.githubusercontent.com/brycedrennan/imaginAIry/master/assets/000078_260972468_PLMS40_PS7.5_portrait_photo_of_a_freckled_woman.jpg" height = "256" > < br >
< img src = "assets/009719_942389026_kdpmpp2m15_PS7.5_a_flower.gif" height = "256" >
<details>
<summary>HED Boundary Control</summary>

```bash
imagine --control-image dog.jpg --control-mode hed "photo of a dalmation"
```
</details>
<p float="left">
<img src="assets/000032_337692011_PLMS40_PS7.5_a_photo_of_a_dog.jpg" height="256">
<img src="assets/dog-hed-boundary.jpg" height="256">
<img src="assets/dog-hed-boundary-dalmation.jpg" height="256">
</p>
<details>
<summary>Depth Map Control</summary>

```bash
imagine --control-image fancy-living.jpg --control-mode depth "a modern living room"
```
</details>
<p float="left">
<img src="assets/fancy-living.jpg" height="256">
<img src="assets/fancy-living-depth.jpg" height="256">
<img src="assets/fancy-living-depth-generated.jpg" height="256">
</p>
<details>
<summary>Normal Map Control</summary>

```bash
imagine --control-image bird.jpg --control-mode normal "a bird"
```
</details>
<p float="left">
<img src="assets/013986_1_kdpmpp2m59_PS7.5_a_bluejay_[generated].jpg" height="256">
<img src="assets/bird-normal.jpg" height="256">
<img src="assets/bird-normal-generated.jpg" height="256">
</p>
### Instruction based image edits [by InstructPix2Pix](https://github.com/timothybrooks/instruct-pix2pix)
Just tell imaginairy how to edit the image and it will do it for you!
<p float="left">
<img src="assets/scenic_landscape_winter.jpg" height="256">
<img src="assets/dog_red.jpg" height="256">
<img src="assets/bowl_of_fruit_strawberries.jpg" height="256">
<img src="assets/freckled_woman_cyborg.jpg" height="256">
<img src="assets/014214_51293814_kdpmpp2m30_PS10.0_img2img-1.0_make_the_bird_wear_a_cowboy_hat_[generated].jpg" height="256">
<img src="assets/flower-make-the-flower-out-of-paper-origami.gif" height="256">
<img src="assets/girl-pearl-clown-compare.gif" height="256">
<img src="assets/mona-lisa-headshot-anim.gif" height="256">
<img src="assets/make-it-night-time.gif" height="256">
</p>
<details>
<summary>Click to see shell commands</summary>

Use prompt strength to control how strong the edit is. For extra control you can combine with prompt-based masking.
```bash
# enter imaginairy shell
>> aimg
🤖🧠> edit scenic_landscape.jpg -p "make it winter" --prompt-strength 20
🤖🧠> edit scenic_landscape.jpg -p "make it winter" --steps 30 --arg-schedule "prompt_strength[2:25:0.5]" --compilation-anim gif
🤖🧠> edit dog.jpg -p "make the dog red" --prompt-strength 5
🤖🧠> edit bowl_of_fruit.jpg -p "replace the fruit with strawberries"
🤖🧠> edit freckled_woman.jpg -p "make her a cyborg" --prompt-strength 13
🤖🧠> edit bluebird.jpg -p "make the bird wear a cowboy hat" --prompt-strength 10
🤖🧠> edit flower.jpg -p "make the flower out of paper origami" --arg-schedule prompt-strength[1:11:0.3] --steps 25 --compilation-anim gif
# create a comparison gif
🤖🧠> edit pearl_girl.jpg -p "make her wear clown makeup" --compare-gif
# create an animation showing the edit with increasing prompt strengths
🤖🧠> edit mona-lisa.jpg -p "make it a color professional photo headshot" --negative-prompt "old, ugly, blurry" --arg-schedule "prompt-strength[2:8:0.5]" --compilation-anim gif
🤖🧠> edit gg-bridge.jpg -p "make it night time" --steps 30 --arg-schedule prompt-strength[1:15:1] --compilation-anim gif
```
</details>
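The `--arg-schedule "prompt-strength[2:25:0.5]"` syntax sweeps an argument across a `start:end:step` range, generating one image per value. A minimal sketch of how such a schedule might expand; this is illustrative only, and imaginairy's actual parser may differ:

```python
def expand_schedule(spec: str) -> tuple[str, list[float]]:
    """Expand 'name[start:end:step]' into (name, [values])."""
    name, _, rest = spec.partition("[")
    start_s, end_s, step_s = rest.rstrip("]").split(":")
    start, end, step = float(start_s), float(end_s), float(step_s)
    values = []
    v = start
    # Walk from start to end inclusive, guarding float drift with a tolerance.
    while v <= end + 1e-9:
        values.append(round(v, 6))
        v += step
    return name, values

name, values = expand_schedule("prompt-strength[1:15:1]")
```

With `--compilation-anim gif`, each value's output becomes one frame of the resulting animation.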
< img src = "assets/scenic_landscape_winter.jpg" height = "256" > < img src = "assets/dog_red.jpg" height = "256" > < br >
< img src = "assets/bowl_of_fruit_strawberries.jpg" height = "256" > < img src = "assets/freckled_woman_cyborg.jpg" height = "256" > < br >
< img src = "assets/girl-pearl-clown-compare.gif" height = "256" > < img src = "assets/mona-lisa-headshot-anim.gif" height = "256" > < br >
### Quick Image Edit Demo
Want to just quickly have some fun? Try `edit-demo` to apply some pre-defined edits.
```bash
>> aimg edit-demo pearl_girl.jpg
>> aimg edit-demo mona-lisa.jpg
>> aimg edit-demo luke.jpg
>> aimg edit-demo spock.jpg
```
< img src = "assets/girl_with_a_pearl_earring_suprise.gif" height = "256" > < img src = "assets/mona-lisa-suprise.gif" height = "256" > < br >
< img src = "assets/luke-suprise.gif" height = "256" > < img src = "assets/spock-suprise.gif" height = "256" > < br >
< img src = "assets/gg-bridge-suprise.gif" height = "256" > < img src = "assets/shire-suprise.gif" height = "256" > < br >
<p float="left">
<img src="assets/girl_with_a_pearl_earring_suprise.gif" height="256">
<img src="assets/mona-lisa-suprise.gif" height="256">
<img src="assets/luke-suprise.gif" height="256">
<img src="assets/spock-suprise.gif" height="256">
<img src="assets/gg-bridge-suprise.gif" height="256">
<img src="assets/shire-suprise.gif" height="256">
</p>
### Prompt Based Masking [by clipseg](https://github.com/timojl/clipseg)
Use depth maps for amazing "translations" of existing images.
```bash
>> imagine --model SD-2.0-depth --init-image girl_with_a_pearl_earring_large.jpg --init-image-strength 0.05 "professional headshot photo of a woman with a pearl earring" -r 4 -w 1024 -h 1024 --steps 50
```
< img src = "https://raw.githubusercontent.com/brycedrennan/imaginAIry/master/tests/data/girl_with_a_pearl_earring.jpg" height = "256" > ➡️
< img src = "assets/pearl_depth_1.jpg" height = "512" >
< img src = "assets/pearl_depth_2.jpg" height = "512" >
< img src = "assets/pearl_depth_3.jpg" height = "512" >
<p float="left">
<img src="tests/data/girl_with_a_pearl_earring.jpg" width="256"> ➡️
<img src="assets/pearl_depth_1.jpg" width="256">
<img src="assets/pearl_depth_2.jpg" width="256">
</p>
### Outpainting
Given a starting image, one can generate its "surroundings".
Example:
`imagine --init-image pearl-earring.jpg --init-image-strength 0 --outpaint all250,up0,down600 "woman standing"`
< img src = "https://raw.githubusercontent.com/brycedrennan/imaginAIry/master/tests/data/girl_with_a_pearl_earring.jpg" height = "256" > ➡️
< img src = "https://raw.githubusercontent.com/brycedrennan/imaginAIry/master/tests/expected_output/test_outpainting_outpaint_.png" height = "256" >
### Work with different generation models
<p float="left">
<img src="assets/fairytale-treehouse-sd14.jpg" height="256">
<img src="assets/fairytale-treehouse-sd15.jpg" height="256">
<img src="assets/fairytale-treehouse-sd20.jpg" height="256">
<img src="assets/fairytale-treehouse-sd21.jpg" height="256">
<img src="assets/fairytale-treehouse-openjourney-v1.jpg" height="256">
<img src="assets/fairytale-treehouse-openjourney-v2.jpg" height="256">
</p>
<details>
<summary>Click to see shell command</summary>

```bash
imagine "valley, fairytale treehouse village covered, , matte painting, highly detailed, dynamic lighting, cinematic, realism, realistic, photo real, sunset, detailed, high contrast, denoised, centered, michael whelan" --steps 60 --seed 1 --arg-schedule model[sd14,sd15,sd20,sd21,openjourney-v1,openjourney-v2] --arg-schedule "caption-text[sd14,sd15,sd20,sd21,openjourney-v1,openjourney-v2]"
```
</details>
### Prompt Expansion
You can use `{}` to randomly pull values from lists. A list of values separated by `|` and enclosed in `{}` will be randomly drawn from. For example, an expanded prompt might read:

```
a bowl full of gold bars sitting on a table
```
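The steps above can be sketched as a simple substitution: each `{a|b|c}` group is replaced with one randomly chosen option. This is an illustrative re-implementation of the idea, with a hypothetical `expand_prompt` helper, not imaginairy's actual expansion code:

```python
import random
import re

def expand_prompt(template: str, rng: random.Random) -> str:
    """Replace each {a|b|c} group with one randomly chosen option."""
    return re.sub(
        r"\{([^{}]+)\}",
        lambda m: rng.choice(m.group(1).split("|")),
        template,
    )

rng = random.Random(0)
prompt = expand_prompt(
    "a bowl full of {gold bars|apples|oranges} sitting on a table", rng
)
```

Passing an explicit `random.Random` seed makes the expansion reproducible, mirroring how a fixed `--seed` makes generations reproducible.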
### Additional Features
- Generate images either in code or from command line.
- It just works. Proper requirements are installed. Model weights are automatically downloaded. No huggingface account needed.
(if you have the right hardware... and aren't on windows)
- No more distorted faces!
- Noisy logs are gone (which was surprisingly hard to accomplish)
- WeightedPrompts let you smash together separate prompts (cat-dog)
- Tile Mode creates tileable images
- Prompt metadata saved into image file metadata
- Edit images by describing the part you want edited (see example above)
- Have AI generate captions for images `aimg describe <filename-or-url>`
- Interactive prompt: just run `aimg`
- 🎉 Fine-tune your own image model, similar to DreamBooth. Read instructions on the ["Concept Training"](docs/concept-training.md) page
## How To
- ✅ add k-diffusion sampling methods
- ✅ tiling
- ✅ generation videos/gifs
- ✅ controlnet
  - scribbles input
  - segmentation input
  - mlsd input
- [Attend and Excite](https://attendandexcite.github.io/Attend-and-Excite/)
- Compositional Visual Generation
  - https://github.com/energy-based-model/Compositional-Visual-Generation-with-Composable-Diffusion-Models-PyTorch