imaginAIry/README.md
Bryce b69b4c770e feature: interactive prompt
- add quiet flag
- add mask-modify-original flag
2022-09-24 23:31:03 -07:00

19 KiB

ImaginAIry 🤖🧠

AI imagined images. Pythonic generation of stable diffusion images.

"just works" on Linux and macOS(M1) (and maybe windows?).

Examples

# on macOS, make sure rust is installed first
>> pip install imaginairy
>> imagine "a scenic landscape" "a photo of a dog" "photo of a fruit bowl" "portrait photo of a freckled woman"
Console Output
🤖🧠 received 4 prompt(s) and will repeat them 1 times to create 4 images.
Loading model onto mps backend...
Generating 🖼  : "a scenic landscape" 512x512px seed:557988237 prompt-strength:7.5 steps:40 sampler-type:PLMS
    PLMS Sampler: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [00:29<00:00,  1.36it/s]
    🖼  saved to: ./outputs/000001_557988237_PLMS40_PS7.5_a_scenic_landscape.jpg
Generating 🖼  : "a photo of a dog" 512x512px seed:277230171 prompt-strength:7.5 steps:40 sampler-type:PLMS
    PLMS Sampler: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [00:28<00:00,  1.41it/s]
    🖼  saved to: ./outputs/000002_277230171_PLMS40_PS7.5_a_photo_of_a_dog.jpg
Generating 🖼  : "photo of a fruit bowl" 512x512px seed:639753980 prompt-strength:7.5 steps:40 sampler-type:PLMS
    PLMS Sampler: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [00:28<00:00,  1.40it/s]
    🖼  saved to: ./outputs/000003_639753980_PLMS40_PS7.5_photo_of_a_fruit_bowl.jpg
Generating 🖼  : "portrait photo of a freckled woman" 512x512px seed:500686645 prompt-strength:7.5 steps:40 sampler-type:PLMS
    PLMS Sampler: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [00:29<00:00,  1.37it/s]
    🖼  saved to: ./outputs/000004_500686645_PLMS40_PS7.5_portrait_photo_of_a_freckled_woman.jpg


Prompt Based Editing by clipseg

Specify advanced text based masks using boolean logic and strength modifiers. Mask descriptions must be lowercase. Keywords uppercase. Valid symbols: AND, OR, NOT, (), and mask strength modifier {*1.5} where + can be any of + - * /. Single-character boolean operators also work. When writing strength modifies know that pixel values are between 0 and 1.

>> imagine \
    --init-image pearl_earring.jpg \ 
    --mask-prompt "face{*1.9}" \
    --mask-mode keep \
    --init-image-strength .4 \
    "a female doctor" "an elegant woman"

➡️

>> imagine \
    --init-image fruit-bowl.jpg \
    --mask-prompt "fruit OR fruit stem{*1.5}" \
    --mask-mode replace \
    --init-image-strength .1 \
    "a bowl of kittens" "a bowl of gold coins" "a bowl of popcorn" "a bowl of spaghetti"

➡️

Face Enhancement by CodeFormer

>> imagine "a couple smiling" --steps 40 --seed 1 --fix-faces

➡️

Upscaling by RealESRGAN

>> imagine "colorful smoke" --steps 40 --upscale

➡️

Tiled Images

>> imagine  "gold coins" "a lush forest" "piles of old books" leaves --tile


Image-to-Image

>> imagine "portrait of a smiling lady. oil painting" --init-image girl_with_a_pearl_earring.jpg

➡️

Generate image captions

>> aimg describe assets/mask_examples/bowl001.jpg
a bowl full of gold bars sitting on a table

Features

  • It makes images from text descriptions! 🎉
  • Generate images either in code or from command line.
  • It just works. Proper requirements are installed. model weights are automatically downloaded. No huggingface account needed. (if you have the right hardware... and aren't on windows)
  • No more distorted faces!
  • Noisy logs are gone (which was surprisingly hard to accomplish)
  • WeightedPrompts let you smash together separate prompts (cat-dog)
  • Tile Mode creates tileable images
  • Prompt metadata saved into image file metadata
  • Edit images by describing the part you want edited (see example above)
  • Have AI generate captions for images aimg describe <filename-or-url>
  • Interactive prompt: just run aimg

How To

For full command line instructions run aimg --help

from imaginairy import imagine, imagine_image_files, ImaginePrompt, WeightedPrompt, LazyLoadingImage

url = "https://upload.wikimedia.org/wikipedia/commons/thumb/6/6c/Thomas_Cole_-_Architect%E2%80%99s_Dream_-_Google_Art_Project.jpg/540px-Thomas_Cole_-_Architect%E2%80%99s_Dream_-_Google_Art_Project.jpg"
prompts = [
    ImaginePrompt("a scenic landscape", seed=1, upscale=True),
    ImaginePrompt("a bowl of fruit"),
    ImaginePrompt([
        WeightedPrompt("cat", weight=1),
        WeightedPrompt("dog", weight=1),
    ]),
    ImaginePrompt(
        "a spacious building", 
        init_image=LazyLoadingImage(url=url)
    ),
    ImaginePrompt(
        "a bowl of strawberries", 
        init_image=LazyLoadingImage(filepath="mypath/to/bowl_of_fruit.jpg"),
        mask_prompt="fruit OR stem{*2}",  # amplify the stem mask x2
        mask_mode="replace",
    ),
    ImaginePrompt("strawberries", tile_mode=True),
]
for result in imagine(prompts):
    # do something
    result.save("my_image.jpg")

# or

imagine_image_files(prompts, outdir="./my-art")

Requirements

  • ~10 gb space for models to download
  • A decent computer with either a CUDA supported graphics card or M1 processor.
  • Python installed. Preferably Python 3.10.
  • For macOS rust must be installed to compile the tokenizer library. be installed via: curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Running in Docker

See example Dockerfile (works on machine where you can pass the gpu into the container)

docker build . -t imaginairy
# you really want to map the cache or you end up wasting a lot of time and space redownloading the model weights
docker run -it --gpus all -v $HOME/.cache/huggingface:/root/.cache/huggingface -v $HOME/.cache/torch:/root/.cache/torch -v `pwd`/outputs:/outputs imaginairy /bin/bash

Running on Google Colab

Example Colab

ChangeLog

  • feature: interactive prompt added. access by running aimg
  • feature: Specify advanced text based masks using boolean logic and strength modifiers. Mask descriptions must be lowercase. Keywords uppercase. Valid symbols: AND, OR, NOT, (), and mask strength modifier {+0.1} where + can be any of + - * /
  • feature: apply mask edits to original files
  • feature: auto-rotate images if exif data specifies to do so
  • fix: accept mask images in command line

1.6.2

  • fix: another bfloat16 fix

1.6.1

  • fix: make sure image tensors come to the CPU as float32 so there aren't compatability issues with non-bfloat16 cpus

1.6.0

  • fix: maybe address #13 with expected scalar type BFloat16 but found Float
    • at minimum one can specify --precision full now and that will probably fix the issue
  • feature: tile mode can now be specified per-prompt

1.5.3

  • fix: missing config file for describe feature

1.5.1

  • img2img now supported with PLMS (instead of just DDIM)
  • added image captioning feature aimg describe dog.jpg => a brown dog sitting on grass
  • added new commandline tool aimg for additional image manipulation functionality

1.4.0

  • support multiple additive targets for masking with | symbol. Example: "fruit|stem|fruit stem"

1.3.0

  • added prompt based image editing. Example: "fruit => gold coins"
  • test coverage improved

1.2.0

  • allow urls as init-images

** previous **

  • img2img actually does # of steps you specify
  • performance optimizations
  • numerous other changes

Models Used

Not Supported

  • a web interface. this is a python library
  • training

Todo

Noteable Stable Diffusion Implementations

Further Reading