mirror of https://github.com/brycedrennan/imaginAIry synced 2024-11-05 12:00:15 +00:00


ImaginAIry 🤖🧠

AI imagined images. Pythonic generation of stable diffusion images.

"just works" on Linux and macOS(M1) (and maybe windows?).

Examples

# on macOS, make sure rust is installed first
>> pip install imaginairy
>> imagine "a scenic landscape" "a photo of a dog" "photo of a fruit bowl" "portrait photo of a freckled woman"

Console Output

🤖🧠 received 4 prompt(s) and will repeat them 1 times to create 4 images.
Loading model onto mps backend...
Generating 🖼  : "a scenic landscape" 512x512px seed:557988237 prompt-strength:7.5 steps:40 sampler-type:PLMS
    PLMS Sampler: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [00:29<00:00,  1.36it/s]
    🖼  saved to: ./outputs/000001_557988237_PLMS40_PS7.5_a_scenic_landscape.jpg
Generating 🖼  : "a photo of a dog" 512x512px seed:277230171 prompt-strength:7.5 steps:40 sampler-type:PLMS
    PLMS Sampler: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [00:28<00:00,  1.41it/s]
    🖼  saved to: ./outputs/000002_277230171_PLMS40_PS7.5_a_photo_of_a_dog.jpg
Generating 🖼  : "photo of a fruit bowl" 512x512px seed:639753980 prompt-strength:7.5 steps:40 sampler-type:PLMS
    PLMS Sampler: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [00:28<00:00,  1.40it/s]
    🖼  saved to: ./outputs/000003_639753980_PLMS40_PS7.5_photo_of_a_fruit_bowl.jpg
Generating 🖼  : "portrait photo of a freckled woman" 512x512px seed:500686645 prompt-strength:7.5 steps:40 sampler-type:PLMS
    PLMS Sampler: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [00:29<00:00,  1.37it/s]
    🖼  saved to: ./outputs/000004_500686645_PLMS40_PS7.5_portrait_photo_of_a_freckled_woman.jpg

Prompt Based Editing by clipseg

Specify advanced text-based masks using boolean logic and strength modifiers. Mask descriptions must be lowercase; keywords must be uppercase. Valid symbols: AND, OR, NOT, (), and a mask strength modifier such as {*1.5}, where the operator can be any of + - * /. Single-character boolean operators also work (|, &, !). When writing strength modifiers, keep in mind that pixel values are between 0 and 1.

>> imagine \
    --init-image pearl_earring.jpg \
    --mask-prompt "face AND NOT (bandana OR hair OR blue fabric){*6}" \
    --mask-mode keep \
    --init-image-strength .2 \
    --fix-faces \
    "a modern female president" "a female robot" "a female doctor" "a female firefighter"


>> imagine \
    --init-image fruit-bowl.jpg \
    --mask-prompt "fruit OR fruit stem{*6}" \
    --mask-mode replace \
    --mask-modify-original \
    --init-image-strength .1 \
    "a bowl of kittens" "a bowl of gold coins" "a bowl of popcorn" "a bowl of spaghetti"

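To make the mask syntax more concrete, here is a minimal sketch of how boolean mask logic and a strength modifier could behave on pixel values between 0 and 1. It is an illustration only, not imaginAIry's actual implementation: the helper names (`mask_and`, `mask_or`, `mask_not`, `modify`), the min/max interpretation of AND/OR, and the tiny four-pixel masks are all assumptions made for the example.

```python
import operator

# Assumed mapping from the modifier symbol to an arithmetic operation.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def clamp(value):
    """Keep a pixel value inside the 0..1 range."""
    return max(0.0, min(1.0, value))


def mask_and(a, b):
    # Assumption: AND keeps the weaker of the two mask values per pixel.
    return [min(x, y) for x, y in zip(a, b)]


def mask_or(a, b):
    # Assumption: OR keeps the stronger of the two mask values per pixel.
    return [max(x, y) for x, y in zip(a, b)]


def mask_not(a):
    return [1.0 - x for x in a]


def modify(a, op, amount):
    """Apply a strength modifier such as {*6}, then clamp back to 0..1."""
    return [clamp(OPS[op](x, amount)) for x in a]


# Hypothetical per-pixel mask strengths for three text descriptions.
face = [0.9, 0.8, 0.1, 0.0]
hair = [0.05, 0.3, 0.9, 0.0]
bandana = [0.0, 0.1, 0.2, 0.0]

# Rough equivalent of: "face AND NOT (bandana OR hair){*6}"
covered = modify(mask_or(bandana, hair), "*", 6)
result = mask_and(face, mask_not(covered))
print(result)
```

Because values are clamped to the 0..1 range, a multiplier like `{*6}` pushes even faint detections to full strength, which is why it is useful for making a weak mask (a thin fruit stem, stray hair) count fully.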

Face Enhancement by CodeFormer

>> imagine "a couple smiling" --steps 40 --seed 1 --fix-faces


Upscaling by RealESRGAN

>> imagine "colorful smoke" --steps 40 --upscale


Tiled Images

>> imagine  "gold coins" "a lush forest" "piles of old books" leaves --tile

Image-to-Image

>> imagine "portrait of a smiling lady. oil painting" --init-image girl_with_a_pearl_earring.jpg


Prompt Expansion

You can use {} to randomly pull values from lists. A list of values separated by | and enclosed in { } will be randomly drawn from in a non-repeating fashion. Values that are surrounded by _ _ will pull from a phrase list of the same name. Folders containing .txt phraselist files may be specified via --prompt_library_path. The option may be specified multiple times. Built-in categories:

  3d-term, adj-architecture, adj-beauty, adj-detailed, adj-emotion, adj-general, adj-horror, animal, art-movement, 
  art-site, artist, artist-botanical, artist-surreal, aspect-ratio, bird, body-of-water, body-pose, camera-brand,
  camera-model, color, cosmic-galaxy, cosmic-nebula, cosmic-star, cosmic-term, dinosaur, eyecolor, f-stop, 
  fantasy-creature, fantasy-setting, fish, flower, focal-length, food, fruit, games, gen-modifier, hair, hd,
  iso-stop, landscape-type, national-park, nationality, neg-weight, noun-beauty, noun-fantasy, noun-general, 
  noun-horror, occupation, photo-term, pop-culture, pop-location, punk-style, quantity, rpg-item, scenario-desc, 
  skin-color, spaceship, style, tree-species, trippy, world-heritage-site

Examples:

imagine "a {lime|blue|silver|aqua} colored dog" -r 4 --seed 0 (note that it generates a dog of each color without repetition)

imagine "a {_color_} dog" -r 4 --seed 0 will generate four differently colored dogs. The colors will be pulled from an included phraselist of colors.

imagine "a {_spaceship_|_fruit_|hot air balloon}. low-poly" -r 4 --seed 0 will generate images of spaceships or fruits or a hot air balloon
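The expansion rules above can be sketched in a few lines of Python. This is a simplified illustration, not the library's own code: `expand_prompt`, its parameters, and the in-memory `phrase_lists` dictionary are hypothetical names, and flattening a mixed group like `{_spaceship_|_fruit_|hot air balloon}` into one pool is an assumption about how the draw might work.

```python
import random
import re

GROUP = re.compile(r"\{([^{}]*)\}")    # matches one {a|b|c} group
PHRASE_REF = re.compile(r"^_(.+)_$")   # matches a _name_ phraselist reference


def expand_prompt(template, count, phrase_lists=None, seed=0):
    """Return `count` prompts, drawing group values without repetition."""
    phrase_lists = phrase_lists or {}
    rng = random.Random(seed)

    # Build one shuffled pool of candidate values per {...} group.
    pools = []
    for match in GROUP.finditer(template):
        options = []
        for option in match.group(1).split("|"):
            ref = PHRASE_REF.match(option.strip())
            if ref:
                # _name_ pulls in every phrase from the list of that name.
                options.extend(phrase_lists[ref.group(1)])
            else:
                options.append(option)
        rng.shuffle(options)
        pools.append(options)

    prompts = []
    for i in range(count):
        # Walk each pool in order so values only repeat once the pool is used up.
        picks = iter([pool[i % len(pool)] for pool in pools])
        prompts.append(GROUP.sub(lambda _match: next(picks), template))
    return prompts


for prompt in expand_prompt("a {lime|blue|silver|aqua} colored dog", 4):
    print(prompt)
```

Shuffling once and then stepping through the pool is one simple way to get the "non-repeating" behavior described above; with four options and `-r 4`, every color appears exactly once.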

Credit to noodle-soup-prompts where most, but not all, of the wordlists originate.

Generate image captions (via BLIP)

>> aimg describe assets/mask_examples/bowl001.jpg
a bowl full of gold bars sitting on a table

Features

It makes images from text descriptions! 🎉
Generate images either in code or from command line.
It just works. Proper requirements are installed. Model weights are automatically downloaded. No huggingface account needed. (if you have the right hardware... and aren't on Windows)
No more distorted faces!
Noisy logs are gone (which was surprisingly hard to accomplish)
WeightedPrompts let you smash together separate prompts (cat-dog)
Tile Mode creates tileable images
Prompt metadata saved into image file metadata
Edit images by describing the part you want edited (see example above)
Have AI generate captions for images: aimg describe <filename-or-url>
Interactive prompt: just run aimg

How To

For full command line instructions run aimg --help

from imaginairy import imagine, imagine_image_files, ImaginePrompt, WeightedPrompt, LazyLoadingImage

url = "https://upload.wikimedia.org/wikipedia/commons/thumb/6/6c/Thomas_Cole_-_Architect%E2%80%99s_Dream_-_Google_Art_Project.jpg/540px-Thomas_Cole_-_Architect%E2%80%99s_Dream_-_Google_Art_Project.jpg"
prompts = [
    ImaginePrompt("a scenic landscape", seed=1, upscale=True),
    ImaginePrompt("a bowl of fruit"),
    ImaginePrompt([
        WeightedPrompt("cat", weight=1),
        WeightedPrompt("dog", weight=1),
    ]),
    ImaginePrompt(
        "a spacious building", 
        init_image=LazyLoadingImage(url=url)
    ),
    ImaginePrompt(
        "a bowl of strawberries", 
        init_image=LazyLoadingImage(filepath="mypath/to/bowl_of_fruit.jpg"),
        mask_prompt="fruit OR stem{*2}",  # amplify the stem mask x2
        mask_mode="replace",
        mask_modify_original=True,
    ),
    ImaginePrompt("strawberries", tile_mode=True),
]
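for i, result in enumerate(imagine(prompts)):
    # do something with each result; a distinct filename keeps earlier images from being overwritten
    result.save(f"my_image_{i}.jpg")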

# or

imagine_image_files(prompts, outdir="./my-art")
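The WeightedPrompt entries in the example carry relative weights. One plausible way to think about "smashing together" prompts is a weight-normalized blend of their text encodings. The sketch below shows only that arithmetic on plain lists; it is an assumption for illustration, not a description of what imaginAIry does internally, and `blend` and the two-element "encodings" are hypothetical stand-ins.

```python
def blend(weighted_vectors):
    """Weighted average of equal-length vectors.

    `weighted_vectors` is a list of (vector, weight) pairs. Weights are
    normalized by their sum, so only their ratios matter.
    """
    total = sum(weight for _, weight in weighted_vectors)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    size = len(weighted_vectors[0][0])
    return [
        sum(vector[i] * weight for vector, weight in weighted_vectors) / total
        for i in range(size)
    ]


# Hypothetical 2-dimensional "encodings" standing in for real text embeddings.
cat = [1.0, 0.0]
dog = [0.0, 1.0]

print(blend([(cat, 1), (dog, 1)]))  # equal weights: halfway between the two
print(blend([(cat, 3), (dog, 1)]))  # leans three-to-one toward "cat"
```

Under this reading, `weight=1` on both prompts gives an even mix, and raising one weight shifts the result toward that prompt.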

Requirements

~10 GB of disk space for the model downloads
A decent computer with either a CUDA-supported graphics card or an M1 processor.
Python installed. Preferably Python 3.10.
On macOS, rust and setuptools-rust must be installed to compile the tokenizer library. They can be installed via: curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh and pip install setuptools-rust
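A quick way to check some of these requirements before installing is a small standard-library script. This is a rough sketch, not part of imaginAIry: `preflight` is a hypothetical helper, it assumes the models end up on the same volume as your home directory, and it does not check for a CUDA card or M1 processor.

```python
import platform
import shutil
import sys
from pathlib import Path

MODEL_SPACE_GB = 10         # roughly the space the model downloads need
PREFERRED_PYTHON = (3, 10)  # the preferred Python version


def preflight(download_dir=None):
    """Return a list of warnings; an empty list means nothing looked off."""
    # Assumption: models are cached somewhere under the home directory.
    download_dir = Path(download_dir) if download_dir else Path.home()
    warnings = []

    if sys.version_info[:2] != PREFERRED_PYTHON:
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        warnings.append(f"Python {found} found; 3.10 is preferred")

    free_gb = shutil.disk_usage(download_dir).free / 1e9
    if free_gb < MODEL_SPACE_GB:
        warnings.append(f"only {free_gb:.1f} GB free; about {MODEL_SPACE_GB} GB is needed")

    # The rust toolchain is only required on macOS (to build the tokenizer library).
    if platform.system() == "Darwin" and shutil.which("rustc") is None:
        warnings.append("rustc not found; install rust and setuptools-rust first")

    return warnings


if __name__ == "__main__":
    for warning in preflight():
        print("warning:", warning)
```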

Running in Docker

See example Dockerfile (works on machines where you can pass the GPU into the container)

docker build . -t imaginairy
# you really want to map the cache or you end up wasting a lot of time and space redownloading the model weights
docker run -it --gpus all -v $HOME/.cache/huggingface:/root/.cache/huggingface -v $HOME/.cache/torch:/root/.cache/torch -v `pwd`/outputs:/outputs imaginairy /bin/bash

Running on Google Colab

Example Colab

ChangeLog

3.1.0

feature: img2img/inpainting supported on all samplers
refactor: consolidates img2img/txt2img code. consolidates schedules. consolidates masking
ci: minor logging improvements

3.0.1

fix: k-samplers were broken

3.0.0

feature: improved safety filter

2.4.0

🎉 feature: prompt expansion
feature: make (blip) photo captions more descriptive

2.3.1

fix: face fidelity default was broken

2.3.0

feature: model weights file can be specified via --model-weights-path argument at the command line
fix: set face fidelity default back to old value
fix: handle small images without throwing exception. credit to @NiclasEriksen
docs: add setuptools-rust as dependency for macos

2.2.1

fix: init image is fully ignored if init-image-strength = 0

2.2.0

feature: face enhancement fidelity is now configurable

2.1.0

improved masking accuracy from clipseg

2.0.3

fix memory leak in face enhancer
fix blurry inpainting
fix for pillow compatibility

2.0.0

🎉 fix: inpainted areas correlate with surrounding image, even at 100% generation strength. Previously if the generation strength was high enough the generated image would be uncorrelated to the rest of the surrounding image. It created terrible looking images.
🎉 feature: interactive prompt added. access by running aimg
🎉 feature: Specify advanced text based masks using boolean logic and strength modifiers. Mask descriptions must be lowercase. Keywords uppercase. Valid symbols: AND, OR, NOT, (), and mask strength modifier {+0.1} where + can be any of + - * /. Single character boolean operators also work (|, &, !)
🎉 feature: apply mask edits to original files with mask_modify_original (on by default)
feature: auto-rotate images if exif data specifies to do so
fix: mask boundaries are more accurate
fix: accept mask images in command line
fix: img2img algorithm was wrong and wouldn't work at values close to 0 or 1

1.6.2

fix: another bfloat16 fix

1.6.1

fix: make sure image tensors come to the CPU as float32 so there aren't compatibility issues with non-bfloat16 cpus

1.6.0

fix: maybe address #13 with expected scalar type BFloat16 but found Float
- at minimum one can specify --precision full now and that will probably fix the issue
feature: tile mode can now be specified per-prompt

1.5.3

fix: missing config file for describe feature

1.5.1

img2img now supported with PLMS (instead of just DDIM)
added image captioning feature aimg describe dog.jpg => a brown dog sitting on grass
added new commandline tool aimg for additional image manipulation functionality

1.4.0

support multiple additive targets for masking with | symbol. Example: "fruit|stem|fruit stem"

1.3.0

added prompt based image editing. Example: "fruit => gold coins"
test coverage improved

1.2.0

allow urls as init-images

previous

img2img actually does # of steps you specify
performance optimizations
numerous other changes

Not Supported

a GUI. this is a python library
training
exploratory features that don't work well

Todo

Performance Optimizations
Development Environment
- ✅ add tests
- ✅ set up ci (test/lint/format)
- ✅ unified pipeline (txt2img & img2img combined)
- setup parallel testing
- add docs
- remove yaml config
- delete more unused code
Interface improvements
- ✅ init-image at command line
- ✅ prompt expansion
- ✅ interactive cli
Image Generation Features
- ✅ add k-diffusion sampling methods
- ✅ tiling
- generation videos/gifs
- Compositional Visual Generation
  - https://github.com/energy-based-model/Compositional-Visual-Generation-with-Composable-Diffusion-Models-PyTorch
  - https://colab.research.google.com/github/energy-based-model/Compositional-Visual-Generation-with-Composable-Diffusion-Models-PyTorch/blob/main/notebooks/demo.ipynb#scrollTo=wt_j3uXZGFAS
- negative prompting
  - some syntax to allow it in a text string
- images as actual prompts instead of just init images. is this the same as textual inversion?
  - requires model fine-tuning since SD1.4 expects 77x768 text encoding input
  - https://twitter.com/Buntworthy/status/1566744186153484288
  - https://github.com/justinpinkney/stable-diffusion
  - https://github.com/LambdaLabsML/lambda-diffusers
  - https://www.reddit.com/r/MachineLearning/comments/x6k5bm/n_stable_diffusion_image_variations_released/
Image Editing
- outpainting
  - https://github.com/parlance-zz/g-diffuser-bot/search?q=noise&type=issues
  - lama cleaner
- ✅ inpainting
- ✅ text based image masking
  - ✅ ClipSeg - https://github.com/timojl/clipseg
  - https://github.com/facebookresearch/detectron2
Image Enhancement
- Photo Restoration - https://github.com/microsoft/Bringing-Old-Photos-Back-to-Life
- Upscaling
  - ✅ realesrgan
  - ldm
  - https://github.com/lowfuel/progrock-stable
  - gobig
  - stable super-res?
    - todo: try with 1-0-0-0 mask at full image resolution (re-encoding entire image+predicted image at every step)
    - todo: use a gaussian pyramid and only include the "high-detail" level of the pyramid into the next step
    - https://www.reddit.com/r/StableDiffusion/comments/xkjjf9/upscale_to_huge_sizes_and_add_detail_with_sd/
- ✅ face enhancers
  - ✅ gfpgan - https://github.com/TencentARC/GFPGAN
  - ✅ codeformer - https://github.com/sczhou/CodeFormer
- ✅ image describe feature -
  - ✅ https://github.com/salesforce/BLIP
  - 🚫 CLIP brute-force prompt reconstruction
    - The accuracy of this approach is too low for me to include it in imaginAIry
    - https://github.com/rmokady/CLIP_prefix_caption
    - https://github.com/pharmapsychotic/clip-interrogator (blip + clip)
  - https://github.com/KaiyangZhou/CoOp
- 🚫 CPU support. While the code does actually work on some CPUs, the generation takes so long that I don't think it's worth the effort to support this feature
- ✅ img2img for plms
- ✅ img2img for kdiff functions
Other

Notable Stable Diffusion Implementations

Online Stable Diffusion Services

https://stablecog.com/