Previously the `+` and `-` characters in a mask (example: `face{+0.1}`) added to the grayscale value of any masked areas, which wasn't very useful. The new behavior is that the mask will expand or contract by the number of pixels specified. The technical terms for this are dilation and erosion. This allows much greater control over the masked area.
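A rough sketch of the underlying idea using OpenCV morphology (helper name and details are hypothetical, not the project's actual code):

```
import cv2
import numpy as np


def adjust_mask(mask: np.ndarray, pixels: int) -> np.ndarray:
    """Grow (dilate) a uint8 mask when pixels > 0, shrink (erode) it when
    pixels < 0. A (2p+1)x(2p+1) kernel moves the boundary roughly p pixels."""
    if pixels == 0:
        return mask
    size = abs(pixels) * 2 + 1
    kernel = np.ones((size, size), np.uint8)
    if pixels > 0:
        return cv2.dilate(mask, kernel, iterations=1)
    return cv2.erode(mask, kernel, iterations=1)
```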
You can use `{}` to randomly pull values from lists. A list of values separated by `|` and enclosed in `{ }` will be drawn from randomly, without repetition. Values surrounded by `_ _` will pull from a phrase list of the same name. Folders containing `.txt` phraselist files may be specified via
`--prompt_library_path`; the option may be specified multiple times. Built-in categories:
3d-term, adj-architecture, adj-beauty, adj-detailed, adj-emotion, adj-general, adj-horror, animal, art-movement,
art-site, artist, artist-botanical, artist-surreal, aspect-ratio, bird, body-of-water, body-pose, camera-brand,
camera-model, color, cosmic-galaxy, cosmic-nebula, cosmic-star, cosmic-term, dinosaur, eyecolor, f-stop,
fantasy-creature, fantasy-setting, fish, flower, focal-length, food, fruit, games, gen-modifier, hair, hd,
iso-stop, landscape-type, national-park, nationality, neg-weight, noun-beauty, noun-fantasy, noun-general,
noun-horror, occupation, photo-term, pop-culture, pop-location, punk-style, quantity, rpg-item, scenario-desc,
skin-color, spaceship, style, tree-species, trippy, world-heritage-site
Examples:
`imagine "a {red|black} dog" -r 2 --seed 0` will generate both "a red dog" and "a black dog"
`imagine "a {_color_} dog" -r 4 --seed 0` will generate four, different colored dogs. The colors will eb pulled from an included
phraselist of colors.
`imagine "a {_spaceship_|_fruit_|hot air balloon}. low-poly" -r 4 --seed 0` will generate images of spaceships or fruits or a hot air balloon
Credit to [noodle-soup-prompts](https://github.com/WASasquatch/noodle-soup-prompts/) where most, but not all, of the wordlists originate.
If input images didn't need resizing because they were already smaller than the max width/height, then they weren't normalized to a multiple of 64. This caused an exception like the following:
```
Sizes of tensors must match except in dimension 1. Expected size 4 but got size 3 for tensor number 1 in the list.
```
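A hypothetical helper illustrating the fix: round both dimensions down to the nearest multiple of 64 before the image reaches the model.

```
def round_to_multiple(width: int, height: int, multiple: int = 64) -> tuple[int, int]:
    """Round dimensions down to the nearest multiple (default 64) so the
    image always satisfies the model's size requirement without growing."""
    return width - width % multiple, height - height % multiple


# e.g. a 500x380 input becomes 448x320
assert round_to_multiple(500, 380) == (448, 320)
```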
While the previous version did produce much better blending, it also made images that lacked detail for some reason.
tests: Added more tests to help catch this sort of thing earlier
fix: found that median blur is really slow, so I made sure we only do it on downsampled masks. It was taking about 3 minutes to run on the large pearl girl picture on an M1
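A sketch of that kind of speedup (helper name and values hypothetical): blur a downsampled copy of the mask with a proportionally smaller kernel, then scale the result back up.

```
import cv2


def fast_median_blur(mask, ksize: int = 115, max_side: int = 512):
    """Median-blur a large uint8 mask cheaply: blur a downsampled copy
    with a scaled-down kernel, then resize the result back up."""
    h, w = mask.shape[:2]
    scale = max(h, w) / max_side
    if scale <= 1:
        return cv2.medianBlur(mask, ksize)
    small = cv2.resize(mask, (int(w / scale), int(h / scale)), interpolation=cv2.INTER_AREA)
    small_ksize = max(3, int(ksize / scale) | 1)  # kernel size must be odd and >= 3
    blurred = cv2.medianBlur(small, small_ksize)
    return cv2.resize(blurred, (w, h), interpolation=cv2.INTER_LINEAR)
```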
- docs: update examples
- 🎉 fix: inpainted areas correlate with the surrounding image, even at 100% generation strength. Previously, if the generation strength was high enough, the generated image
would be uncorrelated with the rest of the surrounding image, which produced terrible-looking results (see the compositing sketch after this list).
- fix: mask boundaries are more accurate
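One standard way to guarantee that unmasked regions match the original exactly is to composite the generated result back through the mask; a minimal sketch, not necessarily the project's exact approach:

```
import numpy as np


def composite(original: np.ndarray, generated: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Blend generated pixels into the original image. mask values are
    floats in [0, 1]: 1.0 = fully generated, 0.0 = keep the original pixel."""
    return mask * generated + (1.0 - mask) * original
```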
Specify advanced text-based masks using boolean logic and strength modifiers. Mask descriptions must be lowercase; keywords uppercase.
Valid symbols: `AND`, `OR`, `NOT`, `()`, and the mask strength modifier `{*1.5}`, where `*` can be any of `+ - * /`. Single-character boolean
operators also work. When writing strength modifiers, know that pixel values are between 0 and 1.
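For example, to keep a subject's face while regenerating everything around it (command shape follows the examples above; values illustrative):

`imagine --init-image portrait.jpg --mask-prompt "face AND NOT (hair OR glasses){*6}" --mask-mode keep "a female firefighter"`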
- feature: apply mask edits to original files
- feature: auto-rotate images if exif data specifies to do so
- fix: accept mask images in command line
If `x_sample` was a bfloat16 tensor on the GPU but the CPU doesn't support bfloat16, that could cause a `TypeError`:
```
File "/home/stdiff/.local/lib/python3.10/site-packages/imaginairy/api.py", line 292, in imagine
x_sample.cpu().numpy(), "c h w -> h w c"
TypeError: Got unsupported ScalarType BFloat16
```
This seems to be caused by incompatible types in `group_norm` when we use autocast. The fix patches `group_norm` to cast the weights to the same type as the inputs.
From what I can tell, all the other repos just switch to full precision instead
of addressing this. I think that would make things slower, but I'm not sure, so maybe
the patching solution here is better?
https://github.com/pytorch/pytorch/pull/81852
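A minimal sketch of such a patch, assuming the standard `torch.nn.functional.group_norm` signature (not necessarily the project's exact code):

```
import torch.nn.functional as F

_original_group_norm = F.group_norm


def _patched_group_norm(input, num_groups, weight=None, bias=None, eps=1e-5):
    # Under autocast the input can be float16/bfloat16 while the affine
    # parameters are still float32; cast them to match before calling through.
    if weight is not None and weight.dtype != input.dtype:
        weight = weight.to(input.dtype)
    if bias is not None and bias.dtype != input.dtype:
        bias = bias.to(input.dtype)
    return _original_group_norm(input, num_groups, weight, bias, eps)


F.group_norm = _patched_group_norm
```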
- simpler logging suppression for `transformers` library
- suppress logging noise for running tests
- get tests running for all samplers on mps and cuda platforms
- refactor safety model env variable to allow classification
- improved image logging functionality; you can just stick `log_latent` wherever you want (see the sketch after this list)
- improved some variable naming
- moved all the samplers together
- vendored k-diffusion library
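A hedged usage sketch for the `log_latent` helper mentioned above (import path and signature assumed for illustration):

```
from imaginairy.log_utils import log_latent  # import path assumed

# Drop a call like this anywhere in the sampling loop to dump the
# current latent as a viewable image for debugging.
log_latent(latents, "latents after sampler step")
```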