|
|
|
import os
|
|
|
|
import os.path
|
|
|
|
from functools import lru_cache
|
|
|
|
|
|
|
|
import torch
|
|
|
|
from torchvision import transforms
|
|
|
|
from torchvision.transforms.functional import InterpolationMode
|
|
|
|
|
|
|
|
from imaginairy.model_manager import get_cached_url_path
|
|
|
|
from imaginairy.utils import get_device
|
|
|
|
from imaginairy.vendored.blip.blip import BLIP_Decoder, load_checkpoint
|
|
|
|
|
|
|
|
# Resolve the compute device once at import time.
device = get_device()

# Fall back to CPU when MPS would be selected.
# NOTE(review): presumably BLIP uses ops unsupported on Apple's MPS
# backend — confirm before changing.
if "mps" in device:
    device = "cpu"

# Square edge length (pixels) that images are resized to before BLIP
# evaluation; matches the image_size the decoder is constructed with below.
BLIP_EVAL_SIZE = 384
|
|
|
|
|
|
|
|
|
|
|
|
@lru_cache
def blip_model():
    """Build, load, and return the BLIP caption decoder, ready for inference.

    The checkpoint is fetched via ``get_cached_url_path`` (downloaded once
    and reused from the local cache), the model is switched to eval mode,
    and it is moved to the module-level ``device``.  ``lru_cache`` makes
    this a lazy singleton: the expensive load happens on first call only.
    """
    # Imported inside the function, presumably to avoid a circular import
    # at module load time — TODO confirm.
    from imaginairy.paths import PKG_ROOT

    config_path = os.path.join(
        PKG_ROOT, "vendored", "blip", "configs", "med_config.json"
    )
    url = "https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model*_base_caption.pth"

    model = BLIP_Decoder(image_size=BLIP_EVAL_SIZE, vit="base", med_config=config_path)
    cached_url_path = get_cached_url_path(url)
    # load_checkpoint also returns a status message; it is not used here.
    model, _ = load_checkpoint(model, cached_url_path)
    model.eval()
    model = model.to(device)
    return model
|
|
|
|
|
|
|
|
|
feature: prompt expansion (#51)
You can use `{}` to randomly pull values from lists. A list of values separated by `|` and enclosed in `{ }` will be randomly drawn from in a non-repeating fashion. Values that are surrounded by `_ _` will pull from a phrase list of the same name. Folders containing .txt phraselist files may be specified via
`--prompt_library_path`. The option may be specified multiple times. Built-in categories:
3d-term, adj-architecture, adj-beauty, adj-detailed, adj-emotion, adj-general, adj-horror, animal, art-movement,
art-site, artist, artist-botanical, artist-surreal, aspect-ratio, bird, body-of-water, body-pose, camera-brand,
camera-model, color, cosmic-galaxy, cosmic-nebula, cosmic-star, cosmic-term, dinosaur, eyecolor, f-stop,
fantasy-creature, fantasy-setting, fish, flower, focal-length, food, fruit, games, gen-modifier, hair, hd,
iso-stop, landscape-type, national-park, nationality, neg-weight, noun-beauty, noun-fantasy, noun-general,
noun-horror, occupation, photo-term, pop-culture, pop-location, punk-style, quantity, rpg-item, scenario-desc,
skin-color, spaceship, style, tree-species, trippy, world-heritage-site
Examples:
`imagine "a {red|black} dog" -r 2 --seed 0` will generate both "a red dog" and "a black dog"
`imagine "a {_color_} dog" -r 4 --seed 0` will generate four different-colored dogs. The colors will be pulled from an included
phraselist of colors.
`imagine "a {_spaceship_|_fruit_|hot air balloon}. low-poly" -r 4 --seed 0` will generate images of spaceships or fruits or a hot air balloon
Credit to [noodle-soup-prompts](https://github.com/WASasquatch/noodle-soup-prompts/) where most, but not all, of the wordlists originate.
2 years ago
|
|
|
def generate_caption(image, min_length=30):
    """Given an image, return a caption."""
    rgb_image = image.convert("RGB")

    # Preprocessing pipeline: resize to BLIP's expected square input,
    # convert to a tensor, then normalize with CLIP's channel statistics.
    preprocess = transforms.Compose(
        [
            transforms.Resize(
                (BLIP_EVAL_SIZE, BLIP_EVAL_SIZE),
                interpolation=InterpolationMode.BICUBIC,
            ),
            transforms.ToTensor(),
            transforms.Normalize(
                (0.48145466, 0.4578275, 0.40821073),
                (0.26862954, 0.26130258, 0.27577711),
            ),
        ]
    )

    # Add a batch dimension and move the tensor to the module-level device.
    image_batch = preprocess(rgb_image).unsqueeze(0).to(device)

    # Inference only — no gradients needed.
    with torch.no_grad():
        captions = blip_model().generate(
            image_batch, sample=True, num_beams=3, max_length=80, min_length=min_length
        )

    # generate() returns one caption per batch item; we passed a single image.
    return captions[0]
|