feature: dilation and erosion of masks

Previously, the `+` and `-` characters in a mask modifier (example: `face{+0.1}`) added to or subtracted from the grayscale value of the masked areas, which wasn't very useful. Now the mask expands or contracts by the number of pixels specified. The technical terms for these operations are dilation and erosion. This gives much finer control over the masked area.
2 years ago · 8332593fed
parent 6f1455e912
commit 8332593fed
19 changed files with 49 additions and 28 deletions
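For readers unfamiliar with the terms in the commit message, here is a minimal pure-Python sketch of dilation and erosion on a tiny mask. It is an illustration only, not the commit's code (which calls `kornia.morphology`); the helper names `morph`, `dilate`, and `erode` are hypothetical, and ignoring out-of-bounds neighbors at the image border is an assumption of this sketch.

```python
def morph(mask, kernel_size, op):
    """Apply `op` over each pixel's kernel_size x kernel_size neighborhood.

    `op` is `max` for dilation and `min` for erosion. Neighbors that fall
    outside the image are ignored (an assumption of this sketch).
    """
    radius = kernel_size // 2
    height, width = len(mask), len(mask[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                mask[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            row.append(op(window))
        out.append(row)
    return out


def dilate(mask, kernel_size):
    # A pixel becomes masked if any pixel in its neighborhood is masked.
    return morph(mask, kernel_size, max)


def erode(mask, kernel_size):
    # A pixel stays masked only if its whole neighborhood is masked.
    return morph(mask, kernel_size, min)


# 5x5 mask with a single masked pixel in the center
mask = [[0.0] * 5 for _ in range(5)]
mask[2][2] = 1.0

grown = dilate(mask, 3)   # the center pixel grows into a 3x3 block
shrunk = erode(grown, 3)  # the 3x3 block shrinks back to one pixel
```

Dilation takes the maximum over a neighborhood and erosion takes the minimum, which is why the same loop serves both.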
--- a/README.md
+++ b/README.md
@@ -36,9 +36,19 @@ Generating 🖼  : "portrait photo of a freckled woman" 512x512px seed:500686645
 <img src="https://raw.githubusercontent.com/brycedrennan/imaginAIry/master/assets/000056_293284644_PLMS40_PS7.5_photo_of_a_bowl_of_fruit.jpg" height="256"><img src="https://raw.githubusercontent.com/brycedrennan/imaginAIry/master/assets/000078_260972468_PLMS40_PS7.5_portrait_photo_of_a_freckled_woman.jpg"  height="256">

 ### Prompt Based Editing  [by clipseg](https://github.com/timojl/clipseg)
-Specify advanced text based masks using boolean logic and strength modifiers. Mask descriptions must be lowercase. Keywords uppercase.
-Valid symbols: `AND`, `OR`, `NOT`, `()`, and mask strength modifier `{*1.5}` where `+` can be any of `+ - * /`. Single-character boolean 
-operators also work.  When writing strength modifies know that pixel values are between 0 and 1.
+Specify advanced text based masks using boolean logic and strength modifiers. 
+Mask syntax:
+  - mask descriptions must be lowercase
+  - keywords (`AND`, `OR`, `NOT`) must be uppercase
+  - parentheses are supported 
+  - mask modifiers may be appended to any mask or group of masks.  Example: `(dog OR cat){+5}` means that we'll
+select any dog or cat and then expand the size of the mask area by 5 pixels.  Valid mask modifiers:
+    - `{+n}` - expand mask by n pixels
+    - `{-n}` - shrink mask by n pixels
+    - `{*n}` - multiply mask strength. will expand mask to areas that weakly matched the mask description
+    - `{/n}` - divide mask strength. will reduce mask to areas that most strongly matched the mask description. probably not useful
+
+When writing strength modifiers keep in mind that pixel values are between 0 and 1.

 ```bash
 >> imagine \
@@ -213,7 +223,8 @@ docker run -it --gpus all -v $HOME/.cache/huggingface:/root/.cache/huggingface -
 [Example Colab](https://colab.research.google.com/drive/1rOvQNs0Cmn_yU1bKWjCOHzGVDgZkaTtO?usp=sharing)

 ## ChangeLog
-
+ - feature: dilation and erosion of masks
+ Previously the `+` and `-` characters in a mask (example: `face{+0.1}`) added to the grayscale value of any masked areas. This wasn't very useful. The new behavior is that the mask will expand or contract by the number of pixels specified. The technical terms for this are dilation and erosion.  This allows much greater control over the masked area.
 - feature: update k-diffusion samplers. add k_dpm_adaptive and k_dpm_fast

 **3.1.0**
@@ -359,6 +370,9 @@ would be uncorrelated to the rest of the surrounding image.  It created terrible
   - ✅ text based image masking
     - ✅ ClipSeg - https://github.com/timojl/clipseg
     - https://github.com/facebookresearch/detectron2
+   - Attention Control Methods
+     - https://github.com/bloc97/CrossAttentionControl
+     - https://github.com/ChenWu98/cycle-diffusion
 - Image Enhancement
   - Photo Restoration - https://github.com/microsoft/Bringing-Old-Photos-Back-to-Life
   - Upscaling
@@ -392,8 +406,6 @@ would be uncorrelated to the rest of the surrounding image.  It created terrible
     - https://github.com/francislabountyjr/stable-diffusion/blob/main/inferencing_notebook.ipynb
     - https://www.youtube.com/watch?v=E7aAFEhdngI
     - https://github.com/pytti-tools/frame-interpolation
-   - cross-attention control: 
-     - https://github.com/bloc97/CrossAttentionControl/blob/main/CrossAttention_Release_NoImages.ipynb
   - guided generation 
     - https://colab.research.google.com/drive/1dlgggNa5Mz8sEAGU0wFCHhGLFooW_pf1#scrollTo=UDeXQKbPTdZI
     - https://colab.research.google.com/github/aicrumb/doohickey/blob/main/Doohickey_Diffusion.ipynb#scrollTo=PytCwKXCmPid
--- a/imaginairy/enhancers/bool_masker.py
+++ b/imaginairy/enhancers/bool_masker.py
@@ -21,6 +21,7 @@ from abc import ABC

 import pyparsing as pp
 import torch
+from kornia.morphology import dilation, erosion
 from pyparsing import ParserElement

 ParserElement.enablePackrat()
@@ -70,7 +71,8 @@ class ModifiedMask(Mask):
            modifier = modifier.strip("{}")
        self.mask = mask
        self.modifier = modifier
-        self.operand = self.ops[modifier[0]]
+        self.operand_str = modifier[0]
+        self.operand = self.ops[self.operand_str]
        self.value = float(modifier[1:])

    @classmethod
@@ -85,6 +87,15 @@ class ModifiedMask(Mask):

    def apply_masks(self, mask_cache):
        mask = self.mask.apply_masks(mask_cache)
+        if self.operand_str in {"+", "-"}:
+            # kernel must be odd
+            kernel_size = int(round(self.value))
+            kernel_size = kernel_size if kernel_size % 2 else kernel_size + 1
+            morph_method = dilation if self.operand_str == "+" else erosion
+            mask = mask.unsqueeze(0).unsqueeze(0)
+            mask = morph_method(mask, torch.ones(kernel_size, kernel_size))
+            mask = mask.squeeze()
+            return mask
        return torch.clamp(self.operand(mask, self.value), 0, 1)
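The kernel-size handling in the hunk above can be checked in isolation. The sketch below copies the rounding logic into a hypothetical helper, `kernel_size_for` (not a function in the repository), and prints the kernel each modifier value would produce. The "reach" column rests on an assumption: that a centered k×k all-ones kernel moves the mask edge by (k-1)/2 pixels per side. That is my reading of how a flat square kernel behaves, not something the commit states.

```python
def kernel_size_for(value):
    """Mirror the rounding in ModifiedMask.apply_masks (hypothetical helper)."""
    kernel_size = int(round(value))
    # Even sizes (including 0) are bumped up by one so the kernel is odd.
    return kernel_size if kernel_size % 2 else kernel_size + 1


for modifier_value in (0.1, 1, 4, 5, 11, 25, 101):
    k = kernel_size_for(modifier_value)
    # Assumed reach of a centered k x k all-ones kernel, in pixels per side.
    reach = (k - 1) // 2
    print(f"{modifier_value:>5} -> kernel {k}x{k}, reach {reach} px per side")
```

Under that assumption, `{+5}` would move the mask edge by about 2 pixels in each direction, and old-style fractional modifiers such as `{+0.1}` would round to a 1×1 kernel and leave the mask unchanged.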


--- a/tests/expected_output/test_clip_masking_.png
+++ b/tests/expected_output/test_clip_masking_.png
--- a/tests/expected_output/test_clip_masking__mask*0.5_bin.png
+++ b/tests/expected_output/test_clip_masking__mask*0.5_bin.png
--- a/tests/expected_output/test_clip_masking__mask*0.5_g.png
+++ b/tests/expected_output/test_clip_masking__mask*0.5_g.png
--- a/tests/expected_output/test_clip_masking__mask*1_bin.png
+++ b/tests/expected_output/test_clip_masking__mask*1_bin.png
--- a/tests/expected_output/test_clip_masking__mask*1_g.png
+++ b/tests/expected_output/test_clip_masking__mask*1_g.png
--- a/tests/expected_output/test_clip_masking__mask*6_bin.png
+++ b/tests/expected_output/test_clip_masking__mask*6_bin.png
--- a/tests/expected_output/test_clip_masking__mask*6_g.png
+++ b/tests/expected_output/test_clip_masking__mask*6_g.png
--- a/tests/expected_output/test_clip_masking__mask+101_bin.png
+++ b/tests/expected_output/test_clip_masking__mask+101_bin.png
--- a/tests/expected_output/test_clip_masking__mask+101_g.png
+++ b/tests/expected_output/test_clip_masking__mask+101_g.png
--- a/tests/expected_output/test_clip_masking__mask+11_bin.png
+++ b/tests/expected_output/test_clip_masking__mask+11_bin.png
--- a/tests/expected_output/test_clip_masking__mask+11_g.png
+++ b/tests/expected_output/test_clip_masking__mask+11_g.png
--- a/tests/expected_output/test_clip_masking__mask+1_bin.png
+++ b/tests/expected_output/test_clip_masking__mask+1_bin.png
--- a/tests/expected_output/test_clip_masking__mask+1_g.png
+++ b/tests/expected_output/test_clip_masking__mask+1_g.png
--- a/tests/expected_output/test_clip_masking__mask-25_bin.png
+++ b/tests/expected_output/test_clip_masking__mask-25_bin.png
--- a/tests/expected_output/test_clip_masking__mask-25_g.png
+++ b/tests/expected_output/test_clip_masking__mask-25_g.png
--- a/tests/test_api.py
+++ b/tests/test_api.py
@@ -19,13 +19,13 @@ def test_imagine(sampler_type, filename_base_for_outputs):
    )
    result = next(imagine(prompt))

-    threshold_lookup = {
-        "k_dpm_2_a": 26000
-    }
+    threshold_lookup = {"k_dpm_2_a": 26000}
    threshold = threshold_lookup.get(sampler_type, 10000)

    img_path = f"{filename_base_for_outputs}.png"
-    assert_image_similar_to_expectation(result.img, img_path=img_path, threshold=threshold)
+    assert_image_similar_to_expectation(
+        result.img, img_path=img_path, threshold=threshold
+    )


 def test_img2img_beach_to_sunset(
@@ -115,13 +115,16 @@ def test_img_to_img_fruit_2_gold(
    result = next(imagine(prompt))

    threshold_lookup = {
-        "k_dpm_2_a": 26000
+        "k_dpm_2_a": 26000,
+        "k_dpm_adaptive": 11000,
    }
    threshold = threshold_lookup.get(sampler_type, 10000)

    pillow_fit_image_within(img).save(f"{filename_base_for_orig_outputs}__orig.jpg")
    img_path = f"{filename_base_for_outputs}.png"
-    assert_image_similar_to_expectation(result.img, img_path=img_path, threshold=threshold)
+    assert_image_similar_to_expectation(
+        result.img, img_path=img_path, threshold=threshold
+    )


 @pytest.mark.skipif(get_device() == "cpu", reason="Too slow to run on CPU")
--- a/tests/test_enhancers.py
+++ b/tests/test_enhancers.py
@@ -27,26 +27,23 @@ def test_fix_faces(filename_base_for_orig_outputs, filename_base_for_outputs):


 @pytest.mark.skipif(get_device() == "cpu", reason="Too slow to run on CPU")
-def test_clip_masking():
+def test_clip_masking(filename_base_for_outputs):
    img = Image.open(f"{TESTS_FOLDER}/data/girl_with_a_pearl_earring_large.jpg")

-    for mask_modifier in [
-        "*0.5",
-        "*1",
-        "*6",
-    ]:
+    for mask_modifier in ["*0.5", "*6", "+1", "+11", "+101", "-25"]:
        pred_bin, pred_grayscale = get_img_mask(
            img,
            f"face AND NOT (bandana OR hair OR blue fabric){{{mask_modifier}}}",
            threshold=0.5,
        )
-        pred_grayscale.save(
-            f"{TESTS_FOLDER}/test_output/earring_mask_{mask_modifier}_g.png"
-        )
-        pred_bin.save(
-            f"{TESTS_FOLDER}/test_output/earring_mask_{mask_modifier}_bin.png"
+        img_path = f"{filename_base_for_outputs}_mask{mask_modifier}_g.png"
+        assert_image_similar_to_expectation(
+            pred_grayscale, img_path=img_path, threshold=0
        )

+        img_path = f"{filename_base_for_outputs}_mask{mask_modifier}_bin.png"
+        assert_image_similar_to_expectation(pred_bin, img_path=img_path, threshold=10)
+
    prompt = ImaginePrompt(
        "",
        init_image=img,
@@ -60,10 +57,8 @@ def test_clip_masking():
    )

    result = next(imagine(prompt))
-    result.save(
-        f"{TESTS_FOLDER}/test_output/earring_mask_photo.png",
-        image_type="generated",
-    )
+    img_path = f"{filename_base_for_outputs}.png"
+    assert_image_similar_to_expectation(result.img, img_path=img_path, threshold=10)


 boolean_mask_test_cases = [