infer_segment_anything_2

About

1.1.0

Apache-2.0

Inference for Segment Anything Model 2 (SAM2).

Task: Semantic segmentation

SAM

ViT

Zero-Shot

SA-V dataset

🚀 Use with Ikomia API

1. Install Ikomia API

We strongly recommend using a virtual environment. If you're not sure where to start, we offer a tutorial here.

pip install ikomia

2. Create your workflow

from ikomia.dataprocess.workflow import Workflow
from ikomia.utils.displayIO import display

# Init your workflow
wf = Workflow()

# Add algorithm
algo  = wf.add_task(name="infer_segment_anything_2", auto_connect=True)

# Run directly on your image
wf.run_on(url="https://github.com/facebookresearch/segment-anything-2/blob/main/notebooks/images/cars.jpg?raw=true")

# Inspect your result
display(algo.get_image_with_mask())

☀️ Use with Ikomia Studio

Ikomia Studio offers a friendly UI with the same features as the API.

If you haven't started using Ikomia Studio yet, download and install it from this page.
For additional guidance on getting started with Ikomia Studio, check out this blog post.

📝 Set algorithm parameters

General parameters

model_name [str] : The SAM model can be loaded with four different encoders:
- sam2_hiera_tiny - size: 38.9
- sam2_hiera_small - size: 46
- sam2_hiera_base_plus - size: 80.8
- sam2_hiera_large - size: 224.4
cuda [bool]: If True, CUDA-based inference (GPU). If False, run on CPU.
input_size_percent [int] : Percentage size of the input image. Can be reduce to save memory usage.
apply_postprocessing [Bool] : Whether or not to apply mask output image post processing.

Prompt predictor parameters

input_box [list]: A Nx4 array of given box prompts to the model, in [[XYXY]] or [[XYXY], [XYXY]] format.
input_point [list]: A Nx2 array of point prompts to the model. Each point is in [[X,Y]] or [[XY], [XY]] in pixels.
input_point_label [list]: A length N array of labels for the point prompts. 1 indicates a foreground point and 0 indicates a background point
multimask_output - [bool]: if true, the model will return three masks. If only a single mask is needed, the model's predicted quality score can be used to select the best mask. For non-ambiguous prompts, such as multiple input prompts, multimask_output=False can give better results.

Automatic predictor parameters

points_per_side [int or None] - the number of points to be sampled along one side of the image. The total number of points is
points_per_batch - [int] - sets the number of points run simultaneously by the model. Higher numbers may be faster but use more GPU memory.
iou_thresh [float] - a filtering threshold in [0,1], using the model's predicted mask quality.
stability_score_thresh - [float] - a filtering threshold in [0,1], using the stability of the mask under changes to the cutoff used to binarize the model's mask predictions.
stability_score_offset - [float] - the amount to shift the cutoff when calculated the stability score.
box_nms_thresh - [float] - the box IoU cutoff used by non-maximal suppression to filter duplicate masks.
crop_n_layers - [int] - if >0, mask prediction will be run again on crops of the image. Sets the number of layers to run, where each layer has 2i_layer` number of image crops.
crop_nms_thresh - [float] - the box IoU cutoff used by non-maximal suppression to filter duplicate masks between different crops.
crop_overlap_ratio - [float] - sets the degree to which crops overlap. In the first crop layer, crops will overlap by this fraction of the image length. Later layers with more crops scale down this overlap.
crop_n_points_downscale_factor - [int] - the number of points-per-side sampled in layer n is scaled down by crop_n_points_downscale_factorn`.
use_m2m - [bool]: Whether to add a one step refinement using previous mask predictions.

from ikomia.dataprocess.workflow import Workflow
from ikomia.utils.displayIO import display

# Init your workflow
wf = Workflow()

# Add algorithm
algo  = wf.add_task(name="infer_segment_anything_2", auto_connect=True)

# Example: Setting parameters for 'prompt prediction':
algo.set_parameters({
    # "points_per_side": "32", # For 'automatic predictor'
    "input_size_percent": "100",
    "input_point":"[[500,500]]",
    "multimask_output":"True" # Generate 3 output masks
})

# Run directly on your image
wf.run_on(url="https://github.com/Ikomia-dev/notebooks/blob/main/examples/img/img_cat.jpg?raw=true")

# Inspect your result
img_output = algo.get_output(0)
mask_output = algo.get_output(1)
mask_output_2 = algo.get_output(2)
mask_output_3 = algo.get_output(3)
# display(img_output.get_image_with_mask(mask_output), title="Mask 1")
display(img_output.get_image_with_mask(mask_output_2), title="Mask 2")
# display(img_output.get_image_with_mask(mask_output_3), title="Mask 3")

🔍 Explore algorithm outputs

Every algorithm produces specific outputs, yet they can be explored them the same way using the Ikomia API. For a more in-depth understanding of managing algorithm outputs, please refer to the documentation.

from ikomia.dataprocess.workflow import Workflow

# Init your workflow
wf = Workflow()

# Add algorithm
algo  = wf.add_task(name="infer_segment_anything_2", auto_connect=True)

# Run directly on your image
wf.run_on(url="https://github.com/facebookresearch/segment-anything-2/blob/main/notebooks/images/cars.jpg?raw=true")

# Iterate over outputs
for output in algo.get_outputs():
    # Print information
    print(output)
    # Export it to JSON
    output.to_json()

⏩ Advanced usage

1. Automated mask generation

When no prompt is used, SAM2 will generate masks automatically over the entire image. You can select the number of masks using the parameter "Points per side" on Ikomia STUDIO or "points_per_side" with the API.

Sam dog auto

2. Segmentation mask with graphic prompts:

Given a graphic prompts: a single point or boxes SAM can predict masks over the desired objects.

Ikomia API:
- Using prompt coordinate
  - Point: 'input_point' parameter, e.g. [xy]
  - Point label: 'input_point_label' parameters, e.g. [1,0] 1 to include, 0 to exclude from mask
  - Box: 'input_box' parameter, e,g, [xyxy] or [[xyxy], [xyxy]].
Ikomia STUDIO:
- Using graphics
  - Point: Select the point tool
  - Box: Select the Square/Rectangle tool
- Using coordinate prompts
  - Point: 'Point coord. xy (optional)' [xy]
  - Point label: [1,0], 1 to include, 0 to exclude from mask
  - Box: 'Box coord. xyxy (optional)' [[xyxy], [xyxy]]

a. Single point

SAM2 with "multimask_output":"True" generate three outputs given a single point (3 best scores). Sam dog single

b. Multiple points

A single point can be ambiguous, using multiple points can improve the quality of the expected mask.

c. Boxes

Drawing a box over the desired object usually output a mask closer to expectation compared to point(s).

SAM can also take multiple inputs prompts. Sam cat boxes

d. Point and box

Point and box can be combined by including both types of prompts to the predictor. Here this can be used to select just the trucks's tire, instead of the entire wheel. truck_box_point

Developer

Ikomia

License

Apache License 2.0

Read license full text

A permissive license whose main conditions require preservation of copyright and license notices. Contributors provide an express grant of patent rights. Licensed works, modifications, and larger works may be distributed under different terms and without source code.

Permissions	Conditions	Limitations
Commercial use	License and copyright notice	Trademark use
Modification	State changes	Liability
Distribution		Warranty
Patent use
Private use

This is not legal advice: this description is for informational purposes only and does not constitute the license itself. Provided by choosealicense.com.