Auto-annotate images with GroundingDINO and SAM models

Auto-annotate images using a text prompt. GroundingDINO detects the objects (bounding boxes), then MobileSAM or SAM segments them. The annotations can be saved in COCO format, Pascal VOC format, or both.

The COCO annotation file (.json) is compatible with the Ikomia dataset_coco dataloader.
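
As a quick sanity check, the generated COCO file can be inspected with the Python standard library alone. The sketch below assumes the file follows the standard COCO layout (top-level `images`, `annotations` and `categories` keys). The function name `summarize_coco` is illustrative and not part of the Ikomia API.

```python
import json
from collections import Counter


def summarize_coco(json_path):
    """Return (image count, {category name: annotation count}) for a COCO file."""
    with open(json_path, "r", encoding="utf-8") as f:
        coco = json.load(f)

    # Map category ids to readable names
    id_to_name = {cat["id"]: cat["name"] for cat in coco.get("categories", [])}

    # Count annotations per category name
    counts = Counter(
        id_to_name.get(ann["category_id"], "unknown")
        for ann in coco.get("annotations", [])
    )
    return len(coco.get("images", [])), dict(counts)
```

Calling `summarize_coco` on the train or test annotation file gives a fast overview of how many objects were annotated per class before launching a training.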


🚀 Use with Ikomia API

1. Install Ikomia API

We strongly recommend using a virtual environment. If you're not sure where to start, we offer a tutorial here.

pip install ikomia

2. Create your workflow

  • classes (str) - default 'car, person, dog, chair': Comma-separated list of classes, or path to a .txt file (see template utils/classes_list_template.txt).
  • task (str) - default 'object detection': 'object detection' or 'segmentation'.
  • dataset_split_ratio (float) - default '0.8': Split ratio of the images between the train and test COCO annotations.
  • model_name_grounding_dino (str) - default 'Swin-B': 'Swin-T' or 'Swin-B'.
  • model_name_sam (str) - default 'mobile_sam': 'mobile_sam', 'vit_b', 'vit_l' or 'vit_h'.
  • conf_thres (float) - default '0.35': Box confidence threshold of the GroundingDINO model.
  • conf_thres_text (float) - default '0.25': Text confidence threshold of the GroundingDINO model.
  • min_relative_object_size (float) - default '0.002': The minimum ratio of detection area to image area for a detection to be included.
  • max_relative_object_size (float) - default '0.8': The maximum ratio of detection area to image area for a detection to be included.
  • polygon_simplification_factor (float) - default '0.8': The fraction of polygon points to be removed from the input polygon, in the range [0, 1).
  • image_folder (str): Path of your image folder.
  • output_folder (str): Path of the folder where the annotations are saved.
  • output_dataset_name (str) - optional: Name of the output dataset folder. Defaults to a timestamp.
  • export_coco (bool) - default 'True': Save annotation in COCO format.
  • export_pascal_voc (bool) - default 'False': Save annotation in Pascal VOC format.
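
To make the two relative-size thresholds concrete, here is a minimal sketch of such a filter. It assumes boxes are given as `(x_min, y_min, x_max, y_max)` in pixels and that both bounds are inclusive. The function name is hypothetical and the plugin's actual implementation may differ.

```python
def keep_by_relative_size(boxes, image_width, image_height,
                          min_relative_object_size=0.002,
                          max_relative_object_size=0.8):
    """Keep boxes whose area divided by the image area lies within [min, max]."""
    image_area = image_width * image_height
    kept = []
    for x_min, y_min, x_max, y_max in boxes:
        # Clamp to zero so malformed boxes count as empty
        box_area = max(0, x_max - x_min) * max(0, y_max - y_min)
        ratio = box_area / image_area
        if min_relative_object_size <= ratio <= max_relative_object_size:
            kept.append((x_min, y_min, x_max, y_max))
    return kept
```

With the default thresholds on a 1000×1000 image, a 10×10 box (0.01% of the image) is dropped as too small, a 100×100 box (1%) is kept, and a 1000×950 box (95%) is dropped as too large.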

Parameter values must be passed as strings when added to the dictionary.

The code snippet below requires 6 GB of GPU memory.

from ikomia.dataprocess.workflow import Workflow
from ikomia.utils.displayIO import display

# Init your workflow
wf = Workflow()

# Add the auto_annotate process to the workflow and set parameters
annotate = wf.add_task(name="auto_annotate")

# All parameter values are passed as strings
annotate.set_parameters({
    "image_folder": "Path/To/Your/Image/Folder",
    "classes": "car, person, dog, chair",
    "task": "segmentation",
    "dataset_split_ratio": "0.8",
    "model_name_grounding_dino": "Swin-T",
    "model_name_sam": "mobile_sam",
    "conf_thres": "0.35",
    "conf_thres_text": "0.25",
    "min_relative_object_size": "0.002",
    "output_folder": "Path/To/Annotations/Output/Folder"
})

# Run auto_annotate
wf.run()
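
Because parameter values are expected as strings, it can be convenient to keep typed values in Python and convert them just before handing them to the task. The helper below is a hypothetical convenience function, not part of the Ikomia API; it relies only on the built-in `str()`.

```python
def to_string_params(params):
    """Convert every value of a parameter dictionary to its string form."""
    return {key: str(value) for key, value in params.items()}


# Typed values, as you might keep them in your own configuration
typed_params = {
    "classes": "car, person, dog, chair",
    "task": "segmentation",
    "dataset_split_ratio": 0.8,
    "conf_thres": 0.35,
    "export_coco": True,
}

string_params = to_string_params(typed_params)
```

The resulting `string_params` dictionary holds only string values (for example `"0.8"` and `"True"`) and can then be used as the parameter dictionary of the `annotate` task.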

☀️ Use with Ikomia Studio

Ikomia Studio offers a friendly UI with the same features as the API.

  • If you haven't started using Ikomia Studio yet, download and install it from this page.

  • For additional guidance on getting started with Ikomia Studio, check out this blog post.

✒️ Citation

  • Liu, Shilong and Zeng, Zhaoyang and Ren, Tianhe and Li, Feng and Zhang, Hao and Yang, Jie and Li, Chunyuan and Yang, Jianwei and Su, Hang and Zhu, Jun and others. Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection. arXiv preprint arXiv:2303.05499.
  • Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Dollár, Piotr and Girshick, Ross. Segment Anything.
  • Zhang, Chaoning and Han, Dongshen and Qiao, Yu and Kim, Jung Uk and Bae, Sung Ho and Lee, Seungkyu and Hong, Choong Seon. Faster Segment Anything: Towards Lightweight SAM for Mobile Applications. arXiv preprint arXiv:2306.14289.


  • Ikomia


Apache License 2.0

A permissive license whose main conditions require preservation of copyright and license notices. Contributors provide an express grant of patent rights. Licensed works, modifications, and larger works may be distributed under different terms and without source code.

