infer_florence_2_caption
About
Image captioning with Florence-2
Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks. This algorithm uses Florence-2 for image captioning.
Example output: 'The image shows a young man sitting at a wooden table in a room with a large window in the background. He is wearing a white long-sleeved shirt and has a beard and dreadlocks. On the table, there is a laptop, a cup of coffee, and a small plant. A dog is lying on the floor next to the table. The room is decorated with potted plants and there is an air conditioning unit on the wall. The overall atmosphere of the room is cozy and relaxed.'
🚀 Use with Ikomia API
1. Install Ikomia API
We strongly recommend using a virtual environment. If you're not sure where to start, we offer a tutorial here.
pip install ikomia
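For example, a typical setup in a fresh virtual environment could look like this (the environment name ikomia_env is only an illustration):
python -m venv ikomia_env
source ikomia_env/bin/activate  # on Windows: ikomia_env\Scripts\activate
pip install ikomia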
2. Create your workflow
from ikomia.dataprocess.workflow import Workflow

# Init your workflow
wf = Workflow()

# Add algorithm
algo = wf.add_task(name="infer_florence_2_caption", auto_connect=True)

# Run on your image
wf.run_on(url="https://images.pexels.com/photos/5749076/pexels-photo-5749076.jpeg?cs=srgb&dl=pexels-zen-chung-5749076.jpg&fm=jpg&w=640&h=960")

# Save output .json
caption_output = algo.get_output(0)
caption_output.save('caption_output.json')
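Once the workflow has run, you can read the saved file back with the standard json module to inspect the generated caption. A minimal sketch (the exact key layout inside caption_output.json depends on the output type, so printing the whole document is the safest way to explore it):
import json

# Load the saved caption output and print it to inspect its structure
with open('caption_output.json') as f:
    print(json.load(f))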
☀️ Use with Ikomia Studio
Ikomia Studio offers a friendly UI with the same features as the API.
- If you haven't started using Ikomia Studio yet, download and install it from this page.
- For additional guidance on getting started with Ikomia Studio, check out this blog post.
📝 Set algorithm parameters
- model_name (str) - default 'microsoft/Florence-2-base': Name of the Florence-2 pre-trained model. Other models available:
- microsoft/Florence-2-large
- microsoft/Florence-2-base-ft
- microsoft/Florence-2-large-ft
- task_prompt (str) - default 'MORE_DETAILED_CAPTION': Level of detail of the generated caption. Other levels available:
- CAPTION
- DETAILED_CAPTION
- num_beams (int) - default '3': By specifying a number of beams higher than 1, you effectively switch from greedy search to beam search. This strategy evaluates several hypotheses at each time step and eventually chooses the hypothesis with the overall highest probability for the entire sequence. It has the advantage of identifying high-probability sequences that start with lower-probability initial tokens and would have been ignored by greedy search.
- do_sample (bool) - default 'False': If set to True, this parameter enables decoding strategies such as multinomial sampling, beam-search multinomial sampling, Top-K sampling and Top-p sampling. All these strategies select the next token from the probability distribution over the entire vocabulary with various strategy-specific adjustments.
- early_stopping (bool) - default 'False': Controls the stopping condition for beam-based methods, like beam search. It accepts the following values: True, where generation stops as soon as there are num_beams complete candidates; False, where a heuristic is applied and generation stops when it is very unlikely to find better candidates; "never", where the beam search procedure only stops when there cannot be better candidates (canonical beam search algorithm).
- cuda (bool): If True, run CUDA-based inference on GPU; if False, run on CPU.
Parameters should be passed as strings when added to the dictionary.
from ikomia.dataprocess.workflow import Workflow

# Init your workflow
wf = Workflow()

# Add algorithm
algo = wf.add_task(name="infer_florence_2_caption", auto_connect=True)

# Set algorithm parameters
algo.set_parameters({
    "model_name": "microsoft/Florence-2-large",
    "task_prompt": "MORE_DETAILED_CAPTION",
    "max_new_tokens": "1024",
    "num_beams": "3",
    "do_sample": "False",
    "early_stopping": "False",
    "cuda": "True"
})

# Run on your image
wf.run_on(url="https://images.pexels.com/photos/5749076/pexels-photo-5749076.jpeg?cs=srgb&dl=pexels-zen-chung-5749076.jpg&fm=jpg&w=640&h=960")

# Save output .json
caption_output = algo.get_output(0)
caption_output.save('caption_output.json')
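If you want to compare the three captioning levels, one option is to rerun the same workflow once per task_prompt value. A minimal sketch (it assumes set_parameters accepts a partial dictionary that only changes task_prompt; if not, pass the full dictionary as in the example above):
from ikomia.dataprocess.workflow import Workflow

wf = Workflow()
algo = wf.add_task(name="infer_florence_2_caption", auto_connect=True)

# Run the same image once per captioning level and save each result
for level in ["CAPTION", "DETAILED_CAPTION", "MORE_DETAILED_CAPTION"]:
    algo.set_parameters({"task_prompt": level})  # assumption: partial parameter updates are accepted
    wf.run_on(url="https://images.pexels.com/photos/5749076/pexels-photo-5749076.jpeg?cs=srgb&dl=pexels-zen-chung-5749076.jpg&fm=jpg&w=640&h=960")
    algo.get_output(0).save(f"caption_{level.lower()}.json")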
🔍 Explore algorithm outputs
Every algorithm produces specific outputs, yet they can all be explored in the same way using the Ikomia API. For a more in-depth understanding of managing algorithm outputs, please refer to the documentation.
import ikomia
from ikomia.dataprocess.workflow import Workflow

# Init your workflow
wf = Workflow()

# Add algorithm
algo = wf.add_task(name="infer_florence_2_caption", auto_connect=True)

# Run on your image
wf.run_on(url="https://images.pexels.com/photos/5749076/pexels-photo-5749076.jpeg?cs=srgb&dl=pexels-zen-chung-5749076.jpg&fm=jpg&w=640&h=960")

# Iterate over outputs
for output in algo.get_outputs():
    # Print information
    print(output)
    # Export it to JSON
    output.to_json()
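Since to_json() returns the output as a JSON string, you can also parse it with the standard json module for easier reading. A short sketch (assuming the string returned by to_json() is valid JSON, which you can confirm with the loop above):
import json

# Pretty-print each output (assumes to_json() returns a JSON string)
for output in algo.get_outputs():
    parsed = json.loads(output.to_json())
    print(json.dumps(parsed, indent=2))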
Developer
Ikomia
License
MIT License
A short and simple permissive license with conditions only requiring preservation of copyright and license notices. Licensed works, modifications, and larger works may be distributed under different terms and without source code.
| Permissions | Conditions | Limitations |
|---|---|---|
| Commercial use | License and copyright notice | Liability |
| Modification | | Warranty |
| Distribution | | |
| Private use | | |
This is not legal advice: this description is for informational purposes only and does not constitute the license itself. Provided by choosealicense.com.