Image
Captioning
AutoModelForImageCaptioning
Bases: BaseImageCaptioning
AutoModelForImageCaptioning pipeline supporting different combinations of annotation and validation models, such as OpenAI, Gemini, and Qwen2-VL.
Example Usage:
from swiftannotate.image import AutoModelForImageCaptioning
# Initialize the pipeline
# Note: You can use any of Qwen2-VL, OpenAI, or Gemini for captioning and validation.
captioner = AutoModelForImageCaptioning(
caption_model="gpt-4o",
validation_model="gemini-1.5-flash",
caption_api_key="your_openai_api_key",
validation_api_key="your_gemini_api_key",
output_file="captions.json"
)
# Generate captions for a list of images
image_paths = ["path/to/image1.jpg"]
results = captioner.generate(image_paths)
# Print results
# Output: [
# {
# 'image_path': 'path/to/image1.jpg',
# 'image_caption': 'A cat sitting on a table.',
# 'validation_reasoning': 'The caption is valid.',
# 'validation_score': 0.8
# },
# ]
Source code in swiftannotate/image/captioning/auto.py
__init__(caption_model, validation_model, caption_model_processor=None, validation_model_processor=None, caption_api_key=None, validation_api_key=None, caption_prompt=BASE_IMAGE_CAPTION_PROMPT, validation=True, validation_prompt=BASE_IMAGE_CAPTION_VALIDATION_PROMPT, validation_threshold=0.5, max_retry=3, output_file=None, **kwargs)
Initialize the AutoModelForImageCaptioning class. This class provides functionality for automatic image captioning with optional validation. It supports different combinations of annotation and validation models like OpenAI, Gemini, and Qwen2-VL.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| caption_model | Union[str, Qwen2VLForConditionalGeneration] | Model or API endpoint for caption generation. Can be either a local model instance or API endpoint string. | required |
| validation_model | Union[str, Qwen2VLForConditionalGeneration] | Model or API endpoint for caption validation. Can be either a local model instance or API endpoint string. | required |
| caption_model_processor | Optional[Qwen2VLProcessor] | Processor for caption model. Required if using a local model for captioning. | None |
| validation_model_processor | Optional[Qwen2VLProcessor] | Processor for validation model. Required if using a local model for validation. | None |
| caption_api_key | Optional[str] | API key for caption service if using API endpoint. | None |
| validation_api_key | Optional[str] | API key for validation service if using API endpoint. | None |
| caption_prompt | str | Prompt template for caption generation. Defaults to BASE_IMAGE_CAPTION_PROMPT. | BASE_IMAGE_CAPTION_PROMPT |
| validation | bool | Whether to perform validation on generated captions. Defaults to True. | True |
| validation_prompt | str | Prompt template for caption validation. Defaults to BASE_IMAGE_CAPTION_VALIDATION_PROMPT. | BASE_IMAGE_CAPTION_VALIDATION_PROMPT |
| validation_threshold | float | Threshold score for caption validation. Defaults to 0.5. | 0.5 |
| max_retry | int | Maximum number of retry attempts for failed validation. Defaults to 3. | 3 |
| output_file | Optional[str] | Path to save results. If None, results are not saved. | None |
| **kwargs | | Additional arguments passed to model initialization. | {} |

Raises:

| Type | Description |
|---|---|
| ValueError | If required model processors are not provided for local models. |
| ValueError | If an unsupported model is provided. |
Note
At least one of caption_model_processor or caption_api_key must be provided for caption generation. Same applies for validation_model_processor or validation_api_key if validation is enabled.
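For example, a minimal sketch of mixing a local Qwen2-VL caption model with an API-based Gemini validation model (the checkpoint name and settings below are assumptions; adjust them to your setup):

from transformers import AutoProcessor, AutoModelForImageTextToText
from swiftannotate.image import AutoModelForImageCaptioning

# Local Qwen2-VL model for captioning: a processor must be supplied
caption_model = AutoModelForImageTextToText.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct", device_map="auto", torch_dtype="auto"
)
caption_processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

# API-based Gemini model for validation: an API key must be supplied
captioner = AutoModelForImageCaptioning(
    caption_model=caption_model,
    caption_model_processor=caption_processor,
    validation_model="gemini-1.5-flash",
    validation_api_key="your_gemini_api_key",
    output_file="captions.json"
)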
Source code in swiftannotate/image/captioning/auto.py
annotate(image, feedback_prompt, **kwargs)
Annotates the image with a caption. Implements the caption generation logic for a single image.
Note: The feedback_prompt is dynamically updated using the validation reasoning from the previous iteration in case the caption does not pass the validation threshold.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| image | str | Base64 encoded image. | required |
| feedback_prompt | str | Feedback prompt used to guide the model towards a better caption. | required |
| **kwargs | | Additional arguments to pass to the method for custom pipeline interactions, e.g. to control generation parameters for the model. | {} |

Returns:

| Name | Type | Description |
|---|---|---|
| str | str | Generated caption for the image. |
Source code in swiftannotate/image/captioning/auto.py
generate(image_paths, **kwargs)
Generates captions for a list of images.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| image_paths | List[str] | List of image paths to generate captions for. | required |
| **kwargs | | Additional arguments to pass to the method for custom pipeline interactions, e.g. to control generation parameters for the model. | {} |

Returns:

| Type | Description |
|---|---|
| List[Dict] | List[Dict]: List of captions, validation reasoning and confidence scores for each image. |
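The returned list can be post-processed directly. A minimal sketch, reusing the captioner from the example above (the threshold value here is only illustrative):

results = captioner.generate(["path/to/image1.jpg", "path/to/image2.jpg"])

# Keep only captions whose validation score clears a chosen threshold
accepted = [r for r in results if r["validation_score"] >= 0.5]
for r in accepted:
    print(r["image_path"], "->", r["image_caption"])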
Source code in swiftannotate/image/captioning/auto.py
validate(image, caption, **kwargs)
Validates the caption generated for the image.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| image | str | Base64 encoded image. | required |
| caption | str | Caption generated for the image. | required |

Returns:

| Type | Description |
|---|---|
| Tuple[str, float] | Tuple[str, float]: Validation reasoning and confidence score for the caption. |
Source code in swiftannotate/image/captioning/auto.py
GeminiForImageCaptioning
Bases: BaseImageCaptioning
GeminiForImageCaptioning pipeline for generating captions for images using Gemini models.
Example usage:
from swiftannotate.image import GeminiForImageCaptioning
# Initialize the pipeline
captioner = GeminiForImageCaptioning(
caption_model="gemini-1.5-pro",
validation_model="gemini-1.5-flash",
api_key="your_api_key_here",
output_file="captions.json"
)
# Generate captions for a list of images
image_paths = ["path/to/image1.jpg"]
results = captioner.generate(image_paths)
# Print results
# Output: [
# {
# 'image_path': 'path/to/image1.jpg',
# 'image_caption': 'A cat sitting on a table.',
# 'validation_reasoning': 'The caption is valid.',
# 'validation_score': 0.8
# },
# ]
Source code in swiftannotate/image/captioning/gemini.py
__init__(caption_model, validation_model, api_key, caption_prompt=BASE_IMAGE_CAPTION_PROMPT, validation=True, validation_prompt=BASE_IMAGE_CAPTION_VALIDATION_PROMPT, validation_threshold=0.5, max_retry=3, output_file=None)
Initializes the GeminiForImageCaptioning pipeline.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| caption_model | str | Can be "gemini-1.5-flash", "gemini-1.5-pro", etc., or a specific model version supported by Gemini. | required |
| validation_model | str | Can be "gemini-1.5-flash", "gemini-1.5-pro", etc., or a specific model version supported by Gemini. | required |
| api_key | str | Google Gemini API key. | required |
| caption_prompt | str \| None | System prompt for captioning images. Uses the default BASE_IMAGE_CAPTION_PROMPT if not provided. | BASE_IMAGE_CAPTION_PROMPT |
| validation | bool | Whether to use the validation step. Defaults to True. | True |
| validation_prompt | str \| None | System prompt for validating image captions; it should specify the range of the validation score to be generated. Uses the default BASE_IMAGE_CAPTION_VALIDATION_PROMPT if not provided. | BASE_IMAGE_CAPTION_VALIDATION_PROMPT |
| validation_threshold | float | Threshold to determine whether an image caption is valid; should fall within the range specified for the validation score. Defaults to 0.5. | 0.5 |
| max_retry | int | Number of retries before giving up on the image caption. Defaults to 3. | 3 |
| output_file | str \| None | Output file path; only JSON is supported for now. Defaults to None. | None |

Notes
validation_prompt should specify the rules for validating the caption and the range of the validation score to be generated, for example (0-1).
Your validation_threshold should fall within this specified range.
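For instance, a custom validation_prompt might look like the following sketch (the wording is only an illustrative assumption; the key point is that it states the scoring rules and the 0-1 range that validation_threshold is compared against):

custom_validation_prompt = (
    "You are reviewing an image caption. "
    "Check that every object and action mentioned in the caption is visible in the image "
    "and that nothing important is missing. "
    "Return your reasoning and a validation score between 0 and 1, "
    "where 0 means completely wrong and 1 means fully accurate."
)

captioner = GeminiForImageCaptioning(
    caption_model="gemini-1.5-pro",
    validation_model="gemini-1.5-flash",
    api_key="your_api_key_here",
    validation_prompt=custom_validation_prompt,
    validation_threshold=0.7,  # must lie inside the 0-1 range requested above
    output_file="captions.json"
)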
Source code in swiftannotate/image/captioning/gemini.py
annotate(image, feedback_prompt='', **kwargs)
Annotates the image with a caption. Implements the caption generation logic for a single image.
Note: The feedback_prompt is dynamically updated using the validation reasoning from the previous iteration in case the caption does not pass the validation threshold.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| image | str | Base64 encoded image. | required |
| feedback_prompt | str | Feedback prompt used to guide the model towards a better caption. Defaults to ''. | '' |
| **kwargs | | Additional arguments to pass to the method for custom pipeline interactions, e.g. to control generation parameters for the model. | {} |

Returns:

| Name | Type | Description |
|---|---|---|
| str | str | Generated caption for the image. |
Source code in swiftannotate/image/captioning/gemini.py
generate(image_paths, **kwargs)
Generates captions for a list of images.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| image_paths | List[str] | List of image paths to generate captions for. | required |
| **kwargs | | Additional arguments to pass to the method for custom pipeline interactions, e.g. to control generation parameters for the model. | {} |

Returns:

| Type | Description |
|---|---|
| List[Dict] | List[Dict]: List of captions, validation reasoning and confidence scores for each image. |
Source code in swiftannotate/image/captioning/gemini.py
validate(image, caption, **kwargs)
Validates the caption generated for the image.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| image | str | Base64 encoded image. | required |
| caption | str | Caption generated for the image. | required |

Returns:

| Type | Description |
|---|---|
| Tuple[str, float] | Tuple[str, float]: Validation reasoning and confidence score for the caption. |
Source code in swiftannotate/image/captioning/gemini.py
OllamaForImageCaptioning
Bases: BaseImageCaptioning
OllamaForImageCaptioning pipeline using Ollama API.
Example usage:
from swiftannotate.image import OllamaForImageCaptioning
# Initialize the pipeline
captioner = OllamaForImageCaptioning(
caption_model="llama3.2-vision",
validation_model="llama3.2-vision",
output_file="captions.json"
)
# Generate captions for a list of images
image_paths = ["path/to/image1.jpg"]
results = captioner.generate(image_paths)
# Print results
# Output: [
# {
# 'image_path': 'path/to/image1.jpg',
# 'image_caption': 'A cat sitting on a table.',
# 'validation_reasoning': 'The caption is valid.',
# 'validation_score': 0.8
# },
# ]
Source code in swiftannotate/image/captioning/ollama.py
__init__(caption_model, validation_model, caption_prompt=BASE_IMAGE_CAPTION_PROMPT, validation=True, validation_prompt=BASE_IMAGE_CAPTION_VALIDATION_PROMPT, validation_threshold=0.5, max_retry=3, output_file=None)
Initializes the OllamaForImageCaptioning pipeline.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| caption_model | str | Any of the multimodal (vision) models supported by Ollama, including specific model versions. | required |
| validation_model | str | Any of the multimodal (vision) models supported by Ollama, including specific model versions. | required |
| caption_prompt | str \| None | System prompt for captioning images. Uses the default BASE_IMAGE_CAPTION_PROMPT if not provided. | BASE_IMAGE_CAPTION_PROMPT |
| validation | bool | Whether to use the validation step. Defaults to True. | True |
| validation_prompt | str \| None | System prompt for validating image captions; it should specify the range of the validation score to be generated. Uses the default BASE_IMAGE_CAPTION_VALIDATION_PROMPT if not provided. | BASE_IMAGE_CAPTION_VALIDATION_PROMPT |
| validation_threshold | float | Threshold to determine whether an image caption is valid; should fall within the range specified for the validation score. Defaults to 0.5. | 0.5 |
| max_retry | int | Number of retries before giving up on the image caption. Defaults to 3. | 3 |
| output_file | str \| None | Output file path; only JSON is supported for now. Defaults to None. | None |

Notes
validation_prompt should specify the rules for validating the caption and the range of the validation score to be generated, for example (0-1).
Your validation_threshold should fall within this specified range.
Source code in swiftannotate/image/captioning/ollama.py
annotate(image, feedback_prompt='', **kwargs)
Annotates the image with a caption. Implements the caption generation logic for a single image.
Note: The feedback_prompt is dynamically updated using the validation reasoning from the previous iteration in case the caption does not pass the validation threshold.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| image | str | Base64 encoded image. | required |
| feedback_prompt | str | Feedback prompt used to guide the model towards a better caption. Defaults to ''. | '' |
| **kwargs | | Additional arguments to pass to the method for custom pipeline interactions, e.g. to control generation parameters for the model. | {} |

Returns:

| Name | Type | Description |
|---|---|---|
| str | str | Generated caption for the image. |
Source code in swiftannotate/image/captioning/ollama.py
generate(image_paths, **kwargs)
Generates captions for a list of images.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| image_paths | List[str] | List of image paths to generate captions for. | required |
| **kwargs | | Additional arguments to pass to the method for custom pipeline interactions, e.g. to control generation parameters for the model. | {} |

Returns:

| Type | Description |
|---|---|
| List[Dict] | List[Dict]: List of captions, validation reasoning and confidence scores for each image. |
Source code in swiftannotate/image/captioning/ollama.py
validate(image, caption, **kwargs)
Validates the caption generated for the image.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| image | str | Base64 encoded image. | required |
| caption | str | Caption generated for the image. | required |

Returns:

| Type | Description |
|---|---|
| Tuple[str, float] | Tuple[str, float]: Validation reasoning and confidence score for the caption. |
Source code in swiftannotate/image/captioning/ollama.py
OpenAIForImageCaptioning
Bases: BaseImageCaptioning
OpenAIForImageCaptioning pipeline using OpenAI API.
Example usage:
from swiftannotate.image import OpenAIForImageCaptioning
# Initialize the pipeline
captioner = OpenAIForImageCaptioning(
caption_model="gpt-4o",
validation_model="gpt-4o-mini",
api_key="your_api_key_here",
output_file="captions.json"
)
# Generate captions for a list of images
image_paths = ["path/to/image1.jpg"]
results = captioner.generate(image_paths)
# Print results
# Output: [
# {
# 'image_path': 'path/to/image1.jpg',
# 'image_caption': 'A cat sitting on a table.',
# 'validation_reasoning': 'The caption is valid.',
# 'validation_score': 0.8
# },
# ]
Source code in swiftannotate/image/captioning/openai.py
__init__(caption_model, validation_model, api_key, caption_prompt=BASE_IMAGE_CAPTION_PROMPT, validation=True, validation_prompt=BASE_IMAGE_CAPTION_VALIDATION_PROMPT, validation_threshold=0.5, max_retry=3, output_file=None, **kwargs)
Initializes the OpenAIForImageCaptioning pipeline.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| caption_model | str | Can be "gpt-4o", "gpt-4o-mini", etc., or a specific model version supported by OpenAI. | required |
| validation_model | str | Can be "gpt-4o", "gpt-4o-mini", etc., or a specific model version supported by OpenAI. | required |
| api_key | str | OpenAI API key. | required |
| caption_prompt | str \| None | System prompt for captioning images. Uses the default BASE_IMAGE_CAPTION_PROMPT if not provided. | BASE_IMAGE_CAPTION_PROMPT |
| validation | bool | Whether to use the validation step. Defaults to True. | True |
| validation_prompt | str \| None | System prompt for validating image captions; it should specify the range of the validation score to be generated. Uses the default BASE_IMAGE_CAPTION_VALIDATION_PROMPT if not provided. | BASE_IMAGE_CAPTION_VALIDATION_PROMPT |
| validation_threshold | float | Threshold to determine whether an image caption is valid; should fall within the range specified for the validation score. Defaults to 0.5. | 0.5 |
| max_retry | int | Number of retries before giving up on the image caption. Defaults to 3. | 3 |
| output_file | str \| None | Output file path; only JSON is supported for now. Defaults to None. | None |

Other Parameters:

| Name | Type | Description |
|---|---|---|
| detail | str | Specific to OpenAI. Detail level of the image (higher resolution costs more). Defaults to "low". |

Notes
validation_prompt should specify the rules for validating the caption and the range of the validation score to be generated, for example (0-1).
Your validation_threshold should fall within this specified range.
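For example, requesting high-detail image processing at initialization might look like this sketch (whether "high" is appropriate depends on your cost and quality needs; the value shown is an assumption):

captioner = OpenAIForImageCaptioning(
    caption_model="gpt-4o",
    validation_model="gpt-4o-mini",
    api_key="your_api_key_here",
    output_file="captions.json",
    detail="high"  # passed through **kwargs; higher resolution costs more
)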
Source code in swiftannotate/image/captioning/openai.py
annotate(image, feedback_prompt='', **kwargs)
Annotates the image with a caption. Implements the caption generation logic for a single image.
Note: The feedback_prompt is dynamically updated using the validation reasoning from the previous iteration in case the caption does not pass the validation threshold.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| image | str | Base64 encoded image. | required |
| feedback_prompt | str | Feedback prompt used to guide the model towards a better caption. Defaults to ''. | '' |
| **kwargs | | Additional arguments to pass to the method for custom pipeline interactions, e.g. to control generation parameters for the model. | {} |

Returns:

| Name | Type | Description |
|---|---|---|
| str | str | Generated caption for the image. |
Source code in swiftannotate/image/captioning/openai.py
generate(image_paths, **kwargs)
Generates captions for a list of images.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| image_paths | List[str] | List of image paths to generate captions for. | required |
| **kwargs | | Additional arguments to pass to the method for custom pipeline interactions, e.g. to control generation parameters for the model. | {} |

Returns:

| Type | Description |
|---|---|
| List[Dict] | List[Dict]: List of captions, validation reasoning and confidence scores for each image. |
Source code in swiftannotate/image/captioning/openai.py
validate(image, caption, **kwargs)
Validates the caption generated for the image.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| image | str | Base64 encoded image. | required |
| caption | str | Caption generated for the image. | required |

Returns:

| Type | Description |
|---|---|
| Tuple[str, float] | Tuple[str, float]: Validation reasoning and confidence score for the caption. |
Source code in swiftannotate/image/captioning/openai.py
Qwen2VLForImageCaptioning
Bases: BaseImageCaptioning
Qwen2VLForImageCaptioning pipeline using Qwen2VL model.
Example usage:
from transformers import AutoProcessor, AutoModelForImageTextToText
from transformers import BitsAndBytesConfig
from swiftannotate.image import Qwen2VLForImageCaptioning
quantization_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype="float16",
bnb_4bit_use_double_quant=True
)
model = AutoModelForImageTextToText.from_pretrained(
"Qwen/Qwen2-VL-7B-Instruct",
device_map="auto",
torch_dtype="auto",
quantization_config=quantization_config)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")
# Load the Caption Model
captioning_pipeline = Qwen2VLForImageCaptioning(
model = model,
processor = processor,
output_file="captions.json"
)
# Generate captions for images
image_paths = ['path/to/image1.jpg']
results = captioning_pipeline.generate(image_paths)
# Print results
# Output: [
# {
# 'image_path': 'path/to/image1.jpg',
# 'image_caption': 'A cat sitting on a table.',
# 'validation_reasoning': 'The caption is valid.',
# 'validation_score': 0.8
# },
# ]
Source code in swiftannotate/image/captioning/qwen.py
__init__(model, processor, caption_prompt=BASE_IMAGE_CAPTION_PROMPT, validation=True, validation_prompt=BASE_IMAGE_CAPTION_VALIDATION_PROMPT, validation_threshold=0.5, max_retry=3, output_file=None, **kwargs)
Initializes the Qwen2VLForImageCaptioning pipeline.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| model | AutoModelForImageTextToText | Model for image captioning. Should be an instance of AutoModelForImageTextToText with Qwen2-VL pretrained weights. Can be any version of the Qwen2-VL model (7B, 72B). | required |
| processor | AutoProcessor | Processor for the Qwen2-VL model. Should be an instance of AutoProcessor. | required |
| caption_prompt | str \| None | System prompt for captioning images. Uses the default BASE_IMAGE_CAPTION_PROMPT if not provided. | BASE_IMAGE_CAPTION_PROMPT |
| validation | bool | Whether to use the validation step. Defaults to True. | True |
| validation_prompt | str \| None | System prompt for validating image captions; it should specify the range of the validation score to be generated. Uses the default BASE_IMAGE_CAPTION_VALIDATION_PROMPT if not provided. | BASE_IMAGE_CAPTION_VALIDATION_PROMPT |
| validation_threshold | float | Threshold to determine whether an image caption is valid; should fall within the range specified for the validation score. Defaults to 0.5. | 0.5 |
| max_retry | int | Number of retries before giving up on the image caption. Defaults to 3. | 3 |
| output_file | str \| None | Output file path; only JSON is supported for now. Defaults to None. | None |

Other Parameters:

| Name | Type | Description |
|---|---|---|
| resize_height | int | Height to resize the image to before generating captions. Defaults to 280. |
| resize_width | int | Width to resize the image to before generating captions. Defaults to 420. |

Notes
validation_prompt should specify the rules for validating the caption and the range of the validation score to be generated, for example (0-1).
Your validation_threshold should fall within this specified range.
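For example, overriding the default resize dimensions at initialization might look like this sketch, reusing the model and processor loaded in the example above (the sizes shown are assumptions; pick values that suit your images):

captioning_pipeline = Qwen2VLForImageCaptioning(
    model=model,
    processor=processor,
    output_file="captions.json",
    resize_height=560,  # passed through **kwargs; defaults to 280
    resize_width=840    # passed through **kwargs; defaults to 420
)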
Source code in swiftannotate/image/captioning/qwen.py
annotate(image, feedback_prompt='', **kwargs)
Annotates the image with a caption. Implements the caption generation logic for a single image.
Note: The feedback_prompt is dynamically updated using the validation reasoning from the previous iteration in case the caption does not pass the validation threshold.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| image | str | Base64 encoded image. | required |
| feedback_prompt | str | Feedback prompt used to guide the model towards a better caption. Defaults to ''. | '' |
| **kwargs | | Additional arguments to pass to the method for custom pipeline interactions, e.g. to control generation parameters for the model. | {} |

Returns:

| Name | Type | Description |
|---|---|---|
| str | str | Generated caption for the image. |
Source code in swiftannotate/image/captioning/qwen.py
generate(image_paths, **kwargs)
Generates captions for a list of images.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| image_paths | List[str] | List of image paths to generate captions for. | required |
| **kwargs | | Additional arguments to pass to the method for custom pipeline interactions, e.g. to control generation parameters for the model. | {} |

Returns:

| Type | Description |
|---|---|
| List[Dict] | List[Dict]: List of captions, validation reasoning and confidence scores for each image. |
Source code in swiftannotate/image/captioning/qwen.py
validate(image, caption, **kwargs)
Validates the caption generated for the image.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| image | str | Base64 encoded image. | required |
| caption | str | Caption generated for the image. | required |

Returns:

| Type | Description |
|---|---|
| Tuple[str, float] | Tuple[str, float]: Validation reasoning and confidence score for the caption. |
Source code in swiftannotate/image/captioning/qwen.py
Classification
GeminiForImageClassification
Bases: BaseImageClassification
GeminiForImageClassification pipeline for classifying images using Gemini models.
Example usage:
from swiftannotate.image import GeminiForImageClassification
# Initialize the pipeline
classification_pipeline = GeminiForImageClassification(
caption_model="gemini-1.5-pro",
validation_model="gemini-1.5-flash",
api_key="your_api_key_here",
classification_labels=["kitchen", "bedroom", "living room"],
output_file="captions.json"
)
# Generate class labels for a list of images
image_paths = ["path/to/image1.jpg"]
results = classification_pipeline.generate(image_paths)
# Print results
# Output: [
# {
# "image_path": 'path/to/image1.jpg',
# "image_classification": 'kitchen',
# "validation_reasoning": 'The class label is valid.',
# "validation_score": 0.6
# },
# ]
Source code in swiftannotate/image/classification/gemini.py
__init__(classification_model, validation_model, api_key, classification_labels, classification_prompt=BASE_IMAGE_CLASSIFICATION_PROMPT, validation=True, validation_prompt=BASE_IMAGE_CLASSIFICATION_VALIDATION_PROMPT, validation_threshold=0.5, max_retry=3, output_file=None)
Initializes the GeminiForImageClassification pipeline.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| classification_model | str | Can be "gemini-1.5-flash", "gemini-1.5-pro", etc., or a specific model version supported by Gemini. | required |
| validation_model | str | Can be "gemini-1.5-flash", "gemini-1.5-pro", etc., or a specific model version supported by Gemini. | required |
| api_key | str | Google Gemini API key. | required |
| classification_labels | List[str] | List of classification labels to be used for the image classification. | required |
| classification_prompt | str \| None | System prompt for classifying images. Uses the default BASE_IMAGE_CLASSIFICATION_PROMPT if not provided. | BASE_IMAGE_CLASSIFICATION_PROMPT |
| validation | bool | Whether to use the validation step. Defaults to True. | True |
| validation_prompt | str \| None | System prompt for validating image class labels; it should specify the range of the validation score to be generated. Uses the default BASE_IMAGE_CLASSIFICATION_VALIDATION_PROMPT if not provided. | BASE_IMAGE_CLASSIFICATION_VALIDATION_PROMPT |
| validation_threshold | float | Threshold to determine whether an image class label is valid; should fall within the range specified for the validation score. Defaults to 0.5. | 0.5 |
| max_retry | int | Number of retries before giving up on the image class label. Defaults to 3. | 3 |
| output_file | str \| None | Output file path; only JSON is supported for now. Defaults to None. | None |

Notes
validation_prompt should specify the rules for validating the class label and the range of the validation score to be generated, for example (0-1).
Your validation_threshold should fall within this specified range.
It is advised to include class descriptions in the classification_prompt and validation_prompt to help the model understand the context of the class labels. You can also add few-shot examples to the prompts for the same purpose.
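A minimal sketch of a classification_prompt that embeds class descriptions and a few-shot example (the wording and labels are illustrative assumptions only):

classification_prompt = (
    "Classify the room shown in the image into exactly one of these labels:\n"
    "- kitchen: a room used for cooking, typically with a stove, sink and counters\n"
    "- bedroom: a room used for sleeping, typically with a bed\n"
    "- living room: a shared room for relaxing, typically with sofas and a TV\n\n"
    "Example: an image showing a stove, a sink and wall cabinets -> kitchen\n"
    "Answer with one label only."
)

classification_pipeline = GeminiForImageClassification(
    classification_model="gemini-1.5-pro",
    validation_model="gemini-1.5-flash",
    api_key="your_api_key_here",
    classification_labels=["kitchen", "bedroom", "living room"],
    classification_prompt=classification_prompt,
    output_file="labels.json"
)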
Source code in swiftannotate/image/classification/gemini.py
annotate(image, feedback_prompt='', **kwargs)
Annotates the image with a class label. Implements the class label generation logic for a single image.
Note: The feedback_prompt is dynamically updated using the validation reasoning from the previous iteration in case the class label does not pass the validation threshold.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| image | str | Base64 encoded image. | required |
| feedback_prompt | str | Feedback prompt used to guide the model towards a better class label. Defaults to ''. | '' |
| **kwargs | | Additional arguments to pass to the method for custom pipeline interactions, e.g. to control generation parameters for the model. | {} |

Returns:

| Name | Type | Description |
|---|---|---|
| str | str | Generated class label for the image. |
Source code in swiftannotate/image/classification/gemini.py
generate(image_paths, **kwargs)
Generates class labels for a list of images.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| image_paths | List[str] | List of image paths to generate class labels for. | required |
| **kwargs | | Additional arguments to pass to the method for custom pipeline interactions, e.g. to control generation parameters for the model. | {} |

Returns:

| Type | Description |
|---|---|
| List[Dict] | List[Dict]: List of class labels, validation reasoning and confidence scores for each image. |
Source code in swiftannotate/image/classification/gemini.py
validate(image, class_label, **kwargs)
Validates the class label generated for the image.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| image | str | Base64 encoded image. | required |
| class_label | str | Class label generated for the image. | required |

Returns:

| Type | Description |
|---|---|
| Tuple[str, float] | Tuple[str, float]: Validation reasoning and confidence score for the class label. |
Source code in swiftannotate/image/classification/gemini.py
OllamaForImageClassification
Bases: BaseImageClassification
OllamaForImageClassification pipeline using the Ollama API.
Example usage:
from swiftannotate.image import OllamaForImageClassification
# Initialize the pipeline
classification_pipeline = OllamaForImageClassification(
classification_model="llama3.2-vision",
validation_model="llama3.2-vision",
classification_labels=["kitchen", "bedroom", "living room"],
output_file="captions.json"
)
# Generate class labels for a list of images
image_paths = ["path/to/image1.jpg"]
results = classification_pipeline.generate(image_paths)
# Print results
# Output: [
# {
# "image_path": 'path/to/image1.jpg',
# "image_classification": 'kitchen',
# "validation_reasoning": 'The class label is valid.',
# "validation_score": 0.6
# },
# ]
Source code in swiftannotate/image/classification/ollama.py
__init__(classification_model, validation_model, classification_labels, classification_prompt=BASE_IMAGE_CLASSIFICATION_PROMPT, validation=True, validation_prompt=BASE_IMAGE_CLASSIFICATION_VALIDATION_PROMPT, validation_threshold=0.5, max_retry=3, output_file=None)
Initializes the OllamaForImageClassification pipeline.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| classification_model | str | Any of the multimodal (vision) models supported by Ollama, including specific model versions. | required |
| validation_model | str | Any of the multimodal (vision) models supported by Ollama, including specific model versions. | required |
| classification_labels | List[str] | List of classification labels to be used for the image classification. | required |
| classification_prompt | str \| None | System prompt for classifying images. Uses the default BASE_IMAGE_CLASSIFICATION_PROMPT if not provided. | BASE_IMAGE_CLASSIFICATION_PROMPT |
| validation | bool | Whether to use the validation step. Defaults to True. | True |
| validation_prompt | str \| None | System prompt for validating image class labels; it should specify the range of the validation score to be generated. Uses the default BASE_IMAGE_CLASSIFICATION_VALIDATION_PROMPT if not provided. | BASE_IMAGE_CLASSIFICATION_VALIDATION_PROMPT |
| validation_threshold | float | Threshold to determine whether an image class label is valid; should fall within the range specified for the validation score. Defaults to 0.5. | 0.5 |
| max_retry | int | Number of retries before giving up on the image class label. Defaults to 3. | 3 |
| output_file | str \| None | Output file path; only JSON is supported for now. Defaults to None. | None |

Notes
validation_prompt should specify the rules for validating the class label and the range of the validation score to be generated, for example (0-1).
Your validation_threshold should fall within this specified range.
It is advised to include class descriptions in the classification_prompt and validation_prompt to help the model understand the context of the class labels. You can also add few-shot examples to the prompts for the same purpose.
Source code in swiftannotate/image/classification/ollama.py
annotate(image, feedback_prompt='', **kwargs)
Annotates the image with a class label. Implements the class label generation logic for a single image.
Note: The feedback_prompt is dynamically updated using the validation reasoning from the previous iteration in case the class label does not pass the validation threshold.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| image | str | Base64 encoded image. | required |
| feedback_prompt | str | Feedback prompt used to guide the model towards a better class label. Defaults to ''. | '' |
| **kwargs | | Additional arguments to pass to the method for custom pipeline interactions, e.g. to control generation parameters for the model. | {} |

Returns:

| Name | Type | Description |
|---|---|---|
| str | str | Generated class label for the image. |
Source code in swiftannotate/image/classification/ollama.py
generate(image_paths, **kwargs)
Generates class labels for a list of images.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| image_paths | List[str] | List of image paths to generate class labels for. | required |
| **kwargs | | Additional arguments to pass to the method for custom pipeline interactions, e.g. to control generation parameters for the model. | {} |

Returns:

| Type | Description |
|---|---|
| List[Dict] | List[Dict]: List of class labels, validation reasoning and confidence scores for each image. |
Source code in swiftannotate/image/classification/ollama.py
validate(image, class_label, **kwargs)
Validates the class label generated for the image.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| image | str | Base64 encoded image. | required |
| class_label | str | Class label generated for the image. | required |

Returns:

| Type | Description |
|---|---|
| Tuple[str, float] | Tuple[str, float]: Validation reasoning and confidence score for the class label. |
Source code in swiftannotate/image/classification/ollama.py
OpenAIForImageClassification
Bases: BaseImageClassification
OpenAIForImageClassification pipeline using OpenAI API.
Example usage:
from swiftannotate.image import OpenAIForImageClassification
# Initialize the pipeline
classification_pipeline = OpenAIForImageClassification(
classification_model="gpt-4o",
validation_model="gpt-4o-mini",
api_key="your_api_key_here",
classification_labels=["kitchen", "bedroom", "living room"],
output_file="captions.json"
)
# Generate class labels for a list of images
image_paths = ["path/to/image1.jpg"]
results = classification_pipeline.generate(image_paths)
# Print results
# Output: [
# {
# "image_path": 'path/to/image1.jpg',
# "image_classification": 'kitchen',
# "validation_reasoning": 'The class label is valid.',
# "validation_score": 0.6
# },
# ]
Source code in swiftannotate/image/classification/openai.py
__init__(classification_model, validation_model, api_key, classification_labels, classification_prompt=BASE_IMAGE_CLASSIFICATION_PROMPT, validation=True, validation_prompt=BASE_IMAGE_CLASSIFICATION_VALIDATION_PROMPT, validation_threshold=0.5, max_retry=3, output_file=None, **kwargs)
Initializes the OpenAIForImageClassification pipeline.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| classification_model | str | Can be "gpt-4o", "gpt-4o-mini", etc., or a specific model version supported by OpenAI. | required |
| validation_model | str | Can be "gpt-4o", "gpt-4o-mini", etc., or a specific model version supported by OpenAI. | required |
| api_key | str | OpenAI API key. | required |
| classification_labels | List[str] | List of classification labels to be used for the image classification. | required |
| classification_prompt | str \| None | System prompt for classifying images. Uses the default BASE_IMAGE_CLASSIFICATION_PROMPT if not provided. | BASE_IMAGE_CLASSIFICATION_PROMPT |
| validation | bool | Whether to use the validation step. Defaults to True. | True |
| validation_prompt | str \| None | System prompt for validating image class labels; it should specify the range of the validation score to be generated. Uses the default BASE_IMAGE_CLASSIFICATION_VALIDATION_PROMPT if not provided. | BASE_IMAGE_CLASSIFICATION_VALIDATION_PROMPT |
| validation_threshold | float | Threshold to determine whether an image class label is valid; should fall within the range specified for the validation score. Defaults to 0.5. | 0.5 |
| max_retry | int | Number of retries before giving up on the image class label. Defaults to 3. | 3 |
| output_file | str \| None | Output file path; only JSON is supported for now. Defaults to None. | None |

Other Parameters:

| Name | Type | Description |
|---|---|---|
| detail | str | Specific to OpenAI. Detail level of the image (higher resolution costs more). Defaults to "low". |

Notes
validation_prompt should specify the rules for validating the class label and the range of the validation score to be generated, for example (0-1).
Your validation_threshold should fall within this specified range.
It is advised to include class descriptions in the classification_prompt and validation_prompt to help the model understand the context of the class labels. You can also add few-shot examples to the prompts for the same purpose.
Source code in swiftannotate/image/classification/openai.py
annotate(image, feedback_prompt='', **kwargs)
Annotates the image with a class label. Implements the class label generation logic for a single image.
Note: The feedback_prompt is dynamically updated using the validation reasoning from the previous iteration in case the class label does not pass the validation threshold.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| image | str | Base64 encoded image. | required |
| feedback_prompt | str | Feedback prompt used to guide the model towards a better class label. Defaults to ''. | '' |
| **kwargs | | Additional arguments to pass to the method for custom pipeline interactions, e.g. to control generation parameters for the model. | {} |

Returns:

| Name | Type | Description |
|---|---|---|
| str | str | Generated class label for the image. |
Source code in swiftannotate/image/classification/openai.py
generate(image_paths, **kwargs)
Generates class labels for a list of images.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| image_paths | List[str] | List of image paths to generate class labels for. | required |
| **kwargs | | Additional arguments to pass to the method for custom pipeline interactions, e.g. to control generation parameters for the model. | {} |

Returns:

| Type | Description |
|---|---|
| List[Dict] | List[Dict]: List of class labels, validation reasoning and confidence scores for each image. |
Source code in swiftannotate/image/classification/openai.py
validate(image, class_label, **kwargs)
Validates the class label generated for the image.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| image | str | Base64 encoded image. | required |
| class_label | str | Class label generated for the image. | required |

Returns:

| Type | Description |
|---|---|
| Tuple[str, float] | Tuple[str, float]: Validation reasoning and confidence score for the class label. |
Source code in swiftannotate/image/classification/openai.py
Qwen2VLForImageClassification
Bases: BaseImageClassification
Qwen2VLForImageClassification pipeline using Qwen2VL model.
Example usage:
from transformers import AutoProcessor, AutoModelForImageTextToText
from transformers import BitsAndBytesConfig
from swiftannotate.image import Qwen2VLForImageClassification
quantization_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype="float16",
bnb_4bit_use_double_quant=True
)
model = AutoModelForImageTextToText.from_pretrained(
"Qwen/Qwen2-VL-7B-Instruct",
device_map="auto",
torch_dtype="auto",
quantization_config=quantization_config)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")
# Initialize the classification pipeline
kwargs = {"temperature": 0}
classification_pipeline = Qwen2VLForImageClassification(
model=model,
processor=processor,
classification_labels=["kitchen", "bottle", "none"],
output_file="output.json",
)
# Generate class labels for images
image_paths = ['path/to/image1.jpg']
results = classification_pipeline.generate(image_paths, **kwargs)
# Print results
# Output: [
# {
# "image_path": 'path/to/image1.jpg',
# "image_classification": 'kitchen',
# "validation_reasoning": 'The class label is valid.',
# "validation_score": 0.6
# },
# ]
Source code in swiftannotate/image/classification/qwen.py
__init__(model, processor, classification_labels, classification_prompt=BASE_IMAGE_CLASSIFICATION_PROMPT, validation=True, validation_prompt=BASE_IMAGE_CLASSIFICATION_VALIDATION_PROMPT, validation_threshold=0.5, max_retry=3, output_file=None, **kwargs)
Initializes the Qwen2VLForImageClassification pipeline.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | AutoModelForImageTextToText | Model for image classification. Should be an instance of AutoModelForImageTextToText with Qwen2-VL pretrained weights. Can be any version of the Qwen2-VL model (7B, 72B). | required |
| processor | AutoProcessor | Processor for the Qwen2-VL model. Should be an instance of AutoProcessor. | required |
| classification_labels | List[str] | List of classification labels to be used for the image classification. | required |
| classification_prompt | str \| None | System prompt for classifying images. Uses the default BASE_IMAGE_CLASSIFICATION_PROMPT if not provided. | BASE_IMAGE_CLASSIFICATION_PROMPT |
| validation | bool | Whether to use the validation step. Defaults to True. | True |
| validation_prompt | str \| None | System prompt for validating image class labels; it should specify the range of the validation score to be generated. Uses the default BASE_IMAGE_CLASSIFICATION_VALIDATION_PROMPT if not provided. | BASE_IMAGE_CLASSIFICATION_VALIDATION_PROMPT |
| validation_threshold | float | Threshold to determine whether an image class label is valid; should fall within the range specified for the validation score. Defaults to 0.5. | 0.5 |
| max_retry | int | Number of retries before giving up on the image class label. Defaults to 3. | 3 |
| output_file | str \| None | Output file path; only JSON is supported for now. Defaults to None. | None |

Other Parameters:

| Name | Type | Description |
|---|---|---|
| resize_height | int | Height to resize the image to before generating class labels. Defaults to 280. |
| resize_width | int | Width to resize the image to before generating class labels. Defaults to 420. |

Notes
validation_prompt should specify the rules for validating the class label and the range of the validation score to be generated, for example (0-1).
Your validation_threshold should fall within this specified range.
It is advised to include class descriptions in the classification_prompt and validation_prompt to help the model understand the context of the class labels. You can also add few-shot examples to the prompts for the same purpose.
Source code in swiftannotate/image/classification/qwen.py
annotate(image, feedback_prompt='', **kwargs)
Annotates the image with a class label. Implements the class label generation logic for a single image.
Note: The feedback_prompt is dynamically updated using the validation reasoning from the previous iteration in case the class label does not pass the validation threshold.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| image | str | Base64 encoded image. | required |
| feedback_prompt | str | Feedback prompt used to guide the model towards a better class label. Defaults to ''. | '' |
| **kwargs | | Additional arguments to pass to the method for custom pipeline interactions, e.g. to control generation parameters for the model. | {} |

Returns:

| Name | Type | Description |
|---|---|---|
| str | str | Generated class label for the image. |
Source code in swiftannotate/image/classification/qwen.py
generate(image_paths, **kwargs)
Generates class labels for a list of images.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| image_paths | List[str] | List of image paths to generate class labels for. | required |
| **kwargs | | Additional arguments to pass to the method for custom pipeline interactions, e.g. to control generation parameters for the model. | {} |

Returns:

| Type | Description |
|---|---|
| List[Dict] | List[Dict]: List of class labels, validation reasoning and confidence scores for each image. |
Source code in swiftannotate/image/classification/qwen.py
validate(image, class_label, **kwargs)
Validates the class label generated for the image.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| image | str | Base64 encoded image. | required |
| class_label | str | Class label generated for the image. | required |

Returns:

| Type | Description |
|---|---|
| Tuple[str, float] | Tuple[str, float]: Validation reasoning and confidence score for the class label. |
Source code in swiftannotate/image/classification/qwen.py
Object Detection
OwlV2ForObjectDetection
Bases: BaseObjectDetection
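Example usage (a minimal sketch; the checkpoint name and exact transformers imports are assumptions, adjust them to your setup):
from transformers import Owlv2Processor, Owlv2ForObjectDetection
from swiftannotate.image import OwlV2ForObjectDetection
# Load the OwlV2 model and processor from transformers
model = Owlv2ForObjectDetection.from_pretrained("google/owlv2-base-patch16-ensemble")
processor = Owlv2Processor.from_pretrained("google/owlv2-base-patch16-ensemble")
# Initialize the pipeline
detector = OwlV2ForObjectDetection(
    model=model,
    processor=processor,
    class_labels=["cat", "dog", "person"],
    confidence_threshold=0.5,
    output_file="detections.json"
)
# Generate annotations for a list of images
image_paths = ["path/to/image1.jpg"]
results = detector.generate(image_paths)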
Source code in swiftannotate/image/object_detection/owlv2.py
__init__(model, processor, class_labels, confidence_threshold=0.5, validation=False, validation_prompt=None, validation_threshold=None, output_file=None)
Initialize the OwlV2ForObjectDetection class.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | Owlv2ForObjectDetection | OwlV2 object detection model from Transformers. | required |
| processor | Owlv2Processor | OwlV2 processor for object detection. | required |
| class_labels | List[str] | List of class labels. | required |
| confidence_threshold | float | Minimum confidence threshold for object detection. Defaults to 0.5. | 0.5 |
| validation | bool | Whether to validate annotations from OwlV2. Defaults to False. | False |
| validation_prompt | str \| None | Prompt to validate annotations. Defaults to None. | None |
| validation_threshold | float \| None | Threshold score for annotation validation. Defaults to None. | None |
| output_file | str \| None | Path to save results. If None, results are not saved. Defaults to None. | None |

Raises:

| Type | Description |
|---|---|
| ValueError | If model is not an instance of Owlv2ForObjectDetection. |
| ValueError | If processor is not an instance of Owlv2Processor. |
Source code in swiftannotate/image/object_detection/owlv2.py
annotate(image)
Annotate an image with object detection labels
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| image | Image | Image to be annotated. | required |

Returns:

| Type | Description |
|---|---|
| List[dict] | List[dict]: List of dictionaries containing the confidence scores, bounding box coordinates and class labels. |
Source code in swiftannotate/image/object_detection/owlv2.py
generate(image_paths)
Generate annotations for a list of image paths.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| image_paths | List[str] | List of image paths. | required |

Returns:

| Type | Description |
|---|---|
| List[dict] | List[dict]: List of dictionaries containing the confidence scores, bounding box coordinates and class labels. |
Source code in swiftannotate/image/object_detection/owlv2.py
validate(image, annotations)
Validate the annotations for an image with object detection labels.
Currently, there is no validation method available for Object Detection.
TODO: Idea is to do some sort of object extraction using annotations and ask VLM to validate the extracted objects.
TODO: Need to figure out a way to use the VLM output for improving annotations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| image | Image | Image to be validated. | required |
| annotations | List[dict] | List of dictionaries containing the confidence scores, bounding box coordinates and class labels. | required |

Raises:

| Type | Description |
|---|---|
| NotImplementedError | Validation is not currently implemented for object detection. |