OCR API
Bases: VisionAPI
ImageOCRAPI provides Optical Character Recognition (OCR) capabilities for images, leveraging different OCR engines like EasyOCR, PaddleOCR, and Hugging Face models tailored for OCR tasks. This API can decode base64-encoded images, process them through the chosen OCR engine, and return the recognized text.
The API supports dynamic selection of OCR engines and configurations based on the provided model name and arguments, offering flexibility in processing various languages and image types.
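The engine selection described above can be sketched as a small dispatch on the model name. This is only an illustration, not the library's actual code: `select_engine` and the `Engine` type are hypothetical names, and treating every other name as a Hugging Face model id is an assumption based on the CLI examples in this page.

```python
from typing import Literal

# Hypothetical label for the three backends this page mentions.
Engine = Literal["easyocr", "paddleocr", "huggingface"]


def select_engine(model_name: str) -> Engine:
    """Map a model_name argument to an OCR backend (hypothetical helper)."""
    name = model_name.strip().lower()
    if not name:
        raise ValueError("model_name must not be empty")
    if name == "easyocr":
        return "easyocr"
    if name == "paddleocr":
        return "paddleocr"
    # Assumption: any other value is a Hugging Face model id,
    # for example "facebook/nougat-base".
    return "huggingface"
```

With this reading, `model_name="easyocr"` and `model_name="paddleocr"` pick the dedicated engines, and any other value is passed on as a Hugging Face model id.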
Methods
ocr(self): Processes an uploaded image for OCR and returns the recognized text.
Example CLI Usage:
EasyOCR:
genius ImageOCRAPI rise \
    batch \
        --input_folder ./input \
    batch \
        --output_folder ./output \
    none \
    listen \
        --args \
            model_name="easyocr" \
            device_map="cuda:0" \
            endpoint="*" \
            port=3000 \
            cors_domain="http://localhost:3000" \
            username="user" \
            password="password"
PaddleOCR:
genius ImageOCRAPI rise \
    batch \
        --input_folder ./input \
    batch \
        --output_folder ./output \
    none \
    listen \
        --args \
            model_name="paddleocr" \
            device_map="cuda:0" \
            endpoint="*" \
            port=3000 \
            cors_domain="http://localhost:3000" \
            username="user" \
            password="password"
Hugging Face models:
genius ImageOCRAPI rise \
    batch \
        --input_folder ./input \
    batch \
        --output_folder ./output \
    none \
    listen \
        --args \
            model_name="facebook/nougat-base" \
            model_class="VisionEncoderDecoderModel" \
            processor_class="NougatProcessor" \
            device_map="cuda:0" \
            use_cuda=True \
            precision="float" \
            quantization=0 \
            max_memory=None \
            torchscript=False \
            compile=False \
            flash_attention=False \
            better_transformers=False \
            endpoint="*" \
            port=3000 \
            cors_domain="http://localhost:3000" \
            username="user" \
            password="password"
__init__(input, output, state, **kwargs)
Initializes the ImageOCRAPI with configurations for input, output, state management, and OCR model specifics.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input | BatchInput | Configuration for the input data. | required |
output | BatchOutput | Configuration for the output data. | required |
state | State | State management for the API. | required |
**kwargs | | Additional keyword arguments for extended functionality. | {} |
ocr()
Endpoint for performing OCR on an uploaded image. It accepts a base64-encoded image, decodes it, runs it through the specified OCR model, and returns the recognized text.
Returns:
Type | Description |
---|---|
Dict[str, Any] | A dictionary containing the success status, the recognized text ('result'), and the original image name ('image_name') if provided. |
Raises:
Type | Description |
---|---|
Exception | If an error occurs during image processing or OCR. |
Example CURL Request:
curl -X POST localhost:3000/api/v1/ocr -H "Content-Type: application/json" -d '{"image_base64": "<base64-encoded-image>", "model_name": "easyocr", "use_easyocr_bbox": true}'
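The same request can be built from Python using only the standard library. This is a minimal sketch, not part of the library: `build_ocr_payload` and `post_ocr` are hypothetical helpers, the field names are copied from the curl example, and sending HTTP Basic credentials taken from the `username` and `password` arguments is an assumption about how the server authenticates.

```python
import base64
import json
import urllib.request
from typing import Any, Dict, Optional, Tuple


def build_ocr_payload(image_bytes: bytes, model_name: str = "easyocr",
                      use_easyocr_bbox: bool = True) -> str:
    """Serialize raw image bytes into the JSON body shown in the curl example."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({
        "image_base64": encoded,
        "model_name": model_name,
        "use_easyocr_bbox": use_easyocr_bbox,
    })


def post_ocr(payload: str, url: str = "http://localhost:3000/api/v1/ocr",
             auth: Optional[Tuple[str, str]] = None) -> Dict[str, Any]:
    """POST the payload and decode the JSON response (requires a running server)."""
    headers = {"Content-Type": "application/json"}
    if auth is not None:
        # Assumption: the server accepts HTTP Basic auth.
        token = base64.b64encode(f"{auth[0]}:{auth[1]}".encode()).decode("ascii")
        headers["Authorization"] = f"Basic {token}"
    request = urllib.request.Request(url, data=payload.encode("utf-8"),
                                     headers=headers, method="POST")
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A caller would read the image file in binary mode, pass the bytes to `build_ocr_payload`, and hand the result to `post_ocr`. According to the Returns table, the recognized text is under the 'result' key of the response.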
process_huggingface_models(image, use_easyocr_bbox)
Processes the image using a Hugging Face model specified for OCR tasks. Supports advanced configurations and post-processing to handle various OCR-related challenges.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
image | Image.Image | The image to process. | required |
use_easyocr_bbox | bool | Whether to use EasyOCR to detect text bounding boxes before processing with Hugging Face models. | required |
Returns:
Type | Description |
---|---|
str | The recognized text from the image. |
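One plausible reading of the `use_easyocr_bbox` option is a detect-then-recognize loop: find text regions first, crop each one, and run the recognizer on each crop. The sketch below illustrates that flow under stated assumptions and is not the method's actual implementation. `recognize_by_region`, `detect`, and `recognize` are hypothetical names. The detector is assumed to return four corner points per region, which is the box format EasyOCR reports, and the image only needs a PIL-style `crop(box)` method.

```python
from typing import Any, Callable, Iterable, Sequence, Tuple

# (left, upper, right, lower): the order PIL's Image.crop expects.
Box = Tuple[int, int, int, int]
Corners = Sequence[Sequence[float]]


def corners_to_box(corners: Corners) -> Box:
    """Convert four corner points [[x, y], ...] into an axis-aligned crop box."""
    xs = [point[0] for point in corners]
    ys = [point[1] for point in corners]
    return (int(min(xs)), int(min(ys)), int(max(xs)), int(max(ys)))


def recognize_by_region(image: Any,
                        detect: Callable[[Any], Iterable[Corners]],
                        recognize: Callable[[Any], str]) -> str:
    """Crop each detected region, recognize it, and join the text in reading order."""
    boxes = [corners_to_box(corners) for corners in detect(image)]
    # Approximate reading order: top to bottom, then left to right.
    boxes.sort(key=lambda box: (box[1], box[0]))
    pieces = [recognize(image.crop(box)).strip() for box in boxes]
    return " ".join(piece for piece in pieces if piece)
```

Keeping the detector and recognizer as plain callables means the same loop works with any detector and any recognition model, and it can be exercised with stubs when no GPU is available.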
process_other_models(image)
Processes the image using a non-Hugging Face OCR engine, either EasyOCR or PaddleOCR, depending on the model selected at initialization.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
image | Image.Image | The image to process. | required |
Returns:
Type | Description |
---|---|
Any | The OCR results, which may include text, bounding boxes, and confidence scores depending on the model. |
Raises:
Type | Description |
---|---|
ValueError | If an invalid or unsupported OCR model is specified. |
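Because the return type is `Any`, callers usually need a small adapter to turn the raw results into plain text. The sketch below assumes EasyOCR-style output: a list of `(bbox, text, confidence)` tuples. PaddleOCR nests its results differently and would need its own adapter. `detections_to_text` and `min_confidence` are hypothetical names introduced for illustration.

```python
from typing import Any, Iterable, Tuple

# Assumed EasyOCR-style detection: (bounding box, text, confidence).
Detection = Tuple[Any, str, float]


def detections_to_text(detections: Iterable[Detection],
                       min_confidence: float = 0.0) -> str:
    """Join detected text, one detection per line, dropping low-confidence entries."""
    lines = [text for _bbox, text, confidence in detections
             if confidence >= min_confidence]
    return "\n".join(lines)
```

Raising `min_confidence` trades recall for precision, which can help on noisy scans where the engine produces many spurious detections.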