OCR API

Bases: VisionAPI

ImageOCRAPI provides Optical Character Recognition (OCR) for images using one of several OCR engines: EasyOCR, PaddleOCR, or Hugging Face models built for OCR tasks. The API decodes base64-encoded images, runs them through the selected engine, and returns the recognized text.

The API supports dynamic selection of OCR engines and configurations based on the provided model name and arguments, offering flexibility in processing various languages and image types.
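Engine selection can be pictured as a dispatch on `model_name`. The sketch below uses a hypothetical helper, `select_engine`, which is not part of the API. It assumes that `"easyocr"` and `"paddleocr"` select those engines and that any other value is a Hugging Face model id, which matches the CLI examples below.

```python
from typing import Literal

Engine = Literal["easyocr", "paddleocr", "huggingface"]


def select_engine(model_name: str) -> Engine:
    """Map a model_name argument to an OCR backend (illustrative only)."""
    if model_name == "easyocr":
        return "easyocr"
    if model_name == "paddleocr":
        return "paddleocr"
    # Assumption: any other value is a Hugging Face Hub model id such as
    # "facebook/nougat-base", loaded with model_class / processor_class.
    return "huggingface"


for name in ("easyocr", "paddleocr", "facebook/nougat-base"):
    print(name, "->", select_engine(name))
```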

Methods

ocr(self): Processes an uploaded image for OCR and returns the recognized text.

Example CLI Usage:

EasyOCR:

genius ImageOCRAPI rise \
    batch \
        --input_folder ./input \
    batch \
        --output_folder ./output \
    none \
    listen \
        --args \
            model_name="easyocr" \
            device_map="cuda:0" \
            endpoint="*" \
            port=3000 \
            cors_domain="http://localhost:3000" \
            username="user" \
            password="password"

PaddleOCR:

genius ImageOCRAPI rise \
    batch \
        --input_folder ./input \
    batch \
        --output_folder ./output \
    none \
    listen \
        --args \
            model_name="paddleocr" \
            device_map="cuda:0" \
            endpoint="*" \
            port=3000 \
            cors_domain="http://localhost:3000" \
            username="user" \
            password="password"

Hugging Face models:

genius ImageOCRAPI rise \
    batch \
        --input_folder ./input \
    batch \
        --output_folder ./output \
    none \
    listen \
        --args \
            model_name="facebook/nougat-base" \
            model_class="VisionEncoderDecoderModel" \
            processor_class="NougatProcessor" \
            device_map="cuda:0" \
            use_cuda=True \
            precision="float" \
            quantization=0 \
            max_memory=None \
            torchscript=False \
            compile=False \
            flash_attention=False \
            better_transformers=False \
            endpoint="*" \
            port=3000 \
            cors_domain="http://localhost:3000" \
            username="user" \
            password="password"
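The three invocations differ only in the `key=value` pairs after `--args`. A hypothetical Python helper, `build_rise_command`, can assemble the same argument vector, for example to launch the server from a script. The fixed `batch` / `none` / `listen` tokens are copied from the examples above.

```python
import shlex
from typing import Dict, List


def build_rise_command(args: Dict[str, object]) -> List[str]:
    """Assemble the argv for the `genius ImageOCRAPI rise` invocations above."""
    argv = [
        "genius", "ImageOCRAPI", "rise",
        "batch", "--input_folder", "./input",
        "batch", "--output_folder", "./output",
        "none",
        "listen", "--args",
    ]
    # Each option becomes one key=value token, which is what the shell passes
    # for model_name="easyocr" after quote removal.
    argv += [f"{key}={value}" for key, value in args.items()]
    return argv


easyocr_args = {
    "model_name": "easyocr",
    "device_map": "cuda:0",
    "endpoint": "*",
    "port": 3000,
    "cors_domain": "http://localhost:3000",
    "username": "user",
    "password": "password",
}
print(shlex.join(build_rise_command(easyocr_args)))
```

If you pass the list to `subprocess.run(cmd, check=True)`, you avoid shell quoting entirely because each `key=value` pair is already a single argument.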

`__init__(input, output, state, **kwargs)`

Initializes the ImageOCRAPI with configurations for input, output, state management, and OCR model specifics.

Parameters:

Name	Type	Description	Default
`input`	`BatchInput`	Configuration for the input data.	required
`output`	`BatchOutput`	Configuration for the output data.	required
`state`	`State`	State management for the API.	required
`**kwargs`		Additional keyword arguments for extended functionality.	`{}`

`ocr()`

Endpoint for performing OCR on an uploaded image. It accepts a base64-encoded image, decodes it, runs it through the configured OCR model, and returns the recognized text.

Returns:

Type	Description
`Dict[str, Any]`	A dictionary containing the success status, the recognized text (`'result'`), and the original image name (`'image_name'`) if provided.

Raises:

Type	Description
`Exception`	If an error occurs during image processing or OCR.
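The endpoint's flow can be sketched with the standard library alone: decode the image, run OCR, and build the response dictionary. This is an illustrative stand-in, not the actual endpoint. `handle_ocr_request` and the `recognize` callback are hypothetical. A real implementation would open the decoded bytes as a PIL image, for example with `Image.open(io.BytesIO(data))`, before passing it to the OCR engine.

```python
import base64
from typing import Any, Callable, Dict


def handle_ocr_request(payload: Dict[str, Any],
                       recognize: Callable[[bytes], str]) -> Dict[str, Any]:
    """Illustrative request flow: decode base64, run OCR, build the response."""
    try:
        # validate=True rejects characters outside the base64 alphabet.
        image_bytes = base64.b64decode(payload["image_base64"], validate=True)
    except (KeyError, ValueError) as exc:  # binascii.Error is a ValueError
        raise ValueError(f"invalid image payload: {exc}") from exc
    result = recognize(image_bytes)  # stand-in for the configured OCR engine
    response: Dict[str, Any] = {"success": True, "result": result}
    if "image_name" in payload:
        response["image_name"] = payload["image_name"]
    return response


demo = {"image_base64": base64.b64encode(b"fake-bytes").decode(), "image_name": "receipt.jpg"}
print(handle_ocr_request(demo, lambda data: f"{len(data)} bytes"))
```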

Example cURL requests:

curl -X POST http://localhost:3000/api/v1/ocr \
    -H "Content-Type: application/json" \
    -u user:password \
    -d '{"image_base64": "<base64-encoded-image>", "model_name": "easyocr", "use_easyocr_bbox": true}'

or

(base64 -w 0 test_images_ocr/ReceiptSwiss.jpg | awk '{print "{\"image_base64\": \""$0"\", \"max_length\": 1024}"}' > /tmp/image_payload.json)
curl -X POST http://localhost:3000/api/v1/ocr \
    -H "Content-Type: application/json" \
    -u user:password \
    -d @/tmp/image_payload.json | jq
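For Python clients, the second cURL example translates to a standard-library request. This sketch assumes the server from the CLI examples is running on port 3000 with basic-auth credentials `user` / `password`. `build_ocr_request` is a hypothetical helper. It only builds the request, so you can inspect the payload before sending it.

```python
import base64
import json
import urllib.request


def build_ocr_request(image_bytes: bytes,
                      url: str = "http://localhost:3000/api/v1/ocr",
                      username: str = "user",
                      password: str = "password",
                      max_length: int = 1024) -> urllib.request.Request:
    """Build (without sending) the POST issued by the second cURL example."""
    payload = {
        "image_base64": base64.b64encode(image_bytes).decode("ascii"),
        "max_length": max_length,
    }
    # HTTP Basic auth header, equivalent to curl's `-u user:password`.
    credentials = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
        },
        method="POST",
    )


# Usage, with the server from the CLI examples running:
# with open("test_images_ocr/ReceiptSwiss.jpg", "rb") as f:
#     request = build_ocr_request(f.read())
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["result"])
```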

`process_huggingface_models(image, use_easyocr_bbox)`

Processes the image using a Hugging Face model specified for OCR tasks. Supports advanced configurations and post-processing to handle various OCR-related challenges.

Parameters:

Name	Type	Description	Default
`image`	`Image.Image`	The image to process.	required
`use_easyocr_bbox`	`bool`	Whether to use EasyOCR to detect text bounding boxes before processing with Hugging Face models.	required

Returns:

Type	Description
`str`	The recognized text from the image.
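When `use_easyocr_bbox` is enabled, EasyOCR first locates text regions and the Hugging Face model then reads each region. Below is a minimal sketch of that crop-then-recognize loop. It assumes EasyOCR's default `Reader.readtext` output of `(bbox, text, confidence)` tuples with four-corner `bbox` polygons. `quad_to_crop_box` and `recognize_regions` are hypothetical, and joining the per-region text with spaces is an assumption, not documented behavior.

```python
import math
from typing import Any, Callable, List, Sequence, Tuple

# (left, upper, right, lower): the box format PIL's Image.crop expects.
CropBox = Tuple[int, int, int, int]


def quad_to_crop_box(quad: Sequence[Sequence[float]]) -> CropBox:
    """Convert a four-corner polygon into the axis-aligned box enclosing it."""
    xs = [point[0] for point in quad]
    ys = [point[1] for point in quad]
    return (math.floor(min(xs)), math.floor(min(ys)),
            math.ceil(max(xs)), math.ceil(max(ys)))


def recognize_regions(detections: Sequence[Tuple[Any, str, float]],
                      crop: Callable[[CropBox], Any],
                      recognize: Callable[[Any], str]) -> str:
    """Crop each detected region and run the recognizer on it."""
    texts: List[str] = []
    for quad, _easyocr_text, _confidence in detections:
        region = crop(quad_to_crop_box(quad))
        texts.append(recognize(region))
    return " ".join(texts)
```

With PIL and a loaded Hugging Face model, `crop` would be `image.crop`, and `recognize` would be a function that runs the processor and model on one region.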

`process_other_models(image)`

Processes the image with a non-Hugging Face OCR engine, such as EasyOCR or PaddleOCR, depending on the model selected at initialization.

Parameters:

Name	Type	Description	Default
`image`	`Image.Image`	The image to process.	required

Returns:

Type	Description
`Any`	The OCR results, which may include text, bounding boxes, and confidence scores depending on the model.

Raises:

Type	Description
`ValueError`	If an invalid or unsupported OCR model is specified.
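Because the raw results differ by engine, callers usually flatten them to text. The sketch below assumes two output formats. The first is EasyOCR's default `readtext` output. The second is the PaddleOCR 2.x `ocr` output: one list per image of `[bbox, (text, confidence)]` entries, or `None` when nothing is found. Newer PaddleOCR releases may return a different structure. `results_to_text` is hypothetical, and its `ValueError` mirrors the documented behavior for unsupported models.

```python
from typing import Any, List


def results_to_text(model_name: str, raw: Any) -> str:
    """Flatten engine-specific OCR output into plain text (illustrative)."""
    if model_name == "easyocr":
        # Assumed format: Reader.readtext(...) yields (bbox, text, confidence).
        return " ".join(text for _bbox, text, _conf in raw)
    if model_name == "paddleocr":
        # Assumed PaddleOCR 2.x format: one entry per image, each a list of
        # [bbox, (text, confidence)] items, or None if no text was found.
        lines: List[str] = []
        for page in raw:
            for _bbox, (text, _conf) in page or []:
                lines.append(text)
        return " ".join(lines)
    raise ValueError(f"Unsupported OCR model: {model_name!r}")
```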
