OCR API

Bases: VisionAPI

ImageOCRAPI provides Optical Character Recognition (OCR) for images using one of several engines: EasyOCR, PaddleOCR, or a Hugging Face model suited to OCR tasks. It decodes a base64-encoded image, runs it through the chosen engine, and returns the recognized text.

The OCR engine and its configuration are selected from the provided model name and arguments, so the same API can process different languages and image types.

Methods

ocr(self): Processes an uploaded image for OCR and returns the recognized text.

Example CLI Usage:

EasyOCR:

genius ImageOCRAPI rise \
    batch \
        --input_folder ./input \
    batch \
        --output_folder ./output \
    none \
    listen \
        --args \
            model_name="easyocr" \
            device_map="cuda:0" \
            endpoint="*" \
            port=3000 \
            cors_domain="http://localhost:3000" \
            username="user" \
            password="password"

PaddleOCR:

genius ImageOCRAPI rise \
    batch \
        --input_folder ./input \
    batch \
        --output_folder ./output \
    none \
    listen \
        --args \
            model_name="paddleocr" \
            device_map="cuda:0" \
            endpoint="*" \
            port=3000 \
            cors_domain="http://localhost:3000" \
            username="user" \
            password="password"

Hugging Face models:

genius ImageOCRAPI rise \
    batch \
        --input_folder ./input \
    batch \
        --output_folder ./output \
    none \
    listen \
        --args \
            model_name="facebook/nougat-base" \
            model_class="VisionEncoderDecoderModel" \
            processor_class="NougatProcessor" \
            device_map="cuda:0" \
            use_cuda=True \
            precision="float" \
            quantization=0 \
            max_memory=None \
            torchscript=False \
            compile=False \
            flash_attention=False \
            better_transformers=False \
            endpoint="*" \
            port=3000 \
            cors_domain="http://localhost:3000" \
            username="user" \
            password="password"

__init__(input, output, state, **kwargs)

Initializes the ImageOCRAPI with configurations for input, output, state management, and OCR model specifics.

Parameters:

input (BatchInput): Configuration for the input data. Required.
output (BatchOutput): Configuration for the output data. Required.
state (State): State management for the API. Required.
**kwargs: Additional keyword arguments for extended functionality. Default: {}.

ocr()

Endpoint for performing OCR on an uploaded image. It accepts a base64-encoded image, decodes it, processes it with the specified OCR model, and returns the recognized text.

Returns:

Dict[str, Any]: A dictionary containing the success status, the recognized text ('result'), and the original image name ('image_name') if provided.
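As an illustration of consuming that dictionary, here is a minimal Python sketch. The 'result' and 'image_name' keys come from the description above; the "success" key name and the helper name extract_text are assumptions made for this example and are not confirmed by the source.

```python
from typing import Any, Dict


def extract_text(response: Dict[str, Any]) -> str:
    """Return the recognized text from an OCR response dictionary.

    Assumption: the success flag is stored under "success". Only the
    'result' and 'image_name' keys are named explicitly in the docs.
    """
    if not response.get("success", False):
        name = response.get("image_name", "<unnamed>")
        raise RuntimeError(f"OCR failed for image {name!r}")
    result = response.get("result")
    # Some engines may return a richer structure than a plain string;
    # fall back to its string form in that case.
    return result if isinstance(result, str) else str(result)
```

Raising on a failed status keeps the caller from silently treating an error payload as recognized text.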

Raises:

Exception: If an error occurs during image processing or OCR.

Example CURL Request:

curl -X POST localhost:3000/api/v1/ocr \
    -H "Content-Type: application/json" \
    -d '{"image_base64": "<base64-encoded-image>", "model_name": "easyocr", "use_easyocr_bbox": true}'

or

(base64 -w 0 test_images_ocr/ReceiptSwiss.jpg | awk '{print "{\"image_base64\": \""$0"\", \"max_length\": 1024}"}' > /tmp/image_payload.json)
curl -X POST http://localhost:3000/api/v1/ocr \
    -H "Content-Type: application/json" \
    -u user:password \
    -d @/tmp/image_payload.json | jq
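The same request can be sketched in Python using only the standard library. This is a rough equivalent of the curl commands above, not part of the API itself: the helper names build_ocr_payload and post_ocr are hypothetical, and the URL, credentials, and option names are taken from the examples in this section.

```python
import base64
import json
import urllib.request


def build_ocr_payload(image_bytes: bytes, **options) -> bytes:
    """Encode raw image bytes as the JSON body used in the curl examples."""
    body = {"image_base64": base64.b64encode(image_bytes).decode("ascii")}
    body.update(options)  # e.g. max_length=1024 or use_easyocr_bbox=True
    return json.dumps(body).encode("utf-8")


def post_ocr(url: str, payload: bytes, username: str, password: str) -> dict:
    """POST the payload with HTTP basic auth (like curl -u) and parse the JSON reply."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        url,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (requires a running server, so it is left commented out):
# with open("test_images_ocr/ReceiptSwiss.jpg", "rb") as f:
#     payload = build_ocr_payload(f.read(), max_length=1024)
# print(post_ocr("http://localhost:3000/api/v1/ocr", payload, "user", "password"))
```

Building the payload in a separate function makes it easy to inspect or log the request body before sending it.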

process_huggingface_models(image, use_easyocr_bbox)

Processes the image using a Hugging Face model specified for OCR tasks. Supports advanced configurations and post-processing to handle various OCR-related challenges.

Parameters:

image (Image.Image): The image to process. Required.
use_easyocr_bbox (bool): Whether to use EasyOCR to detect text bounding boxes before processing with Hugging Face models. Required.

Returns:

str: The recognized text from the image.
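To make the use_easyocr_bbox idea concrete, the sketch below shows one way detected boxes could drive per-region recognition. It is not the library's implementation: ocr_by_regions, corners_to_box, and the recognize callable are hypothetical, and the detections are assumed to follow the (corner points, text, confidence) shape that EasyOCR's readtext returns. The image only needs a PIL-style crop((left, upper, right, lower)) method.

```python
from typing import Any, Callable, Iterable, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, upper, right, lower), as PIL's crop() expects


def corners_to_box(corners: Sequence[Sequence[float]]) -> Box:
    """Convert four corner points into an axis-aligned crop box."""
    xs = [point[0] for point in corners]
    ys = [point[1] for point in corners]
    return (int(min(xs)), int(min(ys)), int(max(xs)), int(max(ys)))


def ocr_by_regions(
    image: Any,
    detections: Iterable[Sequence[Any]],
    recognize: Callable[[Any], str],
) -> str:
    """Crop each detected region, recognize it, and join the text in reading order."""
    # Sort top-to-bottom, then left-to-right, as a simple reading order.
    boxes = sorted(
        (corners_to_box(detection[0]) for detection in detections),
        key=lambda box: (box[1], box[0]),
    )
    texts = [recognize(image.crop(box)).strip() for box in boxes]
    return "\n".join(text for text in texts if text)
```

In practice recognize would wrap the Hugging Face processor and model call for a single cropped region; passing it in as a callable keeps the cropping logic independent of any particular model.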

process_other_models(image)

Processes the image using non-Hugging Face OCR models like EasyOCR or PaddleOCR based on the initialization.

Parameters:

image (Image.Image): The image to process. Required.

Returns:

Any: The OCR results, which may include text, bounding boxes, and confidence scores depending on the model.

Raises:

ValueError: If an invalid or unsupported OCR model is specified.
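The dispatch behavior described here can be sketched as a lookup keyed by model name that raises ValueError for anything unrecognized. This is an illustrative pattern only: run_ocr_engine and the engines mapping are hypothetical names, and the real method selects its engine from the configuration given at initialization.

```python
from typing import Any, Callable, Dict


def run_ocr_engine(
    model_name: str,
    image: Any,
    engines: Dict[str, Callable[[Any], Any]],
) -> Any:
    """Run the engine registered under model_name, or raise ValueError."""
    try:
        engine = engines[model_name]
    except KeyError:
        supported = ", ".join(sorted(engines))
        raise ValueError(
            f"Unsupported OCR model {model_name!r}; expected one of: {supported}"
        ) from None
    # The return type is left open because each engine shapes its results differently.
    return engine(image)
```

A caller would register one callable per engine, for example under "easyocr" and "paddleocr", and let the ValueError surface when a configuration names an engine that was never registered.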