
OCR API using TrOCR

Bases: Bolt

__init__(input, output, state, **kwargs)

The TROCRImageOCR class performs OCR (Optical Character Recognition) on images using Microsoft's TrOCR model. It exposes a single API endpoint, /api/v1/ocr, which runs OCR on one image per request. The endpoint accepts a POST request whose JSON payload contains a base64-encoded image under the key image_base64, and it returns a JSON response containing the recognized text under the key ocr_text.
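As a rough illustration of the request/response contract, the sketch below decodes the image_base64 field and returns an ocr_text field. It is not the geniusrise implementation: handle_ocr_payload, trocr_recognize and _load_trocr are hypothetical names, and the default checkpoint microsoft/trocr-base-handwritten is an assumption (how TROCRImageOCR maps its kind argument to a checkpoint is not documented here). The model code uses the Hugging Face transformers classes TrOCRProcessor and VisionEncoderDecoderModel and needs transformers, torch and Pillow installed; the request handling itself uses only the standard library.

```python
import base64
import io
from functools import lru_cache
from typing import Callable, Dict


def handle_ocr_payload(payload: Dict[str, str], recognize: Callable[[bytes], str]) -> Dict[str, str]:
    """Decode the base64 image in `payload` and return {"ocr_text": ...}.

    `recognize` is a hypothetical hook mapping raw image bytes to text, so the
    request handling can be exercised without loading a model.
    """
    if "image_base64" not in payload:
        raise ValueError("payload must contain the key 'image_base64'")
    try:
        # validate=True rejects characters outside the base64 alphabet;
        # binascii.Error (raised for bad input) is a subclass of ValueError.
        image_bytes = base64.b64decode(payload["image_base64"], validate=True)
    except ValueError as exc:
        raise ValueError("image_base64 is not valid base64") from exc
    return {"ocr_text": recognize(image_bytes)}


@lru_cache(maxsize=2)
def _load_trocr(model_name: str):
    # Hypothetical helper. Imports are lazy so the handler above works
    # without transformers installed; the loaded model is cached per name.
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    processor = TrOCRProcessor.from_pretrained(model_name)
    model = VisionEncoderDecoderModel.from_pretrained(model_name)
    return processor, model


def trocr_recognize(image_bytes: bytes, model_name: str = "microsoft/trocr-base-handwritten") -> str:
    """Run an (assumed) TrOCR checkpoint from the Hugging Face Hub on one image."""
    from PIL import Image

    processor, model = _load_trocr(model_name)
    image = Image.open(io.BytesIO(image_bytes)).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    generated_ids = model.generate(pixel_values)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

With the model dependencies installed, handle_ocr_payload(request_json, trocr_recognize) would produce the response body. Passing the recognizer in as a parameter keeps payload validation testable without downloading a checkpoint.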

Parameters:

Name      Type         Description                               Default
input     BatchInput   Instance of BatchInput for reading data.  required
output    BatchOutput  Instance of BatchOutput for saving data.  required
state     State        Instance of State for maintaining state.  required
**kwargs               Additional keyword arguments.             {}

Command Line Invocation with geniusrise

genius TROCRImageOCR rise \
    batch \
        --bucket my_bucket \
        --s3_folder s3/input \
    batch \
        --bucket my_bucket \
        --s3_folder s3/output \
    none \
    listen \
        --args endpoint="*" port=3000 cors_domain="*" kind=handwriting use_cuda=True

YAML Configuration with geniusrise

version: "1"
bolts:
    ocr_processing:
        name: "TROCRImageOCR"
        method: "listen"
        args:
            endpoint: "*"
            port: 3000
            cors_domain: "*"
            kind: handwriting
            use_cuda: true
        input:
            type: "batch"
            args:
                bucket: "my_bucket"
                s3_folder: "s3/input"
        output:
            type: "batch"
            args:
                bucket: "my_bucket"
                s3_folder: "s3/output"

API Example

curl -X POST "http://localhost:3000/api/v1/ocr" -H "Content-Type: application/json" -d '{"image_base64": "your_base64_encoded_image_here"}'
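The same request can be sent from Python with only the standard library. This is a minimal client sketch, assuming the server from the examples above is listening on localhost:3000; build_ocr_request and ocr_image_file are hypothetical helper names, not part of geniusrise.

```python
import base64
import json
import urllib.request


def build_ocr_request(image_bytes: bytes, url: str = "http://localhost:3000/api/v1/ocr") -> urllib.request.Request:
    """Build the same POST request as the curl example (hypothetical helper)."""
    body = json.dumps({"image_base64": base64.b64encode(image_bytes).decode("ascii")})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ocr_image_file(path: str, url: str = "http://localhost:3000/api/v1/ocr", timeout: float = 60.0) -> str:
    """Send an image file to the OCR endpoint and return its ocr_text field."""
    with open(path, "rb") as f:
        request = build_ocr_request(f.read(), url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["ocr_text"]
```

For example, ocr_image_file("note.png") would return the recognized text for a local file (the file name is illustrative). Splitting request construction from sending makes the payload easy to inspect without a running server.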

preprocess_and_detect_boxes(image)

Preprocess the image and detect text bounding boxes using the EAST model.

Parameters:

Name   Type         Description        Default
image  Image.Image  PIL Image object.  required

Returns:

Type                             Description
List[Tuple[int, int, int, int]]  List of bounding boxes, each as (x, y, w, h).
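The returned boxes use (x, y, w, h), while PIL's Image.crop expects a (left, upper, right, lower) tuple. Because TrOCR recognizes one line of text per image, one plausible way to use these boxes is to crop each region and pass it to the model separately; whether TROCRImageOCR does exactly this is not stated above. The sketch below, with hypothetical helpers to_crop_boxes and crop_regions, converts the boxes, clamps them to the image, drops empty ones, and sorts them in a naive top-to-bottom, left-to-right order.

```python
from typing import List, Tuple

# (x, y, w, h), the format documented for preprocess_and_detect_boxes
Box = Tuple[int, int, int, int]


def to_crop_boxes(boxes: List[Box], image_size: Tuple[int, int]) -> List[Tuple[int, int, int, int]]:
    """Convert (x, y, w, h) boxes to PIL-style (left, upper, right, lower) boxes.

    Boxes are clamped to the image bounds, empty boxes are dropped, and the
    result is sorted by top edge, then left edge (a naive reading order).
    """
    width, height = image_size
    crops = []
    for x, y, w, h in boxes:
        left, upper = max(0, x), max(0, y)
        right, lower = min(width, x + w), min(height, y + h)
        if right > left and lower > upper:
            crops.append((left, upper, right, lower))
    return sorted(crops, key=lambda c: (c[1], c[0]))


def crop_regions(image, boxes: List[Box]) -> list:
    """Crop each detected region from a PIL image (needs only .size and .crop)."""
    return [image.crop(box) for box in to_crop_boxes(boxes, image.size)]
```

Sorting on the exact top edge can misorder words on the same line whose boxes differ by a few pixels; grouping boxes into lines first would be more robust, at the cost of an extra tolerance parameter.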