OCR API using TrOCR

Bases: Bolt

`__init__(input, output, state, **kwargs)`

The `TROCRImageOCR` class performs optical character recognition (OCR) on images using Microsoft's TrOCR model. It exposes an API endpoint for single-image OCR at `/api/v1/ocr`. The endpoint accepts a POST request whose JSON payload carries a base64-encoded image under the key `image_base64`, and it returns a JSON response with the recognized text under the key `ocr_text`.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `input` | `BatchInput` | Instance of BatchInput for reading data. | required |
| `output` | `BatchOutput` | Instance of BatchOutput for saving data. | required |
| `state` | `State` | Instance of State for maintaining state. | required |
| `**kwargs` | | Additional keyword arguments. | `{}` |

Command Line Invocation with geniusrise

genius TROCRImageOCR rise \
    batch \
        --bucket my_bucket \
        --s3_folder s3/input \
    batch \
        --bucket my_bucket \
        --s3_folder s3/output \
    none \
    listen \
        --args endpoint="*" port=3000 cors_domain="*" kind=handwriting use_cuda=True

YAML Configuration with geniusrise

version: "1"
bolts:
    ocr_processing:
        name: "TROCRImageOCR"
        method: "listen"
        args:
            endpoint: "*"
            port: 3000
            cors_domain: "*"
            kind: handwriting
            use_cuda: true
        input:
            type: "batch"
            args:
                bucket: "my_bucket"
                s3_folder: "s3/input"
                use_cuda: true
        output:
            type: "batch"
            args:
                bucket: "my_bucket"
                s3_folder: "s3/output"
                use_cuda: true

API Example

curl -X POST "http://localhost:3000/api/v1/ocr" -H "Content-Type: application/json" -d '{"image_base64": "your_base64_encoded_image_here"}'
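
The same request can be sketched in Python using only the standard library. This is a minimal sketch, assuming the server is running locally on port 3000 as in the invocation examples; the helper names `build_payload`, `extract_text`, and `ocr_image` are hypothetical and not part of geniusrise.

```python
import base64
import json
import urllib.request

# Assumed local deployment, matching the port used in the examples above.
OCR_URL = "http://localhost:3000/api/v1/ocr"


def build_payload(image_bytes: bytes) -> bytes:
    """Encode raw image bytes into the JSON body the endpoint expects."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"image_base64": encoded}).encode("utf-8")


def extract_text(response_body: bytes) -> str:
    """Pull the recognized text out of the JSON response."""
    return json.loads(response_body)["ocr_text"]


def ocr_image(path: str, url: str = OCR_URL, timeout: float = 60.0) -> str:
    """Send one image file to the OCR endpoint and return the recognized text."""
    with open(path, "rb") as f:
        body = build_payload(f.read())
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_text(response.read())
```

With the server running, `ocr_image("scan.png")` would return the recognized text, where `scan.png` is a placeholder file name. Keeping payload construction and response parsing in separate functions lets both be checked without a live endpoint.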

`preprocess_and_detect_boxes(image)`

Preprocess the image and detect text bounding boxes using the EAST model.