OCR API using Pix2Struct

Bases: Bolt

`__init__(input, output, state, **kwargs)`

The `Pix2StructImageOCRAPI` class performs OCR on images using Google's Pix2Struct model. It exposes a single-image OCR endpoint at `/api/v1/ocr`. The endpoint accepts a POST request whose JSON payload carries a base64-encoded image under the key `image_base64`, and it returns a JSON response with the OCR result under the key `ocr_text`.
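The request and response shapes described above can be exercised with a small client. The following is a minimal sketch using only the Python standard library, not an official client. It assumes the service is reachable at `http://localhost:3000` (the port is taken from the invocation examples on this page; the host is an assumption) and that no authentication is configured. The helper names `build_ocr_payload` and `ocr_image` and the file `receipt.png` are hypothetical.

```python
import base64
import json
import urllib.request


def build_ocr_payload(image_bytes: bytes) -> bytes:
    """Encode raw image bytes as the JSON body the endpoint expects."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"image_base64": encoded}).encode("utf-8")


def ocr_image(image_bytes: bytes, base_url: str = "http://localhost:3000") -> str:
    """POST an image to /api/v1/ocr and return the text under `ocr_text`.

    `base_url` is an assumption; point it at wherever the API is listening.
    """
    request = urllib.request.Request(
        url=f"{base_url}/api/v1/ocr",
        data=build_ocr_payload(image_bytes),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["ocr_text"]


if __name__ == "__main__":
    # Hypothetical input file; replace with a real image path.
    with open("receipt.png", "rb") as image_file:
        print(ocr_image(image_file.read()))
```

Keeping payload construction separate from the network call makes the encoding step easy to check without a running server.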

Parameters:

Name	Type	Description	Default
`input`	`BatchInput`	Instance of BatchInput for reading data.	required
`output`	`BatchOutput`	Instance of BatchOutput for saving data.	required
`state`	`State`	Instance of State for maintaining state.	required
`model_name`	`str`	The name of the Pix2Struct model to use.	`"google/pix2struct-large"`
`**kwargs`		Additional keyword arguments.	`{}`

Command Line Invocation with geniusrise

genius Pix2StructImageOCRAPI rise \
    batch \
        --bucket my_bucket \
        --s3_folder s3/input \
    batch \
        --bucket my_bucket \
        --s3_folder s3/output \
    none \
    listen \
        --args endpoint="*" port=3000 cors_domain="*" use_cuda=True

YAML Configuration with geniusrise

version: "1"
bolts:
    ocr_processing:
        name: "Pix2StructImageOCRAPI"
        method: "listen"
        args:
            endpoint: "*"
            port: 3000
            cors_domain: "*"
            use_cuda: true
        input:
            type: "batch"
            args:
                bucket: "my_bucket"
                s3_folder: "s3/input"
        output:
            type: "batch"
            args:
                bucket: "my_bucket"
                s3_folder: "s3/output"
