OCR using pix2struct

Bases: Bolt

__init__(input, output, state, model_name='google/pix2struct-large', **kwargs)

The Pix2StructImageOCR class performs OCR on images using Google's Pix2Struct model. It reads the images to process from input.input_folder and saves the OCR results as JSON files in output.output_folder.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| input | BatchInput | Instance of BatchInput for reading data. | required |
| output | BatchOutput | Instance of BatchOutput for saving data. | required |
| state | State | Instance of State for maintaining state. | required |
| model_name | str | The name of the Pix2Struct model to use. Default is "google/pix2struct-large". | 'google/pix2struct-large' |
| **kwargs | | Additional keyword arguments. | {} |
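
The flow described above (read images from a folder, run Pix2Struct on each, write one JSON file per image) can be sketched roughly as follows. This is an illustrative sketch, not the class's actual implementation: the helper names `load_pix2struct_ocr` and `ocr_folder`, the list of image extensions, and the JSON keys `image_name` and `ocr_text` are all assumptions made for the example. The model calls use the Hugging Face `transformers` Pix2Struct classes.

```python
import json
from pathlib import Path
from typing import Callable, List

# Assumed set of image extensions; the real class may accept a different set.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".tiff"}


def load_pix2struct_ocr(model_name: str = "google/pix2struct-large") -> Callable[[Path], str]:
    """Return a function that maps an image path to the text decoded by Pix2Struct."""
    # Imported lazily so the folder logic below can be used without these packages.
    from PIL import Image
    from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor

    processor = Pix2StructProcessor.from_pretrained(model_name)
    model = Pix2StructForConditionalGeneration.from_pretrained(model_name)

    def ocr(image_path: Path) -> str:
        image = Image.open(image_path).convert("RGB")
        inputs = processor(images=image, return_tensors="pt")
        generated = model.generate(**inputs)
        return processor.decode(generated[0], skip_special_tokens=True)

    return ocr


def ocr_folder(input_folder: str, output_folder: str, ocr: Callable[[Path], str]) -> List[Path]:
    """Run `ocr` on every image in input_folder and write one JSON file per image."""
    out_dir = Path(output_folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image_path in sorted(Path(input_folder).iterdir()):
        if image_path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue  # skip non-image files
        # Hypothetical record layout; the real output schema is not documented here.
        record = {"image_name": image_path.name, "ocr_text": ocr(image_path)}
        out_path = out_dir / f"{image_path.stem}.json"
        out_path.write_text(json.dumps(record, ensure_ascii=False), encoding="utf-8")
        written.append(out_path)
    return written
```

Under these assumptions, `ocr_folder("input", "output", load_pix2struct_ocr())` would process a local folder. Passing the OCR function as an argument keeps the folder handling independent of the model, so it can be exercised with a stub.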

Command Line Invocation with geniusrise

genius Pix2StructImageOCR rise \
    batch \
        --bucket my_bucket \
        --s3_folder s3/input \
    batch \
        --bucket my_bucket \
        --s3_folder s3/output \
    none \
    process

YAML Configuration with geniusrise

version: "1"
bolts:
    ocr_processing:
        name: "Pix2StructImageOCR"
        method: "process"
        input:
            type: "batch"
            args:
                bucket: "my_bucket"
                s3_folder: "s3/input"
                use_cuda: true
        output:
            type: "batch"
            args:
                bucket: "my_bucket"
                s3_folder: "s3/output"
                use_cuda: true

process(use_cuda=True)

📖 Perform OCR on images in the input folder and save the OCR results as JSON files in the output folder.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| use_cuda | bool | Whether to use CUDA for model inference. Default is True. | True |
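
As a rough illustration of what a `use_cuda` flag typically controls, the sketch below picks a torch device string and falls back to the CPU when CUDA is not available. The helper names `select_device` and `cuda_is_available` are hypothetical, and whether `process` itself falls back silently or raises an error on a machine without a GPU is an assumption, not something this page states.

```python
def select_device(use_cuda: bool, cuda_available: bool) -> str:
    """Pick a torch device string, falling back to CPU when CUDA is unavailable."""
    return "cuda:0" if use_cuda and cuda_available else "cpu"


def cuda_is_available() -> bool:
    """Report CUDA availability, treating a missing torch install as 'no CUDA'."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()
```

A caller could then compute `device = select_device(True, cuda_is_available())` and move both the model and the processor outputs to that device before calling `generate`.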