OCR using pix2struct

Bases: Bolt

`__init__(input, output, state, model_name='google/pix2struct-large', **kwargs)`

The `Pix2StructImageOCR` class performs OCR on images using Google's Pix2Struct model. It reads the images from `input.input_folder` and saves the OCR results as JSON files in `output.output_folder`.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `input` | `BatchInput` | Instance of BatchInput for reading data. | required |
| `output` | `BatchOutput` | Instance of BatchOutput for saving data. | required |
| `state` | `State` | Instance of State for maintaining state. | required |
| `model_name` | `str` | The name of the Pix2Struct model to use. Default is "google/pix2struct-large". | `'google/pix2struct-large'` |
| `**kwargs` | | Additional keyword arguments. | `{}` |
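
To make the role of `model_name` concrete, here is a minimal sketch of how a Pix2Struct checkpoint can be loaded and applied to a single image with the Hugging Face `transformers` library. It is not the class's actual implementation: the helper names `load_pix2struct` and `run_ocr` and the `max_new_tokens` value are assumptions made for illustration.

```python
from typing import Any


def load_pix2struct(model_name: str = "google/pix2struct-large"):
    """Load a Pix2Struct processor and model (hypothetical helper).

    The import is done lazily so the rest of this sketch can be used
    without transformers installed. Calling this downloads the weights.
    """
    from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor

    processor = Pix2StructProcessor.from_pretrained(model_name)
    model = Pix2StructForConditionalGeneration.from_pretrained(model_name)
    return processor, model


def run_ocr(image: Any, processor: Any, model: Any, max_new_tokens: int = 512) -> str:
    """Convert one image to text (hypothetical helper).

    `image` is expected to be a PIL image when used with the real processor.
    `max_new_tokens` is an assumed limit, not a value taken from the class.
    """
    # The processor turns the image into the patch tensors the model expects.
    inputs = processor(images=image, return_tensors="pt")
    # The model generates token ids, which the processor decodes back to text.
    generated_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(generated_ids[0], skip_special_tokens=True)
```

With `transformers`, `torch`, and `Pillow` installed, usage would look like `processor, model = load_pix2struct()` followed by `run_ocr(Image.open("page.png"), processor, model)`, where `page.png` is a placeholder file name.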

Command Line Invocation with geniusrise

genius Pix2StructImageOCR rise \
    batch \
        --bucket my_bucket \
        --s3_folder s3/input \
    batch \
        --bucket my_bucket \
        --s3_folder s3/output \
    none \
    process

YAML Configuration with geniusrise

version: "1"
bolts:
    ocr_processing:
        name: "Pix2StructImageOCR"
        method: "process"
        args:
            use_cuda: true
        input:
            type: "batch"
            args:
                bucket: "my_bucket"
                s3_folder: "s3/input"
        output:
            type: "batch"
            args:
                bucket: "my_bucket"
                s3_folder: "s3/output"

`process(use_cuda=True)`

📖 Perform OCR on images in the input folder and save the OCR results as JSON files in the output folder.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `use_cuda` | `bool` | Whether to use CUDA for model inference. Default is True. | `True` |
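
The batch flow described for `process` can be sketched as follows: pick a device from `use_cuda`, walk the input folder, run OCR on each image, and write one JSON file per image. This is an illustrative sketch only. The function names, the list of image extensions, and the JSON keys `image_name` and `ocr_text` are assumptions, and the OCR step is passed in as a callable so the loop stays independent of any particular model.

```python
import json
from pathlib import Path
from typing import Callable, List

# Assumed set of extensions treated as images; the real class may differ.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tiff"}


def resolve_device(use_cuda: bool, cuda_available: bool) -> str:
    """Return "cuda" only when it is both requested and available."""
    return "cuda" if use_cuda and cuda_available else "cpu"


def process_folder(input_folder, output_folder, ocr_fn: Callable[[Path], str]) -> List[Path]:
    """Run `ocr_fn` on every image in `input_folder` and save JSON results.

    Returns the paths of the JSON files that were written.
    """
    out_dir = Path(output_folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image_path in sorted(Path(input_folder).iterdir()):
        # Skip sub-directories and files that are not images.
        if not image_path.is_file() or image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        # Assumed output schema: file name plus recognized text.
        record = {"image_name": image_path.name, "ocr_text": ocr_fn(image_path)}
        target = out_dir / f"{image_path.stem}.json"
        target.write_text(json.dumps(record), encoding="utf-8")
        written.append(target)
    return written
```

In a real run, `cuda_available` would typically come from `torch.cuda.is_available()`, and `ocr_fn` would open the image and call the model on the resolved device.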