OCR using Pix2Struct
Bases: Bolt
__init__(input, output, state, model_name='google/pix2struct-large', **kwargs)
The Pix2StructImageOCR class performs OCR on images using Google's Pix2Struct model. It expects input.input_folder to contain the images for OCR and saves the OCR results as JSON files in output.output_folder.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input | BatchInput | Instance of BatchInput for reading data. | required |
| output | BatchOutput | Instance of BatchOutput for saving data. | required |
| state | State | Instance of State for maintaining state. | required |
| model_name | str | Name of the Pix2Struct model to use. | 'google/pix2struct-large' |
| **kwargs | | Additional keyword arguments. | {} |
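The input/output contract described above (images in, one JSON file per image out) can be sketched without loading the model. This is a minimal, model-agnostic sketch, not geniusrise's implementation: the helper `ocr_folder`, the `IMAGE_SUFFIXES` filter, the `<image name>.json` file naming, and the `{"image", "text"}` JSON layout are all assumptions. The `ocr_fn` callable stands in for the Pix2Struct inference step.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, List, Union

# Assumed set of image extensions; the real class may accept others.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def ocr_folder(
    input_folder: Union[str, Path],
    output_folder: Union[str, Path],
    ocr_fn: Callable[[Path], str],
) -> List[Path]:
    """Hypothetical helper: OCR each image in input_folder, write one JSON file per image."""
    in_dir, out_dir = Path(input_folder), Path(output_folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for image_path in sorted(in_dir.iterdir()):
        # Skip subdirectories and anything that does not look like an image.
        if not image_path.is_file() or image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        text = ocr_fn(image_path)  # stands in for the Pix2Struct inference call
        # Keep the full file name (e.g. page1.png.json) so page1.png and page1.jpg don't collide.
        target = out_dir / f"{image_path.name}.json"
        target.write_text(json.dumps({"image": image_path.name, "text": text}))
        written.append(target)
    return written


def demo() -> dict:
    """Run ocr_folder on a throwaway folder with a fake OCR function."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "in")
        src.mkdir()
        (src / "page1.png").write_bytes(b"placeholder bytes")  # the fake OCR never reads them
        (src / "notes.txt").write_text("not an image, so it is skipped")
        written = ocr_folder(src, Path(tmp, "out"), lambda p: f"text from {p.name}")
        return {
            "files": [w.name for w in written],
            "payload": json.loads(written[0].read_text()),
        }
```

In a real pipeline, `ocr_fn` would wrap the Pix2Struct processor and model; injecting it keeps the folder-handling logic testable without downloading a checkpoint.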
Command Line Invocation with geniusrise
```bash
genius Pix2StructImageOCR rise \
    batch \
        --bucket my_bucket \
        --s3_folder s3/input \
    batch \
        --bucket my_bucket \
        --s3_folder s3/output \
    none \
    process
```
YAML Configuration with geniusrise
process(use_cuda=True)
📖 Perform OCR on images in the input folder and save the OCR results as JSON files in the output folder.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| use_cuda | bool | Whether to use CUDA for model inference. | True |
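This page does not say what happens when use_cuda=True but no GPU is present. A common pattern is to fall back to the CPU; the sketch below assumes that behaviour, which is not confirmed for Pix2StructImageOCR. `resolve_device` is a hypothetical helper, and the CUDA availability check is passed in so the sketch runs without PyTorch.

```python
from typing import Callable


def resolve_device(use_cuda: bool, cuda_available: Callable[[], bool]) -> str:
    """Hypothetical helper: pick the inference device for a given use_cuda flag."""
    # Only probe for a GPU when the caller asked for one.
    if use_cuda and cuda_available():
        return "cuda"
    # CUDA disabled or no GPU found: fall back to CPU (assumed behaviour).
    return "cpu"
```

With PyTorch installed, the availability check would typically be `torch.cuda.is_available`, e.g. `resolve_device(True, torch.cuda.is_available)`.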