OCR API using Pix2Struct
Bases: Bolt
__init__(input, output, state, **kwargs)
The Pix2StructImageOCRAPI class performs OCR on images using Google's Pix2Struct model.
The class exposes an API endpoint for OCR on single images, accessible at /api/v1/ocr.
The API takes a POST request with a JSON payload containing a base64-encoded image under the key image_base64.
It returns a JSON response containing the OCR result under the key ocr_text.
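As an illustration of the request and response shapes described above, here is a minimal client sketch using only the Python standard library. The host, the helper names (build_ocr_payload, parse_ocr_response, request_ocr), and the timeout are assumptions made for this example; port 3000 simply mirrors the command line example on this page. Only the path /api/v1/ocr and the keys image_base64 and ocr_text come from the description above.

```python
import base64
import json
import urllib.request

# Hypothetical address: the host is an assumption, the port mirrors the CLI example.
OCR_URL = "http://localhost:3000/api/v1/ocr"


def build_ocr_payload(image_bytes: bytes) -> bytes:
    """Encode raw image bytes as the JSON body the endpoint expects."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"image_base64": encoded}).encode("utf-8")


def parse_ocr_response(body: bytes) -> str:
    """Extract the recognized text from the JSON response body."""
    return json.loads(body)["ocr_text"]


def request_ocr(image_path: str, url: str = OCR_URL, timeout: float = 60.0) -> str:
    """POST one image file to the OCR endpoint and return the recognized text."""
    with open(image_path, "rb") as f:
        payload = build_ocr_payload(f.read())
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_ocr_response(response.read())
```

With the service running, request_ocr("page.png") would return the text found under ocr_text. If the deployment requires authentication or extra headers, they would need to be added to the request; this sketch assumes none.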
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input | BatchInput | Instance of BatchInput for reading data. | required |
output | BatchOutput | Instance of BatchOutput for saving data. | required |
state | State | Instance of State for maintaining state. | required |
model_name | str | The name of the Pix2Struct model to use. Default is "google/pix2struct-large". | required |
**kwargs | | Additional keyword arguments. | {} |
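To show roughly what a model_name such as "google/pix2struct-large" is used for, the sketch below decodes a base64 image and runs it through a Pix2Struct checkpoint with the Hugging Face transformers library. This is an assumption about the general approach, not the class's confirmed implementation: the function names, the max_new_tokens value, and the use_cuda handling are hypothetical, and the heavy imports (Pillow, transformers, and a PyTorch install) are deferred until the OCR function is called.

```python
import base64
import io


def decode_image_base64(image_base64: str) -> bytes:
    """Decode the base64 string from the request payload into raw image bytes."""
    return base64.b64decode(image_base64, validate=True)


def run_pix2struct_ocr(
    image_base64: str,
    model_name: str = "google/pix2struct-large",
    use_cuda: bool = False,
) -> str:
    """Hypothetical helper: run one image through a Pix2Struct checkpoint."""
    # Imported lazily so the decoding helper works without these packages.
    from PIL import Image
    from transformers import (
        Pix2StructForConditionalGeneration,
        Pix2StructProcessor,
    )

    image = Image.open(io.BytesIO(decode_image_base64(image_base64))).convert("RGB")

    processor = Pix2StructProcessor.from_pretrained(model_name)
    model = Pix2StructForConditionalGeneration.from_pretrained(model_name)

    device = "cuda" if use_cuda else "cpu"
    model.to(device)

    # The processor turns the image into patch tensors for the model.
    inputs = processor(images=image, return_tensors="pt").to(device)
    # max_new_tokens=128 is an arbitrary illustrative limit.
    generated_ids = model.generate(**inputs, max_new_tokens=128)
    return processor.decode(generated_ids[0], skip_special_tokens=True)
```

A long-running service would normally load the processor and model once at startup rather than on every call; they are loaded inside the function here only to keep the sketch self-contained.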
Command Line Invocation with geniusrise
genius Pix2StructImageOCRAPI rise \
    batch \
        --bucket my_bucket \
        --s3_folder s3/input \
    batch \
        --bucket my_bucket \
        --s3_folder s3/output \
    none \
    listen \
        --args endpoint=* port=3000 cors_domain=* use_cuda=True
YAML Configuration with geniusrise
version: "1"
spouts:
  ocr_processing:
    name: "Pix2StructImageOCRAPI"
    method: "listen"
    args:
      endpoint: "*"
      port: 3000
      cors_domain: "*"
      use_cuda: true
    input:
      type: "batch"
      args:
        bucket: "my_bucket"
        s3_folder: "s3/input"
        use_cuda: true
    output:
      type: "batch"
      args:
        bucket: "my_bucket"
        s3_folder: "s3/output"
        use_cuda: true