OCR API using TrOCR
Bases: Bolt
__init__(input, output, state, **kwargs)
The `TROCRImageOCR` class performs OCR (Optical Character Recognition) on images using Microsoft's TrOCR model.

The class exposes an API endpoint for OCR on single images, accessible at `/api/v1/ocr`. The endpoint accepts a POST request with a JSON payload containing a base64-encoded image under the key `image_base64`. It returns a JSON response containing the OCR result under the key `ocr_text`.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`input` | `BatchInput` | Instance of BatchInput for reading data. | required |
`output` | `BatchOutput` | Instance of BatchOutput for saving data. | required |
`state` | `State` | Instance of State for maintaining state. | required |
`**kwargs` | | Additional keyword arguments. | `{}` |
Command Line Invocation with geniusrise

    genius TROCRImageOCR rise \
        batch \
            --bucket my_bucket \
            --s3_folder s3/input \
        batch \
            --bucket my_bucket \
            --s3_folder s3/output \
        none \
        listen \
            --args endpoint="*" port=3000 cors_domain="*" kind=handwriting use_cuda=True

The `*` values are quoted so that the shell passes them through literally instead of expanding them as filename globs.
YAML Configuration with geniusrise

    version: "1"
    bolts:
      ocr_processing:
        name: "TROCRImageOCR"
        method: "listen"
        args:
          endpoint: "*"
          port: 3000
          cors_domain: "*"
          kind: handwriting
          use_cuda: true
        input:
          type: "batch"
          args:
            bucket: "my_bucket"
            s3_folder: "s3/input"
            use_cuda: true
        output:
          type: "batch"
          args:
            bucket: "my_bucket"
            s3_folder: "s3/output"
            use_cuda: true

The `*` values must be quoted: a bare `*` starts a YAML alias and does not parse. The component is listed under `bolts` because `TROCRImageOCR` derives from `Bolt`.
API Example
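The request and response format described above (a POST to `/api/v1/ocr` with `image_base64` in the body, and `ocr_text` in the reply) can be exercised with a small client. The following is a minimal sketch using only the Python standard library. It assumes the server is reachable at `http://localhost:3000` (matching `port=3000` in the invocation above) and that no authentication is required; the helper names `build_ocr_request`, `parse_ocr_response`, and `ocr_image` are hypothetical and not part of geniusrise.

```python
import base64
import json
import urllib.request

# Assumption: the server was started locally with port=3000 and needs no auth.
BASE_URL = "http://localhost:3000"


def build_ocr_request(image_bytes: bytes, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request for /api/v1/ocr without sending it."""
    # The endpoint expects the image as a base64 string under "image_base64".
    payload = {"image_base64": base64.b64encode(image_bytes).decode("ascii")}
    return urllib.request.Request(
        url=f"{base_url}/api/v1/ocr",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_ocr_response(body: bytes) -> str:
    """Extract the recognized text from the JSON response body."""
    return json.loads(body)["ocr_text"]


def ocr_image(path: str, base_url: str = BASE_URL) -> str:
    """Send one image file to the endpoint and return the recognized text."""
    with open(path, "rb") as f:
        request = build_ocr_request(f.read(), base_url)
    with urllib.request.urlopen(request, timeout=60) as response:
        return parse_ocr_response(response.read())
```

With a server running, `print(ocr_image("scan.png"))` would print the recognized text, where `scan.png` stands in for any local image file. Building the request separately from sending it keeps the payload format easy to inspect or test without a live server.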
preprocess_and_detect_boxes(image)
Preprocess the image and detect text bounding boxes using the EAST model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`image` | `Image.Image` | PIL Image object. | required |
Returns:
Type | Description |
---|---|
`List[Tuple[int, int, int, int]]` | List of bounding boxes (x, y, w, h). |
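Boxes in `(x, y, w, h)` form usually need two small adjustments before each region is passed on for recognition: conversion to the `(left, upper, right, lower)` form that PIL's `Image.crop` expects, and ordering so the text reads naturally. The sketch below illustrates both. It assumes `(x, y)` is the top-left corner of each box; the helper names and the `line_tolerance` parameter are hypothetical, and the row-bucketing sort is a crude heuristic, not necessarily what the class does internally.

```python
from typing import List, Tuple

# (x, y, w, h), the format returned by preprocess_and_detect_boxes.
Box = Tuple[int, int, int, int]


def to_crop_box(box: Box) -> Box:
    """Convert (x, y, w, h) to (left, upper, right, lower) for PIL's Image.crop."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def sort_reading_order(boxes: List[Box], line_tolerance: int = 10) -> List[Box]:
    """Order boxes top-to-bottom, then left-to-right within a row.

    Boxes whose y values fall in the same line_tolerance-pixel bucket are
    treated as one text line. Boxes near a bucket boundary may be split
    across rows, so this is only a rough approximation.
    """
    return sorted(boxes, key=lambda b: (b[1] // line_tolerance, b[0]))
```

Each converted box can then be passed to `image.crop(...)` on the original PIL image to obtain the region for recognition.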