OCR using TrOCR

Bases: Bolt

__init__(input, output, state, **kwargs)

The TROCRImageOCR class performs optical character recognition (OCR) on images using Microsoft's TrOCR model. It reads the images to process from input.input_folder and saves the OCR results as JSON files in output.output_folder.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| input | BatchInput | Instance of BatchInput for reading data. | required |
| output | BatchOutput | Instance of BatchOutput for saving data. | required |
| state | State | Instance of State for maintaining state. | required |
| **kwargs | | Additional keyword arguments. | {} |

Command Line Invocation with geniusrise

genius TROCRImageOCR rise \
    batch \
        --bucket my_bucket \
        --s3_folder s3/input \
    batch \
        --bucket my_bucket \
        --s3_folder s3/output \
    none \
    process

YAML Configuration with geniusrise

version: "1"
bolts:
    ocr_processing:
        name: "TROCRImageOCR"
        method: "process"
        args:
            use_cuda: true
        input:
            type: "batch"
            args:
                bucket: "my_bucket"
                s3_folder: "s3/input"
        output:
            type: "batch"
            args:
                bucket: "my_bucket"
                s3_folder: "s3/output"

process(kind='printed', use_cuda=True)

📖 Perform OCR on images in the input folder and save the OCR results as JSON files in the output folder.

This method iterates over each image file in input.input_folder, runs the TrOCR model on it, and saves the OCR result as a JSON file in output.output_folder.
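The folder-in, JSON-out flow described above can be sketched in a few lines. This is a minimal illustration, not the bolt's actual implementation: the function name `ocr_folder`, the set of image extensions, and the JSON keys `image_name` and `ocr_text` are all assumptions made for the example. The recognizer is passed in as a callable, so the loop can be exercised without loading a model.

```python
import json
from pathlib import Path
from typing import Callable, List

# Assumed set of image extensions; the real bolt may accept a different set.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tiff"}


def ocr_folder(
    input_folder: str,
    output_folder: str,
    recognize: Callable[[Path], str],
) -> List[Path]:
    """Run `recognize` on every image in input_folder and write one JSON file per image."""
    out_dir = Path(output_folder)
    out_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for image_path in sorted(Path(input_folder).iterdir()):
        # Skip subfolders and anything that does not look like an image.
        if not image_path.is_file() or image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        # Hypothetical result schema: the source only says results are saved as JSON.
        result = {"image_name": image_path.name, "ocr_text": recognize(image_path)}
        out_path = out_dir / f"{image_path.stem}.json"
        out_path.write_text(
            json.dumps(result, ensure_ascii=False, indent=2), encoding="utf-8"
        )
        written.append(out_path)
    return written
```

Calling `ocr_folder("images", "results", my_recognizer)` would leave one `.json` file per image in `results`, named after the image's stem.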

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| kind | str | The kind of TrOCR model to use. Options are "printed" or "handwritten". | 'printed' |
| use_cuda | bool | Whether to use CUDA for model inference. | True |
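To make the two parameters concrete, here is a hedged sketch of how `kind` and `use_cuda` could map onto a TrOCR checkpoint and a device using the Hugging Face `transformers` API. The checkpoint names are the public `microsoft/trocr-base-*` models and are an assumption; the bolt may load a different size. The helper names (`checkpoint_for`, `pick_device`, `recognize`) and the fallback to CPU when CUDA is unavailable are likewise illustrative, not taken from the source.

```python
# Assumed checkpoints: public TrOCR models on the Hugging Face Hub.
CHECKPOINTS = {
    "printed": "microsoft/trocr-base-printed",
    "handwritten": "microsoft/trocr-base-handwritten",
}


def checkpoint_for(kind: str = "printed") -> str:
    """Map the `kind` argument to a checkpoint name, rejecting unknown kinds."""
    if kind not in CHECKPOINTS:
        raise ValueError(f"kind must be one of {sorted(CHECKPOINTS)}, got {kind!r}")
    return CHECKPOINTS[kind]


def pick_device(use_cuda: bool, cuda_available: bool) -> str:
    """Use the GPU only when requested and present (assumed fallback behaviour)."""
    return "cuda" if use_cuda and cuda_available else "cpu"


def recognize(image_path, kind: str = "printed", use_cuda: bool = True) -> str:
    """Run TrOCR on a single image and return the decoded text."""
    # Heavy dependencies are imported lazily so the helpers above work without them.
    import torch
    from PIL import Image
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    name = checkpoint_for(kind)
    device = pick_device(use_cuda, torch.cuda.is_available())

    processor = TrOCRProcessor.from_pretrained(name)
    model = VisionEncoderDecoderModel.from_pretrained(name).to(device)

    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values.to(device)
    with torch.no_grad():
        generated_ids = model.generate(pixel_values)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

In this sketch the model is reloaded on every call to keep the example short; a real pipeline would load the processor and model once and reuse them across images.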