HBase

Bases: Spout

__init__(output, state, **kwargs)

Initialize the HBase class.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `output` | `BatchOutput` | An instance of the BatchOutput class for saving the data. | required |
| `state` | `State` | An instance of the State class for maintaining the state. | required |
| `**kwargs` | | Additional keyword arguments. | `{}` |

Using geniusrise to invoke via command line

genius HBase rise \
    batch \
        --output_s3_bucket my_bucket \
        --output_s3_folder s3/folder \
    none \
    fetch \
        --args host=localhost table=my_table row_start=start row_stop=stop batch_size=100

Using geniusrise to invoke via YAML file

version: "1"
spouts:
    my_hbase_spout:
        name: "HBase"
        method: "fetch"
        args:
            host: "localhost"
            table: "my_table"
            row_start: "start"
            row_stop: "stop"
            batch_size: 100
        output:
            type: "batch"
            args:
                bucket: "my_bucket"
                s3_folder: "s3/folder"
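The command-line and YAML invocations above carry the same `fetch` arguments: each key under `args` in the YAML corresponds to one `key=value` token after `--args`. The following is a minimal sketch of that correspondence only. The helper name `to_cli_args` is hypothetical and is not part of geniusrise.

```python
from typing import Any, Dict, List


def to_cli_args(args: Dict[str, Any]) -> List[str]:
    """Render a YAML-style args mapping as key=value tokens.

    Hypothetical helper for illustration; not a geniusrise function.
    """
    return [f"{key}={value}" for key, value in args.items()]


# The same arguments as in the YAML example above.
fetch_args = {
    "host": "localhost",
    "table": "my_table",
    "row_start": "start",
    "row_stop": "stop",
    "batch_size": 100,
}

# Prints the token list used in the command-line example.
print(" ".join(["--args", *to_cli_args(fetch_args)]))
```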

fetch(host, table, row_start, row_stop, batch_size=100)

📖 Fetch data from an HBase table and save it in batch.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `host` | `str` | The HBase host. | required |
| `table` | `str` | The HBase table name. | required |
| `row_start` | `str` | The row key to start scanning from. | required |
| `row_stop` | `str` | The row key to stop scanning at. | required |
| `batch_size` | `int` | The number of rows to fetch per batch. Defaults to 100. | `100` |

Raises:

| Type | Description |
|------|-------------|
| `Exception` | If unable to connect to the HBase server or execute the scan. |
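To make the `row_start`, `row_stop`, and `batch_size` parameters concrete, here is a rough sketch of a bounded scan that groups rows into batches. It assumes the third-party `happybase` client; whether the spout uses that client internally is an assumption, and the function names `chunk_rows` and `scan_in_batches` are hypothetical. In HBase scans the start row is conventionally inclusive and the stop row exclusive.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Tuple

# One scanned row: (row key, {column: value}), as happybase yields it.
Row = Tuple[bytes, Dict[bytes, bytes]]


def chunk_rows(rows: Iterable[Row], batch_size: int = 100) -> Iterator[List[Row]]:
    """Group an iterable of rows into lists of at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def scan_in_batches(
    host: str, table: str, row_start: str, row_stop: str, batch_size: int = 100
) -> Iterator[List[Row]]:
    """Scan [row_start, row_stop) and yield the rows batch by batch.

    Assumption: happybase is installed and an HBase Thrift server is
    reachable at `host`. Imported lazily so chunk_rows works without it.
    """
    import happybase

    connection = happybase.Connection(host)
    try:
        scanner = connection.table(table).scan(
            row_start=row_start.encode(),
            row_stop=row_stop.encode(),
            batch_size=batch_size,  # rows fetched per round trip
        )
        yield from chunk_rows(scanner, batch_size)
    finally:
        connection.close()


# Offline demonstration of the batching step with made-up rows.
fake_rows = [(f"row{i:03d}".encode(), {b"cf:col": b"value"}) for i in range(250)]
batch_sizes = [len(batch) for batch in chunk_rows(fake_rows, 100)]
print(batch_sizes)
```

A connection or scan failure inside `scan_in_batches` would surface as an exception from the client library, which matches the broad `Exception` listed under Raises.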