Elasticsearch

Bases: Spout

__init__(output, state, **kwargs)

Initialize the Elasticsearch class.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `output` | `BatchOutput` | An instance of the BatchOutput class for saving the data. | required |
| `state` | `State` | An instance of the State class for maintaining the state. | required |
| `**kwargs` | `Any` | Additional keyword arguments. | `{}` |

Using geniusrise to invoke via command line

genius Elasticsearch rise \
    batch \
        --output_s3_bucket my_bucket \
        --output_s3_folder s3/folder \
    none \
    fetch \
        --args hosts=localhost:9200 index=my_index query='{"query": {"match_all": {}}}' page_size=100

Using geniusrise to invoke via YAML file

version: "1"
spouts:
    my_elasticsearch_spout:
        name: "Elasticsearch"
        method: "fetch"
        args:
            hosts: "localhost:9200"
            index: "my_index"
            query: '{"query": {"match_all": {}}}'
            page_size: 100
        output:
            type: "batch"
            args:
                bucket: "my_bucket"
                s3_folder: "s3/folder"

fetch(hosts, index, query, page_size=100)

📖 Fetch data from an Elasticsearch index, one page at a time, and save it to the batch output.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `hosts` | `str` | Comma-separated list of Elasticsearch hosts. | required |
| `index` | `str` | The Elasticsearch index to query. | required |
| `query` | `str` | The Elasticsearch query in JSON format. | required |
| `page_size` | `int` | The number of documents to fetch per page. | `100` |
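Because `hosts` and `query` are passed as plain strings, it can help to see how they might be interpreted. The following is a minimal sketch, not the spout's actual implementation: `parse_fetch_args`, `iter_pages`, and `make_fake_search` are hypothetical helpers, the offset-based paging is an assumption (the real spout may page differently, for example with a scroll cursor), and an in-memory stand-in replaces the Elasticsearch client so the example runs without a cluster.

```python
import json
from typing import Any, Callable, Dict, Iterator, List, Tuple

# Hypothetical type for a search callable: (query_body, offset, size) -> list of hits.
SearchFn = Callable[[Dict[str, Any], int, int], List[Dict[str, Any]]]


def parse_fetch_args(hosts: str, query: str) -> Tuple[List[str], Dict[str, Any]]:
    """Split the comma-separated hosts string and decode the JSON query string."""
    host_list = [h.strip() for h in hosts.split(",") if h.strip()]
    if not host_list:
        raise ValueError("hosts must name at least one host")
    # json.loads raises json.JSONDecodeError if the query is not valid JSON.
    query_body = json.loads(query)
    return host_list, query_body


def iter_pages(
    search: SearchFn, query_body: Dict[str, Any], page_size: int = 100
) -> Iterator[List[Dict[str, Any]]]:
    """Yield successive pages of at most page_size hits until the results run out."""
    offset = 0
    while True:
        hits = search(query_body, offset, page_size)
        if not hits:
            break
        yield hits
        if len(hits) < page_size:
            break  # a short page means there is nothing left to fetch
        offset += len(hits)


def make_fake_search(docs: List[Dict[str, Any]]) -> SearchFn:
    """Build an in-memory stand-in for a cluster (assumption: ignores the query)."""
    def search(query_body: Dict[str, Any], offset: int, size: int) -> List[Dict[str, Any]]:
        return docs[offset:offset + size]
    return search


if __name__ == "__main__":
    hosts, body = parse_fetch_args(
        "localhost:9200", '{"query": {"match_all": {}}}'
    )
    fake_docs = [{"id": i} for i in range(250)]
    for page in iter_pages(make_fake_search(fake_docs), body, page_size=100):
        print(len(page))
```

Validating the query with `json.loads` before invoking the spout surfaces quoting mistakes early, which are easy to make when the JSON is embedded in a shell command or a YAML string.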

Raises:

| Type | Description |
| --- | --- |
| `Exception` | If unable to connect to the Elasticsearch cluster or execute the query. |