Elasticsearch
Bases: Spout
__init__(output, state, **kwargs)
Initialize the Elasticsearch class.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`output` | `BatchOutput` | An instance of the BatchOutput class for saving the data. | required |
`state` | `State` | An instance of the State class for maintaining the state. | required |
`**kwargs` | `Any` | Additional keyword arguments. | `{}` |
Using geniusrise to invoke via command line
genius Elasticsearch rise \
batch \
--output_s3_bucket my_bucket \
--output_s3_folder s3/folder \
none \
fetch \
--args hosts=localhost:9200 index=my_index query='{"query": {"match_all": {}}}' page_size=100
Using geniusrise to invoke via YAML file
fetch(hosts, index, query, page_size=100)
📖 Fetch data from an Elasticsearch index and save it in batch.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`hosts` | `str` | Comma-separated list of Elasticsearch hosts. | required |
`index` | `str` | The Elasticsearch index to query. | required |
`query` | `str` | The Elasticsearch query in JSON format. | required |
`page_size` | `int` | The number of documents to fetch per page. | `100` |
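The parameters above suggest three steps: split `hosts` on commas, decode `query` from JSON, and request `page_size` documents at a time. The sketch below illustrates that flow and is not the geniusrise implementation. The helper names `parse_hosts`, `iter_pages`, and `fake_search` are hypothetical, and the `search` callable stands in for a real Elasticsearch client so the example runs without a cluster. It assumes `from`/`size` pagination; a real spout might instead use the scroll API or `search_after`, since `from`/`size` is capped by `index.max_result_window` (10,000 by default).

```python
import json
from typing import Any, Callable, Dict, Iterator, List

# Hypothetical stand-in for a client call: (index, request body) -> raw response.
SearchFn = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def parse_hosts(hosts: str) -> List[str]:
    """Split a comma-separated host string, dropping blanks and whitespace."""
    return [h.strip() for h in hosts.split(",") if h.strip()]


def iter_pages(
    search: SearchFn, index: str, query: str, page_size: int = 100
) -> Iterator[List[Dict[str, Any]]]:
    """Yield lists of hits, one list per page, until the results run out."""
    body = json.loads(query)  # the query arrives as a JSON string
    offset = 0
    while True:
        # Assumption: plain from/size pagination on top of the user's query.
        page_body = {**body, "from": offset, "size": page_size}
        hits = search(index, page_body)["hits"]["hits"]
        if not hits:
            break
        yield hits
        if len(hits) < page_size:
            break  # a short page means this was the last one
        offset += len(hits)


def fake_search(index: str, body: Dict[str, Any]) -> Dict[str, Any]:
    """In-memory substitute for a cluster holding 250 documents."""
    docs = [{"_id": str(i), "_source": {"n": i}} for i in range(250)]
    start = body["from"]
    return {"hits": {"hits": docs[start:start + body["size"]]}}


hosts = parse_hosts("localhost:9200, localhost:9201")
pages = list(
    iter_pages(fake_search, "my_index", '{"query": {"match_all": {}}}', page_size=100)
)
page_lengths = [len(p) for p in pages]
```

With a real cluster, `fake_search` would be replaced by a thin wrapper around the client's search call, and each yielded page would be handed to the batch output.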
Raises:
Type | Description |
---|---|
`Exception` | If unable to connect to the Elasticsearch cluster or execute the query. |