AWS DocumentDB

Bases: Spout

__init__(output, state, **kwargs)

Initialize the DocumentDB class.

Parameters:

Name      Type         Description                                                Default
output    BatchOutput  An instance of the BatchOutput class for saving the data.  required
state     State        An instance of the State class for maintaining the state.  required
**kwargs               Additional keyword arguments.                              {}

Using geniusrise to invoke via command line

genius DocumentDB rise \
    batch \
        --output_s3_bucket my_bucket \
        --output_s3_folder s3/folder \
    none \
    fetch \
        --args host=localhost port=27017 user=myuser password=mypassword database=mydb collection=mycollection query="{}" page_size=100

Using geniusrise to invoke via YAML file

version: "1"
spouts:
    my_documentdb_spout:
        name: "DocumentDB"
        method: "fetch"
        args:
            host: "localhost"
            port: 27017
            user: "myuser"
            password: "mypassword"
            database: "mydb"
            collection: "mycollection"
            query: "{}"
            page_size: 100
        output:
            type: "batch"
            args:
                bucket: "my_bucket"
                s3_folder: "s3/folder"

fetch(host, port, user, password, database, collection, query, page_size=100)

📖 Fetch data from a DocumentDB database and save it in batch.

Parameters:

Name        Type  Description                                                  Default
host        str   The DocumentDB host.                                         required
port        int   The DocumentDB port.                                         required
user        str   The DocumentDB user.                                         required
password    str   The DocumentDB password.                                     required
database    str   The DocumentDB database name.                                required
collection  str   The DocumentDB collection name.                              required
query       str   The query to execute.                                        required
page_size   int   The number of documents to fetch per page. Defaults to 100.  100
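
The query is passed as a string and documents are fetched page_size at a time. The following is a minimal sketch of those two ideas, not the spout's actual implementation: it assumes the query string is a JSON object (as in the "{}" examples above), and the helper names parse_query and paginate, along with the stand-in cursor, are hypothetical.

```python
import json
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


def parse_query(query: str) -> Dict[str, Any]:
    """Turn the string form of a query (e.g. "{}") into a filter dict.

    Assumption: the query string is a JSON object.
    """
    parsed = json.loads(query)
    if not isinstance(parsed, dict):
        raise ValueError("query must be a JSON object")
    return parsed


def paginate(
    documents: Iterable[Dict[str, Any]], page_size: int = 100
) -> Iterator[List[Dict[str, Any]]]:
    """Yield lists of at most page_size documents until the source is exhausted."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    iterator = iter(documents)
    while True:
        page = list(islice(iterator, page_size))
        if not page:
            return
        yield page


# Hypothetical stand-in for a cursor over a collection with 250 documents.
fake_cursor = ({"_id": i} for i in range(250))
page_lengths = [len(page) for page in paginate(fake_cursor, page_size=100)]
print(parse_query("{}"), page_lengths)
```

With page_size=100 and 250 documents, the sketch produces two full pages and one partial page. Each page could then be handed to the output as one batch.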

Raises:

Type       Description
Exception  If unable to connect to the DocumentDB server or execute the query.
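
Because fetch raises a plain Exception on connection or query failure, a caller may want to retry before giving up. The following is a rough sketch of one way to do that. The helper call_with_retries and the flaky_fetch stand-in are hypothetical and not part of geniusrise; in real use the callable would wrap the spout's fetch call.

```python
import logging
import time
from typing import Any, Callable

log = logging.getLogger("documentdb_example")


def call_with_retries(
    fetch: Callable[[], Any], attempts: int = 3, delay_seconds: float = 0.0
) -> Any:
    """Call fetch() up to `attempts` times, re-raising the last failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception as exc:
            log.warning("fetch attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            time.sleep(delay_seconds)


# Hypothetical stand-in for a fetch that fails twice, then succeeds.
calls = {"count": 0}


def flaky_fetch() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise Exception("unable to connect to the DocumentDB server")
    return "ok"


result = call_with_retries(flaky_fetch, attempts=3)
print(result, calls["count"])
```

A fixed delay keeps the sketch short; a production caller would more likely use exponential backoff and retry only on errors it knows to be transient.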