Skip to content

Spout

Core Spout class

Spout

Bases: Task

Base class for all spouts.

__call__(method_name, *args, **kwargs)

Execute a method locally and manage the state.

Parameters:

Name Type Description Default
method_name str

The name of the method to execute.

required
*args

Positional arguments to pass to the method.

()
**kwargs

Keyword arguments to pass to the method. Keyword Arguments: - Additional keyword arguments specific to the method.

{}

Returns:

Name Type Description
Any Any

The result of the method.

__init__(output, state, id=None, **kwargs)

The Spout class is a base class for all spouts in the given context. It inherits from the Task class and provides methods for executing tasks both locally and remotely, as well as managing their state, with state management options including in-memory, Redis, PostgreSQL, and DynamoDB, and output data for batch or streaming data.

The Spout class uses the Output and State classes, which are abstract base classes for managing output data and states, respectively. The Output class has two subclasses: StreamingOutput and BatchOutput, which manage streaming and batch output data, respectively. The State class is used to get and set state, and it has several subclasses for different types of state managers.

The Spout class also uses the ECSManager and K8sManager classes in the execute_remote method, which are used to manage tasks on Amazon ECS and Kubernetes, respectively.

Usage
  • Create an instance of the Spout class by providing an Output object and a State object.
  • The Output object specifies the output data for the spout.
  • The State object handles the management of the spout's state.
Example

output = Output(...) state = State(...) spout = Spout(output, state)

Parameters:

Name Type Description Default
output Output

The output data.

required
state State

The state manager.

required

create(klass, output_type, state_type, id=None, **kwargs) staticmethod

Create a spout of a specific type.

Parameters:

Name Type Description Default
klass type

The Spout class to create.

required
output_type str

The type of output ("batch" or "streaming").

required
state_type str

The type of state manager ("none", "redis", "postgres", or "dynamodb").

required
**kwargs

Additional keyword arguments for initializing the spout.

Keyword Arguments:
    Batch output:
    - output_folder (str): The directory where output files should be stored temporarily.
    - output_s3_bucket (str): The name of the S3 bucket for output storage.
    - output_s3_folder (str): The S3 folder for output storage.
    Streaming output:
    - output_kafka_topic (str): Kafka output topic for streaming spouts.
    - output_kafka_cluster_connection_string (str): Kafka connection string for streaming spouts.
    Stream to Batch output:
    - output_folder (str): The directory where output files should be stored temporarily.
    - output_s3_bucket (str): The name of the S3 bucket for output storage.
    - output_s3_folder (str): The S3 folder for output storage.
    - buffer_size (int): Number of messages to buffer.
    Redis state manager config:
    - redis_host (str): The host address for the Redis server.
    - redis_port (int): The port number for the Redis server.
    - redis_db (int): The Redis database to be used.
    Postgres state manager config:
    - postgres_host (str): The host address for the PostgreSQL server.
    - postgres_port (int): The port number for the PostgreSQL server.
    - postgres_user (str): The username for the PostgreSQL server.
    - postgres_password (str): The password for the PostgreSQL server.
    - postgres_database (str): The PostgreSQL database to be used.
    - postgres_table (str): The PostgreSQL table to be used.
    DynamoDB state manager config:
    - dynamodb_table_name (str): The name of the DynamoDB table.
    - dynamodb_region_name (str): The AWS region for DynamoDB

{}

Returns:

Name Type Description
Spout Spout

The created spout.

Raises:

Type Description
ValueError

If an invalid output type or state type is provided.