Base Bulk Inference

Bases: Bolt

AudioBulk is a class designed for bulk processing of audio data using various audio models from Hugging Face. It focuses on audio generation and transformation tasks, supporting a range of models and configurations.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| model | AutoModelForAudioClassification | The audio model for generation or transformation tasks. |
| processor | AutoFeatureExtractor | The processor for preparing input data for the model. |

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| input | BatchInput | Configuration and data inputs for the batch process. | required |
| output | BatchOutput | Configuration for output data handling. | required |
| state | State | State management for the Bolt. | required |
| **kwargs | Any | Arbitrary keyword arguments for extended configurations. | {} |
Methods

audio(**kwargs: Any) -> Dict[str, Any]: Provides an API endpoint for audio processing functionality. Accepts various parameters for customizing the audio processing tasks.

process(audio_input: Union[str, bytes], **processing_params: Any) -> dict: Processes the audio input based on the provided parameters. Supports multiple processing methods.
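As a rough illustration of the `process` contract described above (it accepts either a file path as `str` or raw audio as `bytes`, plus arbitrary keyword parameters, and returns a `dict`), here is a minimal stand-in. The function body and returned keys are illustrative assumptions, not the real AudioBulk implementation:

```python
from typing import Any, Dict, Union


def process(audio_input: Union[str, bytes], **processing_params: Any) -> Dict[str, Any]:
    """Stand-in mirroring the documented signature: path or raw bytes in, dict out."""
    if isinstance(audio_input, str):
        # A string is treated as a path to an audio file.
        source = {"kind": "path", "value": audio_input}
    elif isinstance(audio_input, bytes):
        # Raw bytes are passed through directly.
        source = {"kind": "bytes", "length": len(audio_input)}
    else:
        raise TypeError("audio_input must be a file path (str) or raw bytes")
    return {"input": source, "params": processing_params}


result = process("clip.wav", task="transcribe")
```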

__init__(input, output, state, **kwargs)

Initializes the AudioBulk with configurations and sets up logging. Prepares the environment for audio processing tasks.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| input | BatchInput | The input data configuration for the audio processing task. | required |
| output | BatchOutput | The output data configuration for the results of the audio processing. | required |
| state | State | The state configuration for the Bolt, managing its operational status. | required |
| **kwargs | Any | Additional keyword arguments for extended functionality and model configurations. | {} |
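The constructor's shape can be sketched as follows. The `BatchInput`, `BatchOutput`, and `State` classes below are minimal stand-ins showing the three required arguments plus extended keyword configuration; they are not the framework's real classes, whose constructor signatures may differ:

```python
class BatchInput:
    """Stand-in: where batch inputs come from."""
    def __init__(self, folder: str) -> None:
        self.folder = folder


class BatchOutput:
    """Stand-in: where batch results go."""
    def __init__(self, folder: str) -> None:
        self.folder = folder


class State:
    """Stand-in: operational state for the Bolt."""
    def __init__(self) -> None:
        self.status = "initialized"


class AudioBulkLike:
    """Illustrative class mirroring the __init__ contract documented above."""
    def __init__(self, input: BatchInput, output: BatchOutput, state: State, **kwargs) -> None:
        self.input, self.output, self.state = input, output, state
        self.options = kwargs  # extended configurations, e.g. notification settings


# The notification_email kwarg is a hypothetical example of an extended configuration.
bulk = AudioBulkLike(BatchInput("/data/in"), BatchOutput("/data/out"), State(),
                     notification_email="ops@example.com")
```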

done()

Finalizes the AudioBulk processing and sends a notification email if one is configured.

This method should be called after all audio processing tasks are complete. It handles any final steps such as sending notifications or cleaning up resources.
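The finalization step can be sketched as a simple hook: send a completion notice if an address was configured, otherwise finish quietly. The function and its notification mechanism are illustrative stand-ins, not the real implementation:

```python
from typing import Callable, Optional


def finalize(notification_email: Optional[str],
             send_email: Callable[[str, str], None]) -> bool:
    """Send a completion notice if an address was configured; report whether one was sent."""
    if notification_email:
        send_email(notification_email, "AudioBulk processing complete")
        return True
    return False


# Capture the "email" in a list instead of actually sending anything.
sent: list = []
finalize("ops@example.com", lambda addr, msg: sent.append((addr, msg)))
```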

load_models(model_name, processor_name, model_revision=None, processor_revision=None, model_class='', processor_class='AutoFeatureExtractor', use_cuda=False, precision='float16', quantization=0, device_map='auto', max_memory={0: '24GB'}, torchscript=False, compile=False, flash_attention=False, better_transformers=False, use_whisper_cpp=False, use_faster_whisper=False, **model_args)

Loads and configures the specified audio model and processor for audio processing.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model_name | str | Name or path of the audio model to load. | required |
| processor_name | str | Name or path of the processor to load. | required |
| model_revision | Optional[str] | Specific model revision to load (e.g., a commit hash). | None |
| processor_revision | Optional[str] | Specific processor revision to load. | None |
| model_class | str | Class of the model to load. | '' |
| processor_class | str | Class of the processor to load. | 'AutoFeatureExtractor' |
| use_cuda | bool | Whether to use CUDA for GPU acceleration. | False |
| precision | str | Desired precision for computations ("float32", "float16", etc.). | 'float16' |
| quantization | int | Bit level for model quantization (0 for none, 8 for 8-bit). | 0 |
| device_map | Union[str, Dict, None] | Specific device(s) for model operations. | 'auto' |
| max_memory | Dict[int, str] | Maximum memory allocation for the model. | {0: '24GB'} |
| torchscript | bool | Whether to enable TorchScript for model optimization. | False |
| compile | bool | Whether to enable Torch JIT compilation. | False |
| flash_attention | bool | Whether to enable Flash Attention for faster processing. | False |
| better_transformers | bool | Whether to enable Better Transformers for faster processing. | False |
| use_whisper_cpp | bool | Whether to use whisper.cpp to load the model. Only works for the models listed at https://github.com/aarnphm/whispercpp/blob/524dd6f34e9d18137085fb92a42f1c31c9c6bc29/src/whispercpp/utils.py#L32. | False |
| use_faster_whisper | bool | Whether to use faster-whisper. | False |
| **model_args | Any | Additional arguments for model loading. | {} |

Returns:

| Type | Description |
| --- | --- |
| Tuple[AutoModelForAudioClassification, AutoFeatureExtractor] | The loaded model and processor. |
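To make the interaction of the `precision`, `quantization`, and `use_cuda` knobs above concrete, here is a pure-Python sketch of how they might be validated and folded into a loading configuration. The accepted value sets, the returned keys, and the function itself are assumptions for illustration, not the real `load_models` code:

```python
# Value sets assumed from the parameter descriptions above.
VALID_PRECISIONS = {"float32", "float16", "bfloat16"}
VALID_QUANT_BITS = {0, 8}  # 0 = no quantization, 8 = 8-bit


def resolve_load_config(precision: str = "float16", quantization: int = 0,
                        use_cuda: bool = False) -> dict:
    """Validate the knobs and return a hypothetical loading-config dict."""
    if precision not in VALID_PRECISIONS:
        raise ValueError(f"unsupported precision: {precision!r}")
    if quantization not in VALID_QUANT_BITS:
        raise ValueError(f"unsupported quantization bit level: {quantization}")
    return {
        "torch_dtype": precision,          # precision string passed through
        "load_in_8bit": quantization == 8, # 8-bit quantization toggle
        "device_map": "auto" if use_cuda else None,
    }


cfg = resolve_load_config(precision="float16", quantization=8, use_cuda=True)
```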