Base Bulk Inference
Bases: Bolt
AudioBulk is a class designed for bulk processing of audio data using various audio models from Hugging Face. It focuses on audio generation and transformation tasks, supporting a range of models and configurations.
Attributes:

| Name | Type | Description |
|---|---|---|
| `model` | `AutoModelForAudioClassification` | The audio model for generation or transformation tasks. |
| `processor` | `AutoFeatureExtractor` | The processor for preparing input data for the model. |

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `input` | `BatchInput` | Configuration and data inputs for the batch process. | *required* |
| `output` | `BatchOutput` | Configurations for output data handling. | *required* |
| `state` | `State` | State management for the Bolt. | *required* |
| `**kwargs` | | Arbitrary keyword arguments for extended configurations. | `{}` |
Methods:

- `audio(**kwargs: Any) -> Dict[str, Any]`: Provides an API endpoint for audio processing functionality. Accepts various parameters for customizing the audio processing tasks.
- `process(audio_input: Union[str, bytes], **processing_params: Any) -> dict`: Processes the audio input based on the provided parameters. Supports multiple processing methods.
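As a pseudocode-level sketch of how `process` might be called (the keyword argument shown beyond the documented signature is an assumption, not part of the library's confirmed API):

```
# Pseudocode sketch — processing parameter names are assumed.
result = audio_bulk.process(
    "samples/interview.wav",     # audio_input: a path (str) or raw bytes
    model_sampling_rate=16000,   # assumed processing parameter
)
# result is a dict holding the model's output
```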
`__init__(input, output, state, **kwargs)`
Initializes the AudioBulk with configurations and sets up logging. Prepares the environment for audio processing tasks.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `input` | `BatchInput` | The input data configuration for the audio processing task. | *required* |
| `output` | `BatchOutput` | The output data configuration for the results of the audio processing. | *required* |
| `state` | `State` | The state configuration for the Bolt, managing its operational status. | *required* |
| `**kwargs` | | Additional keyword arguments for extended functionality and model configurations. | `{}` |
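As a pseudocode-level initialization sketch (the import paths and the `BatchInput`/`BatchOutput`/`State` constructor arguments are assumptions — consult the library for the actual signatures):

```
# Pseudocode sketch — import paths and constructor arguments are assumed.
from geniusrise.core import BatchInput, BatchOutput, InMemoryState

input = BatchInput("input_folder", "my_bucket", "s3/input")    # assumed signature
output = BatchOutput("output_folder", "my_bucket", "s3/output")
state = InMemoryState()

audio_bulk = AudioBulk(input=input, output=output, state=state)
```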
`done()`
Finalizes the AudioBulk processing. Sends notification email if configured.
This method should be called after all audio processing tasks are complete. It handles any final steps such as sending notifications or cleaning up resources.
`load_models(model_name, processor_name, model_revision=None, processor_revision=None, model_class='', processor_class='AutoFeatureExtractor', use_cuda=False, precision='float16', quantization=0, device_map='auto', max_memory={0: '24GB'}, torchscript=False, compile=False, flash_attention=False, better_transformers=False, use_whisper_cpp=False, use_faster_whisper=False, **model_args)`
Loads and configures the specified audio model and processor for audio processing.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model_name` | `str` | Name or path of the audio model to load. | *required* |
| `processor_name` | `str` | Name or path of the processor to load. | *required* |
| `model_revision` | `Optional[str]` | Specific model revision to load (e.g., a commit hash). | `None` |
| `processor_revision` | `Optional[str]` | Specific processor revision to load. | `None` |
| `model_class` | `str` | Class of the model to be loaded. | `''` |
| `processor_class` | `str` | Class of the processor to be loaded. | `'AutoFeatureExtractor'` |
| `use_cuda` | `bool` | Flag to use CUDA for GPU acceleration. | `False` |
| `precision` | `str` | Desired precision for computations ("float32", "float16", etc.). | `'float16'` |
| `quantization` | `int` | Bit level for model quantization (0 for none, 8 for 8-bit). | `0` |
| `device_map` | `Union[str, Dict, None]` | Specific device(s) for model operations. | `'auto'` |
| `max_memory` | `Dict[int, str]` | Maximum memory allocation for the model. | `{0: '24GB'}` |
| `torchscript` | `bool` | Enable TorchScript for model optimization. | `False` |
| `compile` | `bool` | Enable Torch JIT compilation. | `False` |
| `flash_attention` | `bool` | Enable Flash Attention optimization for faster processing. | `False` |
| `better_transformers` | `bool` | Enable Better Transformers optimization for faster processing. | `False` |
| `use_whisper_cpp` | `bool` | Whether to use whisper.cpp to load the model. Only works for the models listed at https://github.com/aarnphm/whispercpp/blob/524dd6f34e9d18137085fb92a42f1c31c9c6bc29/src/whispercpp/utils.py#L32. | `False` |
| `use_faster_whisper` | `bool` | Whether to use faster-whisper to load the model. | `False` |
| `**model_args` | `Any` | Additional arguments for model loading. | `{}` |
Returns:

| Type | Description |
|---|---|
| `Tuple[AutoModelForAudioClassification, AutoFeatureExtractor]` | The loaded model and processor. |
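As an illustrative, self-contained helper (not part of the library), the constraints the parameter table places on `precision`, `quantization`, and `max_memory` can be pre-checked before calling `load_models`. The accepted precision set here is an assumption based on the "float32", "float16", etc. wording above:

```python
def check_load_config(precision: str, quantization: int, max_memory: dict) -> dict:
    """Validate a subset of load_models() keyword arguments (illustrative only)."""
    # The docs name "float32", "float16", etc.; this exact set is an assumption.
    if precision not in {"float32", "float16", "bfloat16"}:
        raise ValueError(f"unsupported precision: {precision!r}")
    # Per the table: 0 means no quantization, 8 means 8-bit.
    if quantization not in (0, 8):
        raise ValueError("quantization must be 0 (none) or 8 (8-bit)")
    # max_memory maps a device index to a budget string, e.g. {0: "24GB"}.
    for device, budget in max_memory.items():
        if not isinstance(device, int) or not budget.endswith("GB"):
            raise ValueError(f"bad max_memory entry: {device!r}: {budget!r}")
    return {"precision": precision, "quantization": quantization, "max_memory": max_memory}

cfg = check_load_config("float16", 0, {0: "24GB"})
```

A check like this fails fast with a clear message instead of surfacing a shape or dtype error deep inside model loading.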