Text to Speech

Bases: AudioAPI

TextToSpeechAPI for converting text to speech using various TTS models.

Attributes:

model (AutoModelForSeq2SeqLM): The text-to-speech model.
tokenizer (AutoTokenizer): The tokenizer for the model.

Methods

synthesize(text_input: str) -> bytes: Converts the given text input to speech using the text-to-speech model.

Example CLI Usage:

genius TextToSpeechAPI rise \
    batch \
        --input_folder ./input \
    batch \
        --output_folder ./output \
    none \
    --id facebook/mms-tts-eng \
    listen \
        --args \
            model_name="facebook/mms-tts-eng" \
            model_class="VitsModel" \
            processor_class="VitsTokenizer" \
            use_cuda=True \
            precision="float32" \
            quantization=0 \
            device_map="cuda:0" \
            max_memory=None \
            torchscript=False \
            compile=False \
            endpoint="*" \
            port=3000 \
            cors_domain="http://localhost:3000" \
            username="user" \
            password="password"

__init__(input, output, state, **kwargs)

Initializes the TextToSpeechAPI with configurations for text-to-speech processing.

Parameters:

input (BatchInput, required): The input data configuration.
output (BatchOutput, required): The output data configuration.
state (State, required): The state configuration.
**kwargs (default: {}): Additional keyword arguments.

initialize_pipeline()

Lazy initialization of the TTS Hugging Face pipeline.
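Lazy initialization generally means the expensive object is built on first use rather than at construction time. The sketch below illustrates that pattern only; it is not the library's implementation, and `LazyPipeline`, `factory`, and `get` are hypothetical names introduced for illustration.

```python
class LazyPipeline:
    """Create an expensive object on first use and reuse it afterwards."""

    def __init__(self, factory):
        # `factory` is a hypothetical zero-argument callable that builds
        # the pipeline (for example, by loading a model from disk).
        self._factory = factory
        self._pipeline = None

    def get(self):
        # Build the pipeline only if it has not been built yet.
        if self._pipeline is None:
            self._pipeline = self._factory()
        return self._pipeline
```

With this shape, constructing the API object stays cheap, and the model load cost is paid by the first request that needs the pipeline.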

synthesize()

API endpoint to convert text input to speech using the text-to-speech model. Expects a JSON input with 'text' as a key containing the text to be synthesized.

Returns:

Dict[str, str]: A dictionary containing the base64 encoded audio data.
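A client call to this endpoint might look like the following sketch. Several details are assumptions not stated in this section: the URL path `/api/v1/synthesize`, the use of HTTP basic authentication with the `username` and `password` from the CLI example, the response key `audio`, and the output file name. Only the `text` request key, port 3000, and the base64 encoding of the audio come from the text above.

```python
import base64
import json
import urllib.request

# Assumed route; this section does not state the URL path of the endpoint.
SYNTHESIZE_URL = "http://localhost:3000/api/v1/synthesize"


def build_synthesize_request(text, username="user", password="password",
                             url=SYNTHESIZE_URL):
    """Build a POST request carrying {"text": ...} with basic auth (assumed)."""
    body = json.dumps({"text": text}).encode("utf-8")
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def decode_audio(response, key="audio"):
    """Decode the base64 audio stored under `key` (hypothetical key name)."""
    return base64.b64decode(response[key])


def synthesize_to_file(text, path="output.wav"):
    """Send the request to a running server and write the decoded audio."""
    request = build_synthesize_request(text)
    with urllib.request.urlopen(request) as reply:
        payload = json.load(reply)
    with open(path, "wb") as handle:
        handle.write(decode_audio(payload))
```

Calling `synthesize_to_file("Hello, world.")` requires the server from the CLI example to be listening; the two helper functions can be used without a server.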

Example CURL Request for synthesis: ... [Provide example CURL request] ...

tts_pipeline(**kwargs)

Converts text to speech using the Hugging Face pipeline.

Parameters:

**kwargs (Any, default: {}): Arbitrary keyword arguments, typically containing 'text' for the input text.

Returns:

Dict[str, Any]: A dictionary containing the base64 encoded audio data.
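To make the "base64 encoded audio data" concrete, here is a minimal sketch of how a waveform could be packed into such a dictionary. It assumes the pipeline yields mono float samples in the range [-1.0, 1.0] together with a sampling rate, that the audio is serialized as 16-bit WAV, and that the result is stored under an `audio` key. None of these details is confirmed by this section, and `encode_waveform` is a hypothetical helper.

```python
import base64
import io
import struct
import wave


def encode_waveform(samples, sampling_rate):
    """Pack float samples as 16-bit mono WAV and base64-encode the bytes."""
    # Clip to the valid range so the integer conversion cannot overflow.
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    frames = struct.pack(
        f"<{len(clipped)}h", *(int(s * 32767) for s in clipped)
    )
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)       # mono (assumed)
        wav.setsampwidth(2)       # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sampling_rate)
        wav.writeframes(frames)
    # "audio" is a hypothetical key name for the encoded payload.
    return {"audio": base64.b64encode(buffer.getvalue()).decode("ascii")}
```

Base64 keeps the binary audio safe to embed in a JSON response, at the cost of roughly a one-third increase in size.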
