Text to Speech

Bases: AudioAPI

TextToSpeechAPI for converting text to speech using various TTS models.

Attributes:

Name	Type	Description
`model`	`AutoModelForSeq2SeqLM`	The text-to-speech model.
`tokenizer`	`AutoTokenizer`	The tokenizer for the model.

Methods

synthesize(text_input: str) -> bytes: Converts the given text input to speech using the text-to-speech model.

Example CLI Usage:

genius TextToSpeechAPI rise \
batch \
    --input_folder ./input \
batch \
    --output_folder ./output \
none \
    --id facebook/mms-tts-eng \
    listen \
        --args \
            model_name="facebook/mms-tts-eng" \
            model_class="VitsModel" \
            processor_class="VitsTokenizer" \
            use_cuda=True \
            precision="float32" \
            quantization=0 \
            device_map="cuda:0" \
            max_memory=None \
            torchscript=False \
            compile=False \
            endpoint="*" \
            port=3000 \
            cors_domain="http://localhost:3000" \
            username="user" \
            password="password"

`__init__(input, output, state, **kwargs)`

Initializes the TextToSpeechAPI with configurations for text-to-speech processing.

Parameters:

Name	Type	Description	Default
`input`	`BatchInput`	The input data configuration.	required
`output`	`BatchOutput`	The output data configuration.	required
`state`	`State`	The state configuration.	required
`**kwargs`	`Any`	Additional keyword arguments.	`{}`

`initialize_pipeline()`

Lazily initializes the Hugging Face TTS pipeline, so the pipeline is built on first use rather than at construction time.
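The lazy-initialization pattern can be sketched as follows. This is a minimal illustration, not the library's implementation: `LazyPipeline`, `fake_factory`, and the `_pipeline` attribute are hypothetical names, and the factory stands in for whatever actually builds the Hugging Face pipeline.

```python
from typing import Any, Callable, List, Optional


class LazyPipeline:
    """Defers building an expensive object until it is first needed."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        # Hypothetical loader, e.g. a function that builds a TTS pipeline.
        self._factory = factory
        self._pipeline: Optional[Any] = None

    def initialize_pipeline(self) -> None:
        # Build the pipeline only once; later calls are no-ops.
        if self._pipeline is None:
            self._pipeline = self._factory()

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        # Make sure the pipeline exists, then delegate to it.
        self.initialize_pipeline()
        return self._pipeline(*args, **kwargs)


# Stand-in factory that records how many times it was invoked.
calls: List[str] = []


def fake_factory() -> Callable[[str], str]:
    calls.append("loaded")
    return lambda text: f"audio for {text!r}"


lazy = LazyPipeline(fake_factory)
```

Nothing is loaded when `lazy` is created; the factory runs on the first call and is not run again afterwards, which keeps start-up cheap when the model is large.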

`synthesize()`

API endpoint to convert text input to speech using the text-to-speech model. Expects a JSON input with 'text' as a key containing the text to be synthesized.

Returns:

Type	Description
`Dict[str, str]`	A dictionary containing the base64-encoded audio data.

Example CURL Request for synthesis: ... [Provide example CURL request] ...
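As a rough client-side sketch in Python, the request and response handling might look like the code below. Several details are assumptions rather than documented behavior: the URL path `/api/v1/synthesize`, the use of HTTP Basic authentication, and the response key `audio_file` are all hypothetical. Only the `text` request key, the base64-encoded response, and the host, port, and credentials from the CLI example come from this page.

```python
import base64
import json
import urllib.request
from typing import Dict

# Assumed values: host and port follow the CLI example; the path is hypothetical.
SYNTHESIZE_URL = "http://localhost:3000/api/v1/synthesize"


def build_synthesize_request(text: str, username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying {"text": ...} as JSON."""
    body = json.dumps({"text": text}).encode("utf-8")
    # Assumption: the server accepts HTTP Basic authentication.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Basic {token}",
    }
    return urllib.request.Request(SYNTHESIZE_URL, data=body, headers=headers, method="POST")


def decode_audio(response: Dict[str, str], key: str = "audio_file") -> bytes:
    """Decode the base64 audio payload; the key name is a placeholder."""
    return base64.b64decode(response[key])
```

To send the request, pass the object returned by `build_synthesize_request` to `urllib.request.urlopen`, parse the reply with `json.load`, and hand the resulting dictionary to `decode_audio`.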

`tts_pipeline(**kwargs)`

Converts text to speech using the Hugging Face pipeline.

Parameters:

Name	Type	Description	Default
`**kwargs`	`Any`	Arbitrary keyword arguments, typically containing 'text' for the input text.	`{}`

Returns:

Type	Description
`Dict[str, Any]`	A dictionary containing the base64-encoded audio data.
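One plausible way to turn a synthesized waveform into a base64 string is sketched below using only the standard library. It assumes the pipeline yields mono floating-point samples in the range [-1, 1] together with a sampling rate, and that the audio is shipped as 16-bit PCM WAV. The function name `waveform_to_base64_wav` is hypothetical, and the actual encoding used by `tts_pipeline` is not stated on this page.

```python
import base64
import io
import struct
import wave
from typing import Sequence


def waveform_to_base64_wav(samples: Sequence[float], sampling_rate: int) -> str:
    """Encode mono float samples as a 16-bit PCM WAV file, then base64 it."""
    # Clip each sample to [-1, 1] and scale to a signed 16-bit integer.
    pcm = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit
        wav.setframerate(sampling_rate)
        wav.writeframes(pcm)
    # Base64 keeps the binary audio safe to embed in a JSON response.
    return base64.b64encode(buffer.getvalue()).decode("ascii")


encoded = waveform_to_base64_wav([0.0, 0.5, -0.5, 1.0], sampling_rate=16000)
```

Decoding `encoded` with `base64.b64decode` gives back a complete WAV file that can be written to disk or played directly.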

Example CURL Request for synthesis: ... [Provide example CURL request] ...
