Text to Speech
Bases: AudioAPI
TextToSpeechAPI for converting text to speech using various TTS models.
Attributes:
Name | Type | Description |
---|---|---|
model | AutoModelForSeq2SeqLM | The text-to-speech model. |
tokenizer | AutoTokenizer | The tokenizer for the model. |
Methods
synthesize(text_input: str) -> bytes: Converts the given text input to speech using the text-to-speech model.
Example CLI Usage:
genius TextToSpeechAPI rise \
batch \
--input_folder ./input \
batch \
--output_folder ./output \
none \
--id facebook/mms-tts-eng \
listen \
--args \
model_name="facebook/mms-tts-eng" \
model_class="VitsModel" \
processor_class="VitsTokenizer" \
use_cuda=True \
precision="float32" \
quantization=0 \
device_map="cuda:0" \
max_memory=None \
torchscript=False \
compile=False \
endpoint="*" \
port=3000 \
cors_domain="http://localhost:3000" \
username="user" \
password="password"
__init__(input, output, state, **kwargs)
Initializes the TextToSpeechAPI with configurations for text-to-speech processing.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input | BatchInput | The input data configuration. | required |
output | BatchOutput | The output data configuration. | required |
state | State | The state configuration. | required |
**kwargs | | Additional keyword arguments. | {} |
initialize_pipeline()
Lazily initializes the Hugging Face TTS pipeline, so the pipeline is created on first use rather than at construction time.
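The lazy-initialization pattern described here can be illustrated with a minimal sketch. The class name `LazyPipelineHolder`, the `hf_pipeline` attribute, and the `factory` argument are hypothetical stand-ins chosen for illustration; this is not the library's actual implementation.

```python
from typing import Any, Callable, List, Optional


class LazyPipelineHolder:
    """Hypothetical stand-in for the API class (illustration only)."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        # The factory builds the pipeline; in practice it might wrap a
        # Hugging Face pipeline constructor (assumption).
        self._factory = factory
        # Nothing is loaded at construction time.
        self.hf_pipeline: Optional[Any] = None

    def initialize_pipeline(self) -> Any:
        # Build the pipeline on first use; later calls reuse the cached object.
        if self.hf_pipeline is None:
            self.hf_pipeline = self._factory()
        return self.hf_pipeline


# Usage with a stub factory that records how often it is invoked.
factory_calls: List[int] = []


def stub_factory() -> object:
    factory_calls.append(1)
    return object()


holder = LazyPipelineHolder(stub_factory)
first = holder.initialize_pipeline()
second = holder.initialize_pipeline()
```

Deferring construction this way keeps startup fast and avoids loading model weights for a process that never serves a synthesis request.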
synthesize()
API endpoint to convert text input to speech using the text-to-speech model. Expects a JSON input with 'text' as a key containing the text to be synthesized.
Returns:
Type | Description |
---|---|
Dict[str, str] | A dictionary containing the base64-encoded audio data. |
Example CURL Request for synthesis: ... [Provide example CURL request] ...
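A client call to this endpoint might look like the following Python sketch. Several details are assumptions not confirmed by this page: the URL path `/api/v1/synthesize`, the use of HTTP basic auth with the `username` and `password` from the CLI example, and the `audio_file` response key. Only the `text` request key and the base64-encoded response come from the description above.

```python
import base64
import json
import urllib.request

# Assumptions (hypothetical): host, port, and path of the running service.
BASE_URL = "http://localhost:3000"
SYNTHESIZE_PATH = "/api/v1/synthesize"


def build_synthesize_request(
    text: str,
    username: str = "user",
    password: str = "password",
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build (but do not send) a POST request with {"text": ...} as JSON."""
    body = json.dumps({"text": text}).encode("utf-8")
    # HTTP basic auth header; whether the server expects this is an assumption.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        base_url + SYNTHESIZE_PATH,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def decode_audio(response: dict, key: str = "audio_file") -> bytes:
    """Decode the base64 audio payload; the key name is an assumption."""
    return base64.b64decode(response[key])


request = build_synthesize_request("Hello, world.")

# Sending the request requires a running server, so it is left commented out:
# with urllib.request.urlopen(request) as resp:
#     audio_bytes = decode_audio(json.load(resp))
```

Separating request construction from sending makes it easy to inspect the payload and headers before pointing the client at a live deployment.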
tts_pipeline(**kwargs)
Converts text to speech using the Hugging Face pipeline.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
**kwargs | Any | Arbitrary keyword arguments, typically containing 'text' for the input text. | {} |
Returns:
Type | Description |
---|---|
Dict[str, Any] | A dictionary containing the base64-encoded audio data. |
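To show what a dictionary carrying base64-encoded audio could look like, here is a minimal sketch using only the standard library. The helper name `package_audio`, the `audio_file` and `input` keys, the WAV container, and the 16 kHz sample rate are all assumptions made for illustration; the real method may structure its result differently.

```python
import base64
import io
import wave
from typing import Any, Dict


def package_audio(pcm_frames: bytes, text: str, sample_rate: int = 16000) -> Dict[str, Any]:
    """Wrap 16-bit mono PCM frames in a WAV container and base64-encode it."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)  # mono
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_frames)
    return {
        # Key names are hypothetical, chosen for this sketch.
        "audio_file": base64.b64encode(buffer.getvalue()).decode("ascii"),
        "input": text,
    }


# 160 frames of silence stand in for real model output.
result = package_audio(b"\x00\x00" * 160, text="Hello")
```

Base64 encoding lets binary audio travel inside a JSON response, at the cost of roughly a one-third increase in payload size.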