Base API
Bases: AudioBulk
A class representing a Hugging Face API for generating text from audio using a pre-trained model.
Attributes:
Name | Type | Description
---|---|---
model | Any | The pre-trained language model.
processor | Any | The processor used to preprocess input text.
model_name | str | The name of the pre-trained language model.
model_revision | Optional[str] | The revision of the pre-trained language model.
processor_name | str | The name of the processor used to preprocess input text.
processor_revision | Optional[str] | The revision of the processor used to preprocess input text.
model_class | str | The name of the class of the pre-trained language model.
processor_class | str | The name of the class of the processor used to preprocess input text.
use_cuda | bool | Whether to use a GPU for inference.
quantization | int | The level of quantization to use for the pre-trained language model.
precision | str | The precision to use for the pre-trained language model.
device_map | str \| Dict \| None | The mapping of devices to use for inference.
max_memory | Dict[int, str] | The maximum memory to use for inference.
torchscript | bool | Whether to use a TorchScript-optimized version of the pre-trained language model.
model_args | Any | Additional arguments to pass to the pre-trained language model.
Methods
- text(**kwargs: Any) -> Dict[str, Any]: Generates text based on the given prompt and decoding strategy.
- listen(model_name: str, model_class: str = "AutoModel", processor_class: str = "AutoProcessor", use_cuda: bool = False, precision: str = "float16", quantization: int = 0, device_map: str | Dict | None = "auto", max_memory: Dict[int, str] = {0: "24GB"}, torchscript: bool = False, endpoint: str = "*", port: int = 3000, cors_domain: str = "http://localhost:3000", username: Optional[str] = None, password: Optional[str] = None, **model_args: Any) -> None: Starts a CherryPy server to listen for requests to generate text.
__init__(input, output, state, **kwargs)
Initializes a new instance of the TextAPI class.
Parameters:
Name | Type | Description | Default
---|---|---|---
input | BatchInput | The input data to process. | required
output | BatchOutput | The output data to process. | required
state | State | The state of the API. | required
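To make the constructor's role concrete, here is a minimal, purely illustrative sketch of the pattern: the instance simply stores the input, output, and state handles it is given. The BatchInput, BatchOutput, and State stand-ins below are hypothetical placeholders, not the real classes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Hypothetical stand-ins for the real BatchInput / BatchOutput / State types.
@dataclass
class BatchInput:
    input_folder: str

@dataclass
class BatchOutput:
    output_folder: str

@dataclass
class State:
    data: Dict[str, Any] = field(default_factory=dict)

class AudioAPISketch:
    """Illustrative only: the constructor stores the I/O and state handles."""

    def __init__(self, input: BatchInput, output: BatchOutput, state: State, **kwargs: Any):
        self.input = input
        self.output = output
        self.state = state
        self.kwargs = kwargs

api = AudioAPISketch(BatchInput("in/"), BatchOutput("out/"), State(), use_cuda=False)
print(api.input.input_folder)  # in/
```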
listen(model_name, model_class='AutoModel', processor_class='AutoProcessor', use_cuda=False, precision='float16', quantization=0, device_map='auto', max_memory={0: '24GB'}, torchscript=False, compile=False, concurrent_queries=False, use_whisper_cpp=False, use_faster_whisper=False, endpoint='*', port=3000, cors_domain='http://localhost:3000', username=None, password=None, **model_args)
Starts a CherryPy server to listen for requests to generate text.
Parameters:
Name | Type | Description | Default
---|---|---|---
model_name | str | The name of the pre-trained language model. | required
model_class | str | The name of the class of the pre-trained language model. Defaults to "AutoModel". | 'AutoModel'
processor_class | str | The name of the class of the processor used to preprocess input text. Defaults to "AutoProcessor". | 'AutoProcessor'
use_cuda | bool | Whether to use a GPU for inference. Defaults to False. | False
precision | str | The precision to use for the pre-trained language model. Defaults to "float16". | 'float16'
quantization | int | The level of quantization to use for the pre-trained language model. Defaults to 0. | 0
device_map | str \| Dict \| None | The mapping of devices to use for inference. Defaults to "auto". | 'auto'
max_memory | Dict[int, str] | The maximum memory to use for inference. Defaults to {0: "24GB"}. | {0: '24GB'}
torchscript | bool | Whether to use a TorchScript-optimized version of the pre-trained language model. Defaults to False. | False
compile | bool | Whether to enable Torch JIT compilation. Defaults to False. | False
concurrent_queries | bool | Whether the API supports concurrent API calls (usually False). | False
use_whisper_cpp | bool | Whether to use whisper.cpp to load the model. Defaults to False. Note: only works for these models: https://github.com/aarnphm/whispercpp/blob/524dd6f34e9d18137085fb92a42f1c31c9c6bc29/src/whispercpp/utils.py#L32 | False
use_faster_whisper | bool | Whether to use faster-whisper. Defaults to False. | False
endpoint | str | The endpoint to listen on. Defaults to "*". | '*'
port | int | The port to listen on. Defaults to 3000. | 3000
cors_domain | str | The domain to allow CORS requests from. Defaults to "http://localhost:3000". | 'http://localhost:3000'
username | Optional[str] | The username to use for authentication. Defaults to None. | None
password | Optional[str] | The password to use for authentication. Defaults to None. | None
**model_args | Any | Additional arguments to pass to the pre-trained language model. | {}
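When username and password are set, clients must authenticate with HTTP Basic auth. A hedged sketch of building such a request with the standard library (the /api/v1/transcribe path and credentials are hypothetical; check the server's actual routing):

```python
import base64
import urllib.request

def build_request(url: str, username: str, password: str) -> urllib.request.Request:
    # HTTP Basic auth: base64-encode "username:password" into the Authorization header.
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    req = urllib.request.Request(url, method="POST")
    req.add_header("Authorization", f"Basic {token}")
    return req

req = build_request("http://localhost:3000/api/v1/transcribe", "user", "password")
print(req.get_header("Authorization"))  # Basic dXNlcjpwYXNzd29yZA==
```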
validate_password(realm, username, password)
Validate the username and password against expected values.
Parameters:
Name | Type | Description | Default
---|---|---|---
realm | str | The authentication realm. | required
username | str | The provided username. | required
password | str | The provided password. | required
Returns:
Type | Description
---|---
bool | True if credentials are valid, False otherwise.
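The method presumably compares the supplied credentials against the ones passed to listen(). A sketch of such a check (the expected credentials below are illustrative), using constant-time comparison to avoid leaking information through timing:

```python
import hmac

def validate_password(realm: str, username: str, password: str,
                      expected_user: str = "admin", expected_pass: str = "secret") -> bool:
    # hmac.compare_digest compares in constant time, unlike ==,
    # which can leak the match length through timing differences.
    return (hmac.compare_digest(username, expected_user)
            and hmac.compare_digest(password, expected_pass))

print(validate_password("api", "admin", "secret"))  # True
print(validate_password("api", "admin", "wrong"))   # False
```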