Vision Base
Bases: VisionBulk
The VisionAPI class inherits from VisionBulk and handles vision-based tasks using a pre-trained machine learning model. It sets up a CherryPy server that processes image-related requests with the specified model.
__init__(input, output, state)
Initializes the VisionAPI object with batch input, output, and state.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
input | BatchInput | Object to handle batch input operations. | required |
output | BatchOutput | Object to handle batch output operations. | required |
state | State | Object to maintain the state of the API. | required |
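The three constructor arguments are plain dependency injection: batch input and output handlers plus a state object. Below is a minimal sketch of wiring them together; the import paths, the InMemoryState backend, and the BatchInput/BatchOutput constructor arguments are assumptions about the surrounding geniusrise packages, not values specified on this page.

```python
# A minimal sketch of constructing the API object. Import paths, InMemoryState,
# and the BatchInput/BatchOutput constructor arguments are assumptions --
# adjust them to match your installation.
from geniusrise import BatchInput, BatchOutput, InMemoryState  # assumed imports
from geniusrise_vision import VisionAPI                        # assumed import

input = BatchInput("./input", "my-bucket", "vision/input")      # assumed signature
output = BatchOutput("./output", "my-bucket", "vision/output")  # assumed signature
state = InMemoryState()                                         # assumed concrete State backend

api = VisionAPI(input=input, output=output, state=state)
```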
listen(model_name, model_class='AutoModel', processor_class='AutoProcessor', device_map='auto', max_memory={0: '24GB'}, use_cuda=False, precision='float16', quantization=0, torchscript=False, compile=False, flash_attention=False, better_transformers=False, concurrent_queries=False, use_llama_cpp=False, llama_cpp_filename=None, llama_cpp_n_gpu_layers=0, llama_cpp_split_mode=llama_cpp.LLAMA_SPLIT_LAYER, llama_cpp_tensor_split=None, llama_cpp_vocab_only=False, llama_cpp_use_mmap=True, llama_cpp_use_mlock=False, llama_cpp_kv_overrides=None, llama_cpp_seed=llama_cpp.LLAMA_DEFAULT_SEED, llama_cpp_n_ctx=2048, llama_cpp_n_batch=512, llama_cpp_n_threads=None, llama_cpp_n_threads_batch=None, llama_cpp_rope_scaling_type=llama_cpp.LLAMA_ROPE_SCALING_UNSPECIFIED, llama_cpp_rope_freq_base=0.0, llama_cpp_rope_freq_scale=0.0, llama_cpp_yarn_ext_factor=-1.0, llama_cpp_yarn_attn_factor=1.0, llama_cpp_yarn_beta_fast=32.0, llama_cpp_yarn_beta_slow=1.0, llama_cpp_yarn_orig_ctx=0, llama_cpp_mul_mat_q=True, llama_cpp_logits_all=False, llama_cpp_embedding=False, llama_cpp_offload_kqv=True, llama_cpp_last_n_tokens_size=64, llama_cpp_lora_base=None, llama_cpp_lora_scale=1.0, llama_cpp_lora_path=None, llama_cpp_numa=False, llama_cpp_chat_format=None, llama_cpp_draft_model=None, endpoint='*', port=3000, cors_domain='http://localhost:3000', username=None, password=None, **model_args)
Configures and starts a CherryPy server to listen for image processing requests.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
model_name | str | The name of the pre-trained vision model. | required |
model_class | str | The class of the pre-trained vision model. Defaults to "AutoModel". | 'AutoModel' |
processor_class | str | The class of the processor for input image preprocessing. Defaults to "AutoProcessor". | 'AutoProcessor' |
device_map | str \| Dict \| None | Device mapping for model inference. Defaults to "auto". | 'auto' |
max_memory | Dict[int, str] | Maximum memory allocation for model inference. Defaults to {0: "24GB"}. | {0: '24GB'} |
use_cuda | bool | Whether to use CUDA for model inference. Defaults to False. | False |
precision | str | The floating-point precision to be used by the model. Options are 'float32', 'float16', and 'bfloat16'. | 'float16' |
quantization | int | The bit level for model quantization (0 for none, 8 for 8-bit quantization). | 0 |
torchscript | bool | Whether to use TorchScript for model optimization. Defaults to False. | False |
compile | bool | Whether to compile the model before serving. Defaults to False. | False |
flash_attention | bool | Whether to use Flash Attention 2. Defaults to False. | False |
better_transformers | bool | Whether to enable the Better Transformers optimization for faster processing. | False |
concurrent_queries | bool | Whether the API supports concurrent API calls (usually False). | False |
use_llama_cpp | bool | Whether to use the llama.cpp integration for language model inference. | False |
llama_cpp_filename | Optional[str] | The filename of the model file for llama.cpp. | None |
llama_cpp_n_gpu_layers | int | Number of layers to offload to the GPU in the llama.cpp configuration. | 0 |
llama_cpp_split_mode | int | Defines how the model is split across multiple GPUs in llama.cpp. | llama_cpp.LLAMA_SPLIT_LAYER |
llama_cpp_tensor_split | Optional[List[float]] | Custom tensor split configuration for llama.cpp. | None |
llama_cpp_vocab_only | bool | Loads only the vocabulary part of the model in llama.cpp. | False |
llama_cpp_use_mmap | bool | Enables memory-mapped files for model loading in llama.cpp. | True |
llama_cpp_use_mlock | bool | Locks the model in RAM to prevent swapping in llama.cpp. | False |
llama_cpp_kv_overrides | Optional[Dict[str, Union[bool, int, float]]] | Key-value pairs for overriding default llama.cpp model parameters. | None |
llama_cpp_seed | int | Seed for random number generation in llama.cpp. | llama_cpp.LLAMA_DEFAULT_SEED |
llama_cpp_n_ctx | int | The number of context tokens for the model in llama.cpp. | 2048 |
llama_cpp_n_batch | int | Batch size for processing prompts in llama.cpp. | 512 |
llama_cpp_n_threads | Optional[int] | Number of threads for generation in llama.cpp. | None |
llama_cpp_n_threads_batch | Optional[int] | Number of threads for batch processing in llama.cpp. | None |
llama_cpp_rope_scaling_type | Optional[int] | Specifies the RoPE (Rotary Positional Embeddings) scaling type in llama.cpp. | llama_cpp.LLAMA_ROPE_SCALING_UNSPECIFIED |
llama_cpp_rope_freq_base | float | Base frequency for RoPE in llama.cpp. | 0.0 |
llama_cpp_rope_freq_scale | float | Frequency scaling factor for RoPE in llama.cpp. | 0.0 |
llama_cpp_yarn_ext_factor | float | Extrapolation mix factor for YaRN in llama.cpp. | -1.0 |
llama_cpp_yarn_attn_factor | float | Attention factor for YaRN in llama.cpp. | 1.0 |
llama_cpp_yarn_beta_fast | float | Beta fast parameter for YaRN in llama.cpp. | 32.0 |
llama_cpp_yarn_beta_slow | float | Beta slow parameter for YaRN in llama.cpp. | 1.0 |
llama_cpp_yarn_orig_ctx | int | Original context size for YaRN in llama.cpp. | 0 |
llama_cpp_mul_mat_q | bool | Flag to enable matrix multiplication for queries in llama.cpp. | True |
llama_cpp_logits_all | bool | Returns logits for all tokens when set to True in llama.cpp. | False |
llama_cpp_embedding | bool | Enables embedding-only mode in llama.cpp. | False |
llama_cpp_offload_kqv | bool | Offloads the K, Q, V matrices to the GPU in llama.cpp. | True |
llama_cpp_last_n_tokens_size | int | Size of the last_n_tokens buffer in llama.cpp. | 64 |
llama_cpp_lora_base | Optional[str] | Base model path for LoRA adjustments in llama.cpp. | None |
llama_cpp_lora_scale | float | Scale factor for LoRA adjustments in llama.cpp. | 1.0 |
llama_cpp_lora_path | Optional[str] | Path to the LoRA adjustments file in llama.cpp. | None |
llama_cpp_numa | Union[bool, int] | NUMA configuration for llama.cpp. | False |
llama_cpp_chat_format | Optional[str] | Specifies the chat format for llama.cpp. | None |
llama_cpp_draft_model | Optional[llama_cpp.LlamaDraftModel] | Draft model for speculative decoding in llama.cpp. | None |
endpoint | str | The network endpoint for the server. Defaults to "*". | '*' |
port | int | The network port for the server. Defaults to 3000. | 3000 |
cors_domain | str | The domain to allow for CORS requests. Defaults to "http://localhost:3000". | 'http://localhost:3000' |
username | Optional[str] | Username for server authentication. Defaults to None. | None |
password | Optional[str] | Password for server authentication. Defaults to None. | None |
**model_args | Any | Additional arguments for the vision model. | {} |
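Continuing the sketch above, listen loads the named model and starts the CherryPy server. The checkpoint name and the model/processor classes below are illustrative assumptions, not values prescribed by this API; any Hugging Face vision checkpoint compatible with the chosen classes should work.

```python
# Minimal sketch: start an image-classification server on port 3000.
# The checkpoint and the AutoModelForImageClassification / AutoImageProcessor
# classes are illustrative assumptions.
api.listen(
    model_name="google/vit-base-patch16-224",       # assumed example checkpoint
    model_class="AutoModelForImageClassification",  # assumed example model class
    processor_class="AutoImageProcessor",           # assumed example processor class
    device_map="auto",
    use_cuda=True,
    precision="float16",
    endpoint="*",
    port=3000,
    cors_domain="http://localhost:3000",
    username=None,  # set username and password to enable basic auth
    password=None,
)
```

Leaving username and password as None keeps the server unauthenticated; supplying both enables basic authentication, which is checked by validate_password below.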
validate_password(realm, username, password)
Validate the username and password against expected values.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
realm | str | The authentication realm. | required |
username | str | The provided username. | required |
password | str | The provided password. | required |
Returns:

Type | Description |
---|---|
bool | True if credentials are valid, False otherwise. |
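This method's (realm, username, password) signature matches the checker that CherryPy's basic-auth tool expects, so it is invoked for each request once credentials are configured via listen. A minimal client-side sketch follows; the "/api/v1/classify" route and the payload shape are placeholder assumptions, not routes defined on this page.

```python
# Minimal sketch: call a password-protected VisionAPI server with HTTP basic auth.
# The route "/api/v1/classify" and the JSON payload shape are placeholder assumptions.
import base64

import requests

with open("cat.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = requests.post(
    "http://localhost:3000/api/v1/classify",  # placeholder route
    json={"image_base64": image_b64},         # placeholder payload shape
    auth=("admin", "secret"),                 # must match listen(username=..., password=...)
    timeout=60,
)
print(response.status_code, response.json())
```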