Vision Base
Bases: VisionBulk
The VisionAPI class inherits from VisionBulk and handles vision-based tasks using a pre-trained machine learning model. It sets up a CherryPy server that processes image-related requests with the specified model.
__init__(input, output, state)
Initializes the VisionAPI object with batch input, output, and state.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
input | BatchInput | Object to handle batch input operations. | required |
output | BatchOutput | Object to handle batch output operations. | required |
state | State | Object to maintain the state of the API. | required |
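The three constructor arguments are plain dependency injection: batch input and output handlers plus a state object. Below is a minimal sketch of wiring them together; the import paths, the InMemoryState backend, and the BatchInput/BatchOutput constructor arguments are assumptions about the surrounding geniusrise packages, not values specified on this page.

```python
# A minimal sketch of constructing the API object. Import paths, InMemoryState,
# and the BatchInput/BatchOutput constructor arguments are assumptions --
# adjust them to match your installation.
from geniusrise import BatchInput, BatchOutput, InMemoryState  # assumed imports
from geniusrise_vision import VisionAPI                        # assumed import

input = BatchInput("./input", "my-bucket", "vision/input")      # assumed signature
output = BatchOutput("./output", "my-bucket", "vision/output")  # assumed signature
state = InMemoryState()                                         # assumed concrete State backend

api = VisionAPI(input=input, output=output, state=state)
```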
listen(model_name, model_class='AutoModel', processor_class='AutoProcessor', device_map='auto', max_memory={0: '24GB'}, use_cuda=False, precision='float16', quantization=0, torchscript=False, compile=False, flash_attention=False, better_transformers=False, concurrent_queries=False, use_llama_cpp=False, llama_cpp_filename=None, llama_cpp_n_gpu_layers=0, llama_cpp_split_mode=llama_cpp.LLAMA_SPLIT_LAYER, llama_cpp_tensor_split=None, llama_cpp_vocab_only=False, llama_cpp_use_mmap=True, llama_cpp_use_mlock=False, llama_cpp_kv_overrides=None, llama_cpp_seed=llama_cpp.LLAMA_DEFAULT_SEED, llama_cpp_n_ctx=2048, llama_cpp_n_batch=512, llama_cpp_n_threads=None, llama_cpp_n_threads_batch=None, llama_cpp_rope_scaling_type=llama_cpp.LLAMA_ROPE_SCALING_UNSPECIFIED, llama_cpp_rope_freq_base=0.0, llama_cpp_rope_freq_scale=0.0, llama_cpp_yarn_ext_factor=-1.0, llama_cpp_yarn_attn_factor=1.0, llama_cpp_yarn_beta_fast=32.0, llama_cpp_yarn_beta_slow=1.0, llama_cpp_yarn_orig_ctx=0, llama_cpp_mul_mat_q=True, llama_cpp_logits_all=False, llama_cpp_embedding=False, llama_cpp_offload_kqv=True, llama_cpp_last_n_tokens_size=64, llama_cpp_lora_base=None, llama_cpp_lora_scale=1.0, llama_cpp_lora_path=None, llama_cpp_numa=False, llama_cpp_chat_format=None, llama_cpp_draft_model=None, endpoint='*', port=3000, cors_domain='http://localhost:3000', username=None, password=None, **model_args)
Configures and starts a CherryPy server to listen for image processing requests.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
model_name | str | The name of the pre-trained vision model. | required |
model_class | str | The class of the pre-trained vision model. Defaults to "AutoModel". | 'AutoModel' |
processor_class | str | The class of the processor for input image preprocessing. Defaults to "AutoProcessor". | 'AutoProcessor' |
device_map | str \| Dict \| None | Device mapping for model inference. Defaults to "auto". | 'auto' |
max_memory | Dict[int, str] | Maximum memory allocation for model inference. Defaults to {0: "24GB"}. | {0: '24GB'} |
use_cuda | bool | Whether to use CUDA for model inference. Defaults to False. | False |
precision | str | The floating-point precision to be used by the model. Options are 'float32', 'float16', and 'bfloat16'. | 'float16' |
quantization | int | The bit level for model quantization (0 for none, 8 for 8-bit quantization). | 0 |
torchscript | bool | Whether to use TorchScript for model optimization. Defaults to False. | False |
compile | bool | Whether to compile the model before serving. Defaults to False. | False |
flash_attention | bool | Whether to use Flash Attention 2. Defaults to False. | False |
better_transformers | bool | Whether to enable the Better Transformers optimization for faster processing. | False |
concurrent_queries | bool | Whether the API supports concurrent API calls (usually False). | False |
use_llama_cpp | bool | Whether to use the llama.cpp integration for language model inference. | False |
llama_cpp_filename | Optional[str] | The filename of the model file for llama.cpp. | None |
llama_cpp_n_gpu_layers | int | Number of layers to offload to the GPU in the llama.cpp configuration. | 0 |
llama_cpp_split_mode | int | Defines how the model is split across multiple GPUs in llama.cpp. | llama_cpp.LLAMA_SPLIT_LAYER |
llama_cpp_tensor_split | Optional[List[float]] | Custom tensor split configuration for llama.cpp. | None |
llama_cpp_vocab_only | bool | Loads only the vocabulary part of the model in llama.cpp. | False |
llama_cpp_use_mmap | bool | Enables memory-mapped files for model loading in llama.cpp. | True |
llama_cpp_use_mlock | bool | Locks the model in RAM to prevent swapping in llama.cpp. | False |
llama_cpp_kv_overrides | Optional[Dict[str, Union[bool, int, float]]] | Key-value pairs for overriding default llama.cpp model parameters. | None |
llama_cpp_seed | int | Seed for random number generation in llama.cpp. | llama_cpp.LLAMA_DEFAULT_SEED |
llama_cpp_n_ctx | int | The number of context tokens for the model in llama.cpp. | 2048 |
llama_cpp_n_batch | int | Batch size for processing prompts in llama.cpp. | 512 |
llama_cpp_n_threads | Optional[int] | Number of threads for generation in llama.cpp. | None |
llama_cpp_n_threads_batch | Optional[int] | Number of threads for batch processing in llama.cpp. | None |
llama_cpp_rope_scaling_type | Optional[int] | Specifies the RoPE (Rotary Positional Embeddings) scaling type in llama.cpp. | llama_cpp.LLAMA_ROPE_SCALING_UNSPECIFIED |
llama_cpp_rope_freq_base | float | Base frequency for RoPE in llama.cpp. | 0.0 |
llama_cpp_rope_freq_scale | float | Frequency scaling factor for RoPE in llama.cpp. | 0.0 |
llama_cpp_yarn_ext_factor | float | Extrapolation mix factor for YaRN in llama.cpp. | -1.0 |
llama_cpp_yarn_attn_factor | float | Attention factor for YaRN in llama.cpp. | 1.0 |
llama_cpp_yarn_beta_fast | float | Beta fast parameter for YaRN in llama.cpp. | 32.0 |
llama_cpp_yarn_beta_slow | float | Beta slow parameter for YaRN in llama.cpp. | 1.0 |
llama_cpp_yarn_orig_ctx | int | Original context size for YaRN in llama.cpp. | 0 |
llama_cpp_mul_mat_q | bool | Flag to enable matrix multiplication for queries in llama.cpp. | True |
llama_cpp_logits_all | bool | Returns logits for all tokens when set to True in llama.cpp. | False |
llama_cpp_embedding | bool | Enables embedding-only mode in llama.cpp. | False |
llama_cpp_offload_kqv | bool | Offloads the K, Q, V matrices to the GPU in llama.cpp. | True |
llama_cpp_last_n_tokens_size | int | Size of the last_n_tokens buffer in llama.cpp. | 64 |
llama_cpp_lora_base | Optional[str] | Base model path for LoRA adjustments in llama.cpp. | None |
llama_cpp_lora_scale | float | Scale factor for LoRA adjustments in llama.cpp. | 1.0 |
llama_cpp_lora_path | Optional[str] | Path to the LoRA adjustments file in llama.cpp. | None |
llama_cpp_numa | Union[bool, int] | NUMA configuration for llama.cpp. | False |
llama_cpp_chat_format | Optional[str] | Specifies the chat format for llama.cpp. | None |
llama_cpp_draft_model | Optional[llama_cpp.LlamaDraftModel] | Draft model for speculative decoding in llama.cpp. | None |
endpoint | str | The network endpoint for the server. Defaults to "*". | '*' |
port | int | The network port for the server. Defaults to 3000. | 3000 |
cors_domain | str | The domain to allow for CORS requests. Defaults to "http://localhost:3000". | 'http://localhost:3000' |
username | Optional[str] | Username for server authentication. Defaults to None. | None |
password | Optional[str] | Password for server authentication. Defaults to None. | None |
**model_args | Any | Additional arguments for the vision model. | {} |
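Continuing the sketch above, listen loads the named model and starts the CherryPy server. The checkpoint name and the model/processor classes below are illustrative assumptions, not values prescribed by this API; any Hugging Face vision checkpoint compatible with the chosen classes should work.

```python
# Minimal sketch: start an image-classification server on port 3000.
# The checkpoint and the AutoModelForImageClassification / AutoImageProcessor
# classes are illustrative assumptions.
api.listen(
    model_name="google/vit-base-patch16-224",       # assumed example checkpoint
    model_class="AutoModelForImageClassification",  # assumed example model class
    processor_class="AutoImageProcessor",           # assumed example processor class
    device_map="auto",
    use_cuda=True,
    precision="float16",
    endpoint="*",
    port=3000,
    cors_domain="http://localhost:3000",
    username=None,  # set username and password to enable basic auth
    password=None,
)
```

Leaving username and password as None keeps the server unauthenticated; supplying both enables basic authentication, which is checked by validate_password below.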
validate_password(realm, username, password)
Validate the username and password against expected values.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
realm | str | The authentication realm. | required |
username | str | The provided username. | required |
password | str | The provided password. | required |
Returns:

Type | Description |
---|---|
bool | True if credentials are valid, False otherwise. |
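This method's (realm, username, password) signature matches the checker that CherryPy's basic-auth tool expects, so it is invoked for each request once credentials are configured via listen. A minimal client-side sketch follows; the "/api/v1/classify" route and the payload shape are placeholder assumptions, not routes defined on this page.

```python
# Minimal sketch: call a password-protected VisionAPI server with HTTP basic auth.
# The route "/api/v1/classify" and the JSON payload shape are placeholder assumptions.
import base64

import requests

with open("cat.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = requests.post(
    "http://localhost:3000/api/v1/classify",  # placeholder route
    json={"image_base64": image_b64},         # placeholder payload shape
    auth=("admin", "secret"),                 # must match listen(username=..., password=...)
    timeout=60,
)
print(response.status_code, response.json())
```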