Vision Base

Bases: VisionBulk

The VisionAPI class inherits from VisionBulk and handles vision-based tasks using a pre-trained machine learning model. It sets up a CherryPy server that processes image-related requests with the specified model.

`__init__(input, output, state)`

Initializes the VisionAPI object with batch input, output, and state.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `input` | `BatchInput` | Object to handle batch input operations. | *required* |
| `output` | `BatchOutput` | Object to handle batch output operations. | *required* |
| `state` | `State` | Object to maintain the state of the API. | *required* |
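
For orientation, here is a minimal construction sketch. The import paths, the constructor arguments, and the `InMemoryState` helper are assumptions based on the surrounding geniusrise conventions, not something this page documents.

```python
# Minimal construction sketch. Import paths, constructor arguments, and the
# InMemoryState helper are assumptions, not taken verbatim from this page.
from geniusrise import BatchInput, BatchOutput, InMemoryState  # assumed core classes
from geniusrise_vision import VisionAPI  # assumed package path

# Local folders plus placeholder S3 bucket/prefix values.
input = BatchInput("./input", "my-bucket", "vision/input")
output = BatchOutput("./output", "my-bucket", "vision/output")
state = InMemoryState()  # keeps API state in memory

api = VisionAPI(input=input, output=output, state=state)
```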

`listen(model_name, model_class='AutoModel', processor_class='AutoProcessor', device_map='auto', max_memory={0: '24GB'}, use_cuda=False, precision='float16', quantization=0, torchscript=False, compile=False, flash_attention=False, better_transformers=False, concurrent_queries=False, use_llama_cpp=False, llama_cpp_filename=None, llama_cpp_n_gpu_layers=0, llama_cpp_split_mode=llama_cpp.LLAMA_SPLIT_LAYER, llama_cpp_tensor_split=None, llama_cpp_vocab_only=False, llama_cpp_use_mmap=True, llama_cpp_use_mlock=False, llama_cpp_kv_overrides=None, llama_cpp_seed=llama_cpp.LLAMA_DEFAULT_SEED, llama_cpp_n_ctx=2048, llama_cpp_n_batch=512, llama_cpp_n_threads=None, llama_cpp_n_threads_batch=None, llama_cpp_rope_scaling_type=llama_cpp.LLAMA_ROPE_SCALING_UNSPECIFIED, llama_cpp_rope_freq_base=0.0, llama_cpp_rope_freq_scale=0.0, llama_cpp_yarn_ext_factor=-1.0, llama_cpp_yarn_attn_factor=1.0, llama_cpp_yarn_beta_fast=32.0, llama_cpp_yarn_beta_slow=1.0, llama_cpp_yarn_orig_ctx=0, llama_cpp_mul_mat_q=True, llama_cpp_logits_all=False, llama_cpp_embedding=False, llama_cpp_offload_kqv=True, llama_cpp_last_n_tokens_size=64, llama_cpp_lora_base=None, llama_cpp_lora_scale=1.0, llama_cpp_lora_path=None, llama_cpp_numa=False, llama_cpp_chat_format=None, llama_cpp_draft_model=None, endpoint='*', port=3000, cors_domain='http://localhost:3000', username=None, password=None, **model_args)`

Configures and starts a CherryPy server to listen for image processing requests.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `model_name` | `str` | The name of the pre-trained vision model. | *required* |
| `model_class` | `str` | The class of the pre-trained vision model. | `'AutoModel'` |
| `processor_class` | `str` | The class of the processor for input image preprocessing. | `'AutoProcessor'` |
| `device_map` | `str \| Dict \| None` | Device mapping for model inference. | `'auto'` |
| `max_memory` | `Dict[int, str]` | Maximum memory allocation for model inference. | `{0: '24GB'}` |
| `use_cuda` | `bool` | Whether to use CUDA for model inference. | `False` |
| `precision` | `str` | The floating-point precision to be used by the model. Options are `'float32'`, `'float16'`, and `'bfloat16'`. | `'float16'` |
| `quantization` | `int` | The bit level for model quantization (0 for none, 8 for 8-bit quantization). | `0` |
| `torchscript` | `bool` | Whether to use TorchScript for model optimization. | `False` |
| `compile` | `bool` | Whether to compile the model before serving. | `False` |
| `flash_attention` | `bool` | Whether to use Flash Attention 2. | `False` |
| `better_transformers` | `bool` | Flag to enable Better Transformers optimization for faster processing. | `False` |
| `concurrent_queries` | `bool` | Whether the API supports concurrent calls (usually `False`). | `False` |
| `use_llama_cpp` | `bool` | Flag to use llama.cpp integration for model inference. | `False` |
| `llama_cpp_filename` | `Optional[str]` | The filename of the model file for llama.cpp. | `None` |
| `llama_cpp_n_gpu_layers` | `int` | Number of layers to offload to the GPU in the llama.cpp configuration. | `0` |
| `llama_cpp_split_mode` | `int` | Defines how the model is split across multiple GPUs in llama.cpp. | `llama_cpp.LLAMA_SPLIT_LAYER` |
| `llama_cpp_tensor_split` | `Optional[List[float]]` | Custom tensor split configuration for llama.cpp. | `None` |
| `llama_cpp_vocab_only` | `bool` | Loads only the vocabulary part of the model in llama.cpp. | `False` |
| `llama_cpp_use_mmap` | `bool` | Enables memory-mapped files for model loading in llama.cpp. | `True` |
| `llama_cpp_use_mlock` | `bool` | Locks the model in RAM to prevent swapping in llama.cpp. | `False` |
| `llama_cpp_kv_overrides` | `Optional[Dict[str, Union[bool, int, float]]]` | Key-value pairs for overriding default llama.cpp model parameters. | `None` |
| `llama_cpp_seed` | `int` | Seed for random number generation in llama.cpp. | `llama_cpp.LLAMA_DEFAULT_SEED` |
| `llama_cpp_n_ctx` | `int` | The number of context tokens for the model in llama.cpp. | `2048` |
| `llama_cpp_n_batch` | `int` | Batch size for processing prompts in llama.cpp. | `512` |
| `llama_cpp_n_threads` | `Optional[int]` | Number of threads for generation in llama.cpp. | `None` |
| `llama_cpp_n_threads_batch` | `Optional[int]` | Number of threads for batch processing in llama.cpp. | `None` |
| `llama_cpp_rope_scaling_type` | `Optional[int]` | Specifies the RoPE (Rotary Positional Embeddings) scaling type in llama.cpp. | `llama_cpp.LLAMA_ROPE_SCALING_UNSPECIFIED` |
| `llama_cpp_rope_freq_base` | `float` | Base frequency for RoPE in llama.cpp. | `0.0` |
| `llama_cpp_rope_freq_scale` | `float` | Frequency scaling factor for RoPE in llama.cpp. | `0.0` |
| `llama_cpp_yarn_ext_factor` | `float` | Extrapolation mix factor for YaRN in llama.cpp. | `-1.0` |
| `llama_cpp_yarn_attn_factor` | `float` | Attention factor for YaRN in llama.cpp. | `1.0` |
| `llama_cpp_yarn_beta_fast` | `float` | Beta-fast parameter for YaRN in llama.cpp. | `32.0` |
| `llama_cpp_yarn_beta_slow` | `float` | Beta-slow parameter for YaRN in llama.cpp. | `1.0` |
| `llama_cpp_yarn_orig_ctx` | `int` | Original context size for YaRN in llama.cpp. | `0` |
| `llama_cpp_mul_mat_q` | `bool` | Flag to enable the quantized matrix-multiplication (`mul_mat_q`) kernels in llama.cpp. | `True` |
| `llama_cpp_logits_all` | `bool` | Returns logits for all tokens when set to `True` in llama.cpp. | `False` |
| `llama_cpp_embedding` | `bool` | Enables embedding-only mode in llama.cpp. | `False` |
| `llama_cpp_offload_kqv` | `bool` | Offloads the K, Q, V matrices to the GPU in llama.cpp. | `True` |
| `llama_cpp_last_n_tokens_size` | `int` | Size of the `last_n_tokens` buffer in llama.cpp. | `64` |
| `llama_cpp_lora_base` | `Optional[str]` | Base model path for LoRA adjustments in llama.cpp. | `None` |
| `llama_cpp_lora_scale` | `float` | Scale factor for LoRA adjustments in llama.cpp. | `1.0` |
| `llama_cpp_lora_path` | `Optional[str]` | Path to the LoRA adjustments file in llama.cpp. | `None` |
| `llama_cpp_numa` | `Union[bool, int]` | NUMA configuration for llama.cpp. | `False` |
| `llama_cpp_chat_format` | `Optional[str]` | Specifies the chat format for llama.cpp. | `None` |
| `llama_cpp_draft_model` | `Optional[llama_cpp.LlamaDraftModel]` | Draft model for speculative decoding in llama.cpp. | `None` |
| `endpoint` | `str` | The network endpoint for the server. | `'*'` |
| `port` | `int` | The network port for the server. | `3000` |
| `cors_domain` | `str` | The domain to allow for CORS requests. | `'http://localhost:3000'` |
| `username` | `Optional[str]` | Username for server authentication. | `None` |
| `password` | `Optional[str]` | Password for server authentication. | `None` |
| `**model_args` | `Any` | Additional arguments for the vision model. | `{}` |
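
With the object constructed, a typical call to `listen` might look like the sketch below. The model name, model/processor classes, and option values are illustrative choices, not requirements of the API.

```python
# Start the CherryPy server with an illustrative vision model.
# The model name, classes, and option values are example choices only.
api.listen(
    model_name="openai/clip-vit-base-patch32",
    model_class="CLIPModel",
    processor_class="CLIPProcessor",
    device_map="auto",
    use_cuda=True,
    precision="float16",
    port=3000,
    cors_domain="http://localhost:3000",
    username="admin",    # together with password, enables HTTP Basic auth
    password="secret",
)
```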

`validate_password(realm, username, password)`

Validate the username and password against expected values.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `realm` | `str` | The authentication realm. | *required* |
| `username` | `str` | The provided username. | *required* |
| `password` | `str` | The provided password. | *required* |

Returns:

| Type | Description |
|------|-------------|
| `bool` | `True` if the credentials are valid, `False` otherwise. |
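
Because `validate_password` backs CherryPy's HTTP Basic authentication, clients must attach the same credentials passed to `listen` on every request. A minimal client sketch follows; the `/api/v1/classify` route and the `image_base64` field are placeholders, since the concrete endpoints and request schema are not documented on this page.

```python
import base64
import requests

# Placeholder route and payload field: the concrete endpoints and request
# schema are defined elsewhere, not on this page.
url = "http://localhost:3000/api/v1/classify"
with open("example.jpg", "rb") as f:
    payload = {"image_base64": base64.b64encode(f.read()).decode("utf-8")}

# Basic auth credentials must match the username/password given to listen().
response = requests.post(url, json=payload, auth=("admin", "secret"), timeout=60)
print(response.status_code, response.json())
```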