models.models

class promptbench.models.models.BLIP2Model(model_name, max_new_tokens, temperature, device, dtype)

Bases: VLMBaseModel

Vision Language model class for the BLIP2 model.

Inherits from VLMBaseModel and sets up the BLIP2 vision language model for use.

Parameters:

model_name: str

The name of the BLIP2 model.

max_new_tokens: int

The maximum number of new tokens to be generated.

temperature: float, optional

The temperature for text generation (default is 0).

device: str

The device to use for inference (default is ‘auto’).

dtype: str

The dtype to use for inference (default is ‘auto’).

Parameters of predict method:

input_images: list of PIL.Image

The input images.

input_text: str

The input text.

class promptbench.models.models.BaichuanModel(model_name, max_new_tokens, temperature, device, dtype)

Bases: LMMBaseModel

Language model class for the Baichuan model.

Inherits from LMMBaseModel and sets up the Baichuan language model for use.

Parameters:

model_name: str

The name of the Baichuan model.

max_new_tokens: int

The maximum number of new tokens to be generated.

temperature: float, optional

The temperature for text generation (default is 0).

device: str

The device to use for inference (default is ‘auto’).

Methods:

predict(input_text, **kwargs)

Generates a prediction based on the input text.

class promptbench.models.models.GeminiModel(model, max_new_tokens, temperature=0, gemini_key=None)

Bases: LMMBaseModel

Language model class for interfacing with Google’s Gemini models.

Inherits from LMMBaseModel and sets up a model interface for Gemini models.

Parameters:

model: str

The name of the Gemini model.

max_new_tokens: int

The maximum number of new tokens to be generated.

temperature: float, optional

The temperature for text generation (default is 0).

gemini_key: str, optional

The Gemini API key (default is None).

predict(input_text, **kwargs)

class promptbench.models.models.GeminiVisionModel(model, max_new_tokens, temperature, gemini_key=None)

Bases: VLMBaseModel

Vision Language model class for interfacing with Google’s Gemini models.

Inherits from VLMBaseModel and sets up a model interface for Gemini models.

Parameters:

model: str

The name of the Gemini model.

max_new_tokens: int

The maximum number of new tokens to be generated.

temperature: float, optional

The temperature for text generation (default is 0).

gemini_key: str, optional

The Gemini API key (default is None).

Parameters of predict method:

input_images: list of PIL.Image

The input images.

input_text: str

The input text.

predict(input_images, input_text, **kwargs)

class promptbench.models.models.InternLMVisionModel(model_name, max_new_tokens, temperature, device, dtype)

Bases: VLMBaseModel

Vision Language model class for interfacing with InternLM’s vision language models.

Inherits from VLMBaseModel and sets up a model interface for InternLM’s vision language models.

Parameters:

model_name: str

The name of the InternLM model.

max_new_tokens: int

The maximum number of new tokens to be generated.

temperature: float, optional

The temperature for text generation (default is 0).

device: str

The device to use for inference (default is ‘auto’).

dtype: str

The dtype to use for inference (default is ‘auto’).

Parameters of predict method:

input_images: list of str

The url / local path of the input images.

input_text: str

The input text.

predict(input_images, input_text, **kwargs)

class promptbench.models.models.LLaVAModel(model_name, max_new_tokens, temperature, device, dtype)

Bases: VLMBaseModel

Vision Language model class for the LLaVA model.

Inherits from VLMBaseModel and sets up the LLaVA vision language model for use.

Parameters:

model_name: str

The name of the LLaVA model.

max_new_tokens: int

The maximum number of new tokens to be generated.

temperature: float

The temperature for text generation (default is 0).

device: str

The device to use for inference (default is ‘auto’).

dtype: str

The dtype to use for inference (default is ‘auto’).

Parameters of predict method:

input_images: list of PIL.Image

The input images.

input_text: str

The input text. Using <image> as the placeholder for the image.
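As an illustration of the `<image>` placeholder convention, the following sketch builds a prompt string for one or more images. The helper `build_llava_prompt` is hypothetical and not part of PromptBench, and the layout it produces (placeholders first, one per image, each on its own line) is an assumption; the template a given LLaVA checkpoint expects may differ.

```python
IMAGE_PLACEHOLDER = "<image>"


def build_llava_prompt(question, num_images=1):
    """Prefix a question with one <image> placeholder per input image.

    Hypothetical helper; the placeholder layout is an assumption.
    """
    if num_images < 1:
        raise ValueError("at least one image is required")
    placeholders = "\n".join([IMAGE_PLACEHOLDER] * num_images)
    return f"{placeholders}\n{question}"


prompt = build_llava_prompt("What is shown in the pictures?", num_images=2)
print(prompt)
```

The resulting string would then be passed as `input_text`, with the matching images passed as the list of `PIL.Image` objects.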

class promptbench.models.models.LMMBaseModel(model_name, max_new_tokens, temperature, device='auto')

Bases: ABC

Abstract base class for language model interfaces.

This class provides a common interface for various language models and includes methods for prediction.

Parameters:

model_name: str

The name of the language model.

max_new_tokens: int

The maximum number of new tokens to be generated.

temperature: float

The temperature for text generation (default is 0).

device: str

The device to use for inference (default is ‘auto’).

Methods:

predict(input_text, **kwargs)

Generates a prediction based on the input text.

__call__(input_text, **kwargs)

Shortcut for predict method.
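The relationship between `predict` and `__call__` can be sketched with a small stand-in class. This is not PromptBench's implementation: `SketchBaseModel` and `EchoModel` are hypothetical names, and the toy `predict` simply truncates the prompt instead of running a language model.

```python
from abc import ABC, abstractmethod


class SketchBaseModel(ABC):
    """Hypothetical stand-in mirroring the documented constructor arguments."""

    def __init__(self, model_name, max_new_tokens, temperature, device="auto"):
        self.model_name = model_name
        self.max_new_tokens = max_new_tokens
        self.temperature = temperature
        self.device = device

    @abstractmethod
    def predict(self, input_text, **kwargs):
        """Subclasses generate a prediction from the input text."""

    def __call__(self, input_text, **kwargs):
        # Shortcut: calling the instance forwards to predict().
        return self.predict(input_text, **kwargs)


class EchoModel(SketchBaseModel):
    """Toy subclass: returns the first max_new_tokens words of the prompt."""

    def predict(self, input_text, **kwargs):
        words = input_text.split()
        return " ".join(words[: self.max_new_tokens])


model = EchoModel("echo", max_new_tokens=3, temperature=0)
result = model("one two three four")  # same as model.predict(...)
print(result)
```

Because `__call__` only forwards its arguments, a subclass needs to override `predict` alone to support both calling styles.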

predict(input_text, **kwargs)

class promptbench.models.models.LlamaModel(model_name, max_new_tokens, temperature, device, dtype, system_prompt, model_dir)

Bases: LMMBaseModel

Language model class for the Llama model.

Inherits from LMMBaseModel and sets up the Llama language model for use.

Parameters:

model_name: str

The name of the Llama model.

max_new_tokens: int

The maximum number of new tokens to be generated.

temperature: float

The temperature for text generation (default is 0).

device: str

The device to use for inference (default is ‘auto’).

dtype: str

The dtype to use for inference (default is ‘auto’).

system_prompt: str

The system prompt to be used (default is None).

model_dir: str

The directory containing the model files (default is None). If not provided, it will be downloaded from the HuggingFace model hub.
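The `model_dir` fallback described above can be pictured with a short sketch. The helper `resolve_model_source` is hypothetical, the early existence check is an added assumption, and the way PromptBench maps a model name to a HuggingFace hub repository is not shown here.

```python
import os
import tempfile


def resolve_model_source(model_name, model_dir=None):
    """Return the local directory if one was given, else the model name.

    Hypothetical helper; not part of PromptBench.
    """
    if model_dir is None:
        # No local copy requested: fall back to the name, which a loader
        # could then fetch from the HuggingFace model hub.
        return model_name
    if not os.path.isdir(model_dir):
        # Assumption: fail early rather than surface a later load error.
        raise FileNotFoundError(f"model_dir does not exist: {model_dir}")
    return model_dir


print(resolve_model_source("llama2-7b"))
with tempfile.TemporaryDirectory() as local_dir:
    print(resolve_model_source("llama2-7b", model_dir=local_dir) == local_dir)
```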

predict(input_text, **kwargs)

class promptbench.models.models.MistralModel(model_name, max_new_tokens, temperature, device, dtype)

Bases: LMMBaseModel

Language model class for the Mistral model.

Inherits from LMMBaseModel and sets up the Mistral language model for use.

Parameters:

model_name: str

The name of the Mistral model.

max_new_tokens: int

The maximum number of new tokens to be generated.

temperature: float

The temperature for text generation (default is 0).

device: str

The device to use for inference (default is ‘auto’).

dtype: str

The dtype to use for inference (default is ‘auto’).

class promptbench.models.models.MixtralModel(model_name, max_new_tokens, temperature, device, dtype)

Bases: LMMBaseModel

Language model class for the Mixtral model.

Inherits from LMMBaseModel and sets up the Mixtral language model for use.

Parameters:

model_name: str

The name of the Mixtral model.

max_new_tokens: int

The maximum number of new tokens to be generated.

temperature: float

The temperature for text generation (default is 0).

device: str

The device to use for inference (default is ‘auto’).

dtype: str

The dtype to use for inference (default is ‘auto’).

class promptbench.models.models.OpenAIModel(model_name, max_new_tokens, temperature, system_prompt, openai_key)

Bases: LMMBaseModel

Language model class for interfacing with OpenAI’s GPT models.

Inherits from LMMBaseModel and sets up a model interface for OpenAI GPT models.

Parameters:

model_name: str

The name of the OpenAI model.

max_new_tokens: int

The maximum number of new tokens to be generated.

temperature: float

The temperature for text generation (default is 0).

system_prompt: str

The system prompt to be used (default is None).

openai_key: str

The OpenAI API key (default is None).

Methods:

predict(input_text, **kwargs)

Predicts the output based on the given input text using the OpenAI model.

predict(input_text, **kwargs)

class promptbench.models.models.OpenAIVisionModel(model_name, max_new_tokens, temperature, system_prompt, openai_key)

Bases: VLMBaseModel

Vision Language model class for interfacing with OpenAI’s GPT models.

Inherits from VLMBaseModel and sets up a model interface for OpenAI GPT models.

Parameters:

model_name: str

The name of the OpenAI model.

max_new_tokens: int

The maximum number of new tokens to be generated.

temperature: float

The temperature for text generation (default is 0).

system_prompt: str

The system prompt to be used (default is None).

openai_key: str

The OpenAI API key (default is None).

Parameters of predict method:

input_images: list of str

The url / local path of the input images.

input_text: str

The input text.

predict(input_images, input_text, **kwargs)

class promptbench.models.models.PaLMModel(model, max_new_tokens, temperature=0, api_key=None)

Bases: LMMBaseModel

Language model class for interfacing with PaLM models.

Inherits from LMMBaseModel and sets up a model interface for PaLM models.

Parameters:

model: str

The name of the PaLM model.

max_new_tokens: int

The maximum number of new tokens to be generated.

temperature: float, optional

The temperature for text generation (default is 0).

api_key: str, optional

The PaLM API key (default is None).

predict(input_text, **kwargs)

class promptbench.models.models.PhiModel(model_name, max_new_tokens, temperature, device, dtype)

Bases: LMMBaseModel

Language model class for the Phi model.

Inherits from LMMBaseModel and sets up the Phi language model for use.

Parameters:

model_name: str

The name of the Phi model.

max_new_tokens: int

The maximum number of new tokens to be generated.

temperature: float

The temperature for text generation (default is 0).

device: str

The device to use for inference (default is ‘auto’).

dtype: str

The dtype to use for inference (default is ‘auto’).

predict(input_text, **kwargs)

class promptbench.models.models.QwenVLModel(model_name, max_new_tokens, temperature, device, dtype, system_prompt, api_key)

Bases: VLMBaseModel

Vision Language model class for the Qwen model.

Inherits from VLMBaseModel and sets up the Qwen vision language model for use.

Parameters:

model_name: str

The name of the Qwen model.

max_new_tokens: int

The maximum number of new tokens to be generated.

temperature: float

The temperature for text generation (default is 0).

device: str

The device to use for inference (default is ‘auto’).

dtype: str

The dtype to use for inference (default is ‘auto’).

system_prompt: str

The system prompt to be used (default is None).

api_key: str

The API key for the Qwen model (default is None).

Parameters of predict method:

input_images: list of str

The URL or local path of each input image. When using ‘qwen-vl-plus’ or ‘qwen-vl-max’, add a “file://” prefix to local paths.

input_text: str

The input text.
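The “file://” requirement for local paths can be handled with a small normalization step before calling `predict`. The helper `to_qwen_image_ref` below is hypothetical and not part of PromptBench. It relies on `pathlib.Path.as_uri()`, which produces an absolute, percent-encoded `file://` URI; whether the Qwen API accepts percent-encoded paths is an assumption.

```python
from pathlib import Path
from urllib.parse import urlparse


def to_qwen_image_ref(image):
    """Leave URLs untouched and turn local paths into file:// URIs.

    Hypothetical helper; not part of PromptBench.
    """
    scheme = urlparse(image).scheme
    if scheme in ("http", "https", "file"):
        # Already a URL (or already prefixed): pass through unchanged.
        return image
    # Local path: make it absolute, then convert it to a file:// URI.
    return Path(image).resolve().as_uri()


refs = [to_qwen_image_ref(p) for p in ["https://example.com/cat.png", "images/dog.png"]]
print(refs)
```

The normalized list would then be passed as the image argument to `predict`, alongside `input_text`.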

predict(input_images, input_text, **kwargs)

class promptbench.models.models.T5Model(model_name, max_new_tokens, temperature, device, dtype)

Bases: LMMBaseModel

Language model class for the T5 model.

Inherits from LMMBaseModel and sets up the T5 language model for use.

Parameters:

model_name: str

The name of the T5 model.

max_new_tokens: int

The maximum number of new tokens to be generated.

temperature: float

The temperature for text generation (default is 0).

device: str

The device to use for inference (default is ‘auto’).

dtype: str

The dtype to use for inference (default is ‘auto’).

class promptbench.models.models.UL2Model(model_name, max_new_tokens, temperature, device, dtype)

Bases: LMMBaseModel

Language model class for the UL2 model.

Inherits from LMMBaseModel and sets up the UL2 language model for use.

Parameters:

model_name: str

The name of the UL2 model.

max_new_tokens: int

The maximum number of new tokens to be generated.

temperature: float

The temperature for text generation (default is 0).

device: str

The device to use for inference (default is ‘auto’).

dtype: str

The dtype to use for inference (default is ‘auto’).

class promptbench.models.models.VLMBaseModel(model_name, max_new_tokens, temperature, device='auto')

Bases: ABC

Abstract base class for vision language model interfaces.

This class provides a common interface for various vision language models and includes methods for prediction.

Parameters:

model_name: str

The name of the vision language model.

max_new_tokens: int

The maximum number of new tokens to be generated.

temperature: float

The temperature for text generation (default is 0).

device: str

The device to use for inference (default is ‘auto’).

Methods:

predict(input_images, input_text, **kwargs)

Generates a prediction based on the input images and text.

__call__(input_images, input_text, **kwargs)

Shortcut for predict method.
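A minimal sketch of this two-argument interface, assuming nothing about PromptBench's internals: `SketchVLMBase` and `CountingVLM` are hypothetical names, and the toy `predict` only reports how many images it received rather than running a vision language model.

```python
from abc import ABC, abstractmethod


class SketchVLMBase(ABC):
    """Hypothetical stand-in for a vision language model base class."""

    def __init__(self, model_name, max_new_tokens, temperature, device="auto"):
        self.model_name = model_name
        self.max_new_tokens = max_new_tokens
        self.temperature = temperature
        self.device = device

    @abstractmethod
    def predict(self, input_images, input_text, **kwargs):
        """Subclasses generate a prediction from the images and the text."""

    def __call__(self, input_images, input_text, **kwargs):
        # Shortcut: forwards both inputs to predict() unchanged.
        return self.predict(input_images, input_text, **kwargs)


class CountingVLM(SketchVLMBase):
    """Toy subclass: echoes the text with the number of images received."""

    def predict(self, input_images, input_text, **kwargs):
        return f"{len(input_images)} image(s): {input_text}"


vlm = CountingVLM("toy-vlm", max_new_tokens=16, temperature=0)
answer = vlm(["cat.png", "dog.png"], "Describe the images.")
print(answer)
```

The images are always passed as a list, even for a single image, which matches the `list of PIL.Image` and `list of str` types documented for the concrete classes.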

predict(input_images, input_text, **kwargs)

class promptbench.models.models.VicunaModel(model_name, max_new_tokens, temperature, device, dtype, model_dir)

Bases: LMMBaseModel

Language model class for the Vicuna model.

Inherits from LMMBaseModel and sets up the Vicuna language model for use.

Parameters:

model_name: str

The name of the Vicuna model.

max_new_tokens: int

The maximum number of new tokens to be generated.

temperature: float, optional

The temperature for text generation (default is 0).

device: str

The device to use for inference (default is ‘auto’).

dtype: str

The dtype to use for inference (default is ‘auto’).

model_dir: str, optional

The directory containing the model files (default is None).

predict(input_text, **kwargs)

class promptbench.models.models.YiModel(model_name, max_new_tokens, temperature, device, dtype)

Bases: LMMBaseModel

Language model class for the Yi model.

Inherits from LMMBaseModel and sets up the Yi language model for use.

Parameters:

model_name: str

The name of the Yi model.

max_new_tokens: int

The maximum number of new tokens to be generated.

temperature: float

The temperature for text generation (default is 0).

device: str

The device to use for inference (default is ‘auto’).