promptbench Introduction¶
PromptBench is a unified library for evaluating and understanding large language models.
What does promptbench currently contain?¶
Quick access your model performance: We provide a user-friendly interface for quick build models, load dataset, and evaluate model performance.
Prompt Engineering:
Evaluating adversarial prompts: promptbench integrated prompt attacks [1] for researchers simulate black-box adversarial prompt attacks on the models and evaluate their performances.
Dynamic evaluation to mitigate potential test data contamination: we integrated the dynamic evaluation framework DyVal [2], which generates evaluation samples on-the-fly with controlled complexity.
Where should I get started?¶
If you want to
evaluate my model on existing benchmarks: please refer to the
examples/basic.ipynb
for constructing your evaluation pipeline. For a multi-modal evaluation pipeline, please refer toexamples/multimodal.ipynb
.test the effects of different prompting techniques:
examine the robustness for prompt attacks, please refer to
examples/prompt_attack.ipynb
to construct the attacks.use DyVal for evaluation: please refer to
examples/dyval.ipynb
to construct DyVal datasets.