metrics.eval
- class promptbench.metrics.eval.Eval
Bases: object
A utility class for computing various evaluation metrics.
This class provides static methods to compute metrics such as classification accuracy, SQuAD V2 F1 score, BLEU score, and math accuracy.
Methods:
- compute_cls_accuracy(preds, gts)
Computes classification accuracy.
- compute_squad_v2_f1(preds, gts, dataset)
Computes the F1 score for the SQuAD V2 dataset.
- compute_bleu(preds, gts)
Computes the BLEU score for translation tasks.
- compute_cider(preds, gts)
Computes the CIDEr score for image captioning tasks.
- compute_math_accuracy(preds, gts)
Computes accuracy for math dataset.
- static compute_bleu(preds, gts)
Computes the BLEU score for translation tasks.
Parameters:
- preds : list
A list of predictions.
- gts : list
A list of ground truth translations.
Returns:
- float
The BLEU score.
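To make the metric concrete, here is a minimal sketch of unsmoothed corpus-level BLEU in plain Python. It is not the implementation behind `compute_bleu`; the function names `ngram_counts` and `corpus_bleu` are hypothetical, and the sketch assumes whitespace tokenization, one reference per prediction, n-grams up to order 4, and a score on a 0 to 1 scale (some toolkits report 0 to 100).

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(preds, gts, max_n=4):
    """Unsmoothed corpus-level BLEU, one reference per prediction (illustrative)."""
    matches = [0] * max_n  # clipped n-gram matches, per n-gram order
    totals = [0] * max_n   # n-grams found in the predictions, per order
    pred_len = ref_len = 0
    for pred, gt in zip(preds, gts):
        p, r = pred.split(), gt.split()
        pred_len += len(p)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            p_counts, r_counts = ngram_counts(p, n), ngram_counts(r, n)
            # Counter & Counter keeps the minimum count per n-gram (clipping).
            matches[n - 1] += sum((p_counts & r_counts).values())
            totals[n - 1] += sum(p_counts.values())
    if pred_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any empty order zeroes the score
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # The brevity penalty only punishes predictions shorter than the references.
    brevity_penalty = 1.0 if pred_len > ref_len else math.exp(1 - ref_len / pred_len)
    return brevity_penalty * math.exp(log_precision)


# Hypothetical data: an exact match scores 1.0.
print(corpus_bleu(["the cat sat on the mat"], ["the cat sat on the mat"]))
```

Because the sketch applies no smoothing, a short prediction with no matching 4-gram scores 0.0 even when most words are correct, which is one reason practical BLEU implementations usually offer smoothing options.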
- static compute_cider(preds, gts)
Computes the CIDEr score for image captioning tasks.
Parameters:
- preds : list
A list of predictions.
- gts : list
A list of ground truth captions.
Returns:
- float
The CIDEr score.
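As a rough illustration of what a CIDEr-style score measures, the sketch below averages TF-IDF-weighted n-gram cosine similarity between each predicted caption and its references. It is not the code behind `compute_cider`. The names `ngrams`, `tfidf`, `cosine`, and `cider` are hypothetical, and the input layout (each entry of `gts` being a list of reference captions for one image) is an assumption. The sketch follows the plain CIDEr formulation and omits refinements such as the length penalty and count clipping of CIDEr-D.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def tfidf(counts, doc_freq, num_images):
    """TF-IDF weight per n-gram; unseen n-grams are treated as appearing once."""
    total = sum(counts.values())
    return {g: (c / total) * math.log(num_images / max(1, doc_freq[g]))
            for g, c in counts.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cider(preds, gts, max_n=4):
    """Plain CIDEr averaged over images (illustrative sketch)."""
    num_images = len(preds)
    score_sum = 0.0
    for n in range(1, max_n + 1):
        refs = [[ngrams(r.split(), n) for r in ref_set] for ref_set in gts]
        # Document frequency: number of images whose references contain the n-gram.
        doc_freq = Counter()
        for ref_set in refs:
            doc_freq.update({g for counts in ref_set for g in counts})
        for pred, ref_set in zip(preds, refs):
            cand = tfidf(ngrams(pred.split(), n), doc_freq, num_images)
            sims = [cosine(cand, tfidf(r, doc_freq, num_images)) for r in ref_set]
            score_sum += sum(sims) / len(sims)
    return score_sum / (max_n * num_images)


# Hypothetical captions for two images, one reference each.
preds = ["a dog runs across the field", "two people ride bikes downtown"]
gts = [["a dog runs across the field"], ["two people ride bikes downtown"]]
print(cider(preds, gts))
```

The IDF term is computed over the evaluated images themselves, so an n-gram that occurs in every image's references receives zero weight; on a very small evaluation set this can flatten the scores.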
- static compute_cls_accuracy(preds, gts)
Computes classification accuracy based on predictions and ground truths.
Parameters:
- preds : list
A list of predictions.
- gts : list
A list of ground truths.
Returns:
- float
The classification accuracy.
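Classification accuracy is the fraction of predictions that equal their ground truth. The following is a minimal standalone sketch of that definition rather than the library's own code; the function name `classification_accuracy` and the sample labels are hypothetical.

```python
def classification_accuracy(preds, gts):
    """Fraction of positions where the prediction equals the ground truth."""
    if len(preds) != len(gts):
        raise ValueError("preds and gts must have the same length")
    if not preds:
        raise ValueError("preds must not be empty")
    correct = sum(p == g for p, g in zip(preds, gts))
    return correct / len(preds)


# Hypothetical labels for a four-example task: three of four match.
preds = [1, 0, 1, 1]
gts = [1, 0, 0, 1]
print(classification_accuracy(preds, gts))  # 0.75
```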
- static compute_math_accuracy(preds, gts)
Computes accuracy for the ‘math’ dataset.
Parameters:
- preds : list
A list of predictions.
- gts : list
A list of ground truths.
Returns:
- float
The math accuracy.
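Free-form answers usually need normalizing before they can be compared, since "42" and "42.0" should count as the same result. The sketch below shows one possible approach and is not the library's implementation: `normalize_answer` and `math_accuracy` are hypothetical names, and the normalization rules (strip whitespace, lowercase, compare numerically when both sides parse as numbers) are assumptions made for illustration.

```python
def normalize_answer(answer):
    """Return a float when the answer is numeric, else stripped lowercase text."""
    text = str(answer).strip().lower()
    try:
        return float(text)
    except ValueError:
        return text


def math_accuracy(preds, gts):
    """Fraction of predictions that match the ground truth after normalization."""
    if len(preds) != len(gts):
        raise ValueError("preds and gts must have the same length")
    if not preds:
        raise ValueError("preds must not be empty")
    correct = sum(normalize_answer(p) == normalize_answer(g)
                  for p, g in zip(preds, gts))
    return correct / len(preds)


# Hypothetical answers: the first three match after normalization.
preds = ["42", " 3.50", "True", "7"]
gts = ["42.0", "3.5", "true", "8"]
print(math_accuracy(preds, gts))  # 0.75
```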