hf_evaluate_packages

class aitoolbox.experiment.result_package.hf_evaluate_packages.HFEvaluateResultPackage(hf_evaluate_metric, use_models_additional_results=True, **kwargs)[source]

Bases: AbstractResultPackage

HuggingFace Evaluate Metrics Result Package

Result package wrapping the evaluation metrics provided by the HuggingFace Evaluate library. A short usage sketch is given below, after the parameter list.

All metric result names will have '_HFEvaluate' appended to them to help distinguish them from other metrics.

Github: https://github.com/huggingface/evaluate

More info on how to use the metrics: https://huggingface.co/docs/evaluate/index

Parameters:
  • hf_evaluate_metric (evaluate.EvaluationModule) – HF Evaluate metric to be used by the result package

  • use_models_additional_results (bool) – Whether the additional results returned by the model's get_predictions() function (on top of the predictions and references) should be passed as extra inputs to the HF Evaluate metric when the evaluation is computed.

  • **kwargs – additional parameters or inputs passed to the HF Evaluate metric being computed. These are generally inputs that are already available before model predictions are made and therefore don't need to be gathered from the train/prediction loop.
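
A minimal usage sketch (the choice of the exact_match metric and its ignore_case option are illustrative assumptions used to show **kwargs forwarding; the wrapped metric then receives the predictions and references gathered by the surrounding train/prediction loop):

    import evaluate

    from aitoolbox.experiment.result_package.hf_evaluate_packages import HFEvaluateResultPackage

    # Load any HF Evaluate metric; 'exact_match' is just an example choice
    exact_match_metric = evaluate.load('exact_match')

    # Wrap the metric in the result package. ignore_case is an exact_match
    # compute-time option passed through **kwargs, since it is known up front
    # and does not need to be gathered from the train/prediction loop.
    result_package = HFEvaluateResultPackage(
        exact_match_metric,
        use_models_additional_results=False,
        ignore_case=True
    )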

prepare_results_dict()[source]

Perform result package building

This mostly consists of computing the selected performance metrics and returning their result dicts. If you want to use multiple performance metrics, you have to combine them into the single self.results_dict at the end:

return {**metric_dict_1, **metric_dict_2}

Returns:

calculated result dict

Return type:

dict
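
For illustration, a minimal sketch of how a custom result package can combine multiple metric dicts in prepare_results_dict(); the base-class import path and the self.y_true / self.y_predicted attribute names are assumptions following the general AbstractResultPackage convention rather than something stated in this section:

    import evaluate

    # Import path of the base class is an assumption
    from aitoolbox.experiment.result_package.abstract_result_packages import AbstractResultPackage


    class TwoMetricResultPackage(AbstractResultPackage):
        """Hypothetical result package combining two HF Evaluate metrics."""

        def prepare_results_dict(self):
            # self.y_true and self.y_predicted are assumed to be populated by
            # the surrounding train/prediction loop before this method is called
            accuracy = evaluate.load('accuracy')
            f1 = evaluate.load('f1')

            metric_dict_1 = accuracy.compute(references=self.y_true, predictions=self.y_predicted)
            metric_dict_2 = f1.compute(references=self.y_true, predictions=self.y_predicted)

            # Combine the individual metric result dicts into the single returned dict
            return {**metric_dict_1, **metric_dict_2}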