NLP_metrics
- class aitoolbox.nlp.experiment_evaluation.NLP_metrics.ROUGEMetric(y_true, y_predicted, target_actual_text=False, output_text_dir=None, output_text_cleaning_regex=('<.*?>', '[^a-zA-Z0-9.?! ]+'))[source]
Bases:
AbstractBaseMetric
ROUGE score calculation
- From this package:
- Parameters:
y_true –
y_predicted –
target_actual_text (bool) –
output_text_dir –
output_text_cleaning_regex –
- static dump_answer_text_to_disk(true_text, pred_text, output_text_dir, output_text_cleaning_regex, target_actual_text)[source]
- Problems:
Regex-based text cleaning is applied to work around the "Illegal division by zero" error in the ROUGE evaluation script; see https://ireneli.eu/2018/01/11/working-with-rouge-1-5-5-evaluation-metric-in-python/
- Parameters:
true_text –
pred_text –
output_text_dir –
output_text_cleaning_regex –
target_actual_text –
Returns:
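A minimal usage sketch. The get_metric_dict() accessor and the meaning of target_actual_text are assumptions based on the AbstractBaseMetric interface, not confirmed by this page:

from aitoolbox.nlp.experiment_evaluation.NLP_metrics import ROUGEMetric

y_true = ['the cat sat on the mat']
y_predicted = ['the cat is on the mat']

# assumption: target_actual_text=True means y_true already holds raw answer strings
rouge = ROUGEMetric(y_true, y_predicted, target_actual_text=True)
print(rouge.get_metric_dict())  # assumed accessor; exact result keys depend on the implementation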
- class aitoolbox.nlp.experiment_evaluation.NLP_metrics.ROUGEPerlMetric(y_true, y_predicted, output_text_dir, output_text_cleaning_regex=('<.*?>', '[^a-zA-Z0-9.?! ]+'), target_actual_text=False)[source]
Bases:
AbstractBaseMetric
ROUGE score calculation using the Perl implementation
- Use this package:
https://pypi.org/project/pyrouge/
https://github.com/bheinzerling/pyrouge
- Problems:
Regex-based text cleaning is applied to work around the "Illegal division by zero" error in the Perl ROUGE script; see https://ireneli.eu/2018/01/11/working-with-rouge-1-5-5-evaluation-metric-in-python/
- Parameters:
y_true –
y_predicted –
output_text_dir –
output_text_cleaning_regex –
target_actual_text (bool) –
- static dump_answer_text_to_disk(true_text, pred_text, output_text_dir, output_text_cleaning_regex, target_actual_text)[source]
- Problems:
Regex-based text cleaning is applied to work around the "Illegal division by zero" error in the Perl ROUGE script; see https://ireneli.eu/2018/01/11/working-with-rouge-1-5-5-evaluation-metric-in-python/
- Parameters:
true_text –
pred_text –
output_text_dir –
output_text_cleaning_regex –
target_actual_text –
Returns:
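To make the cleaning step concrete, here is a minimal sketch of how the default output_text_cleaning_regex patterns could be applied before the text is written to disk. clean_text is a hypothetical helper, not a function of this module; the library's actual replacement behavior may differ:

import re

def clean_text(text, cleaning_regex=('<.*?>', '[^a-zA-Z0-9.?! ]+')):
    # Hypothetical helper: strip markup and unusual characters so the
    # Perl ROUGE script does not see empty or malformed token streams
    for pattern in cleaning_regex:
        text = re.sub(pattern, '', text)
    return text

print(clean_text('<b>Hello,</b> world! <br/>'))  # 'Hello world! '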
- class aitoolbox.nlp.experiment_evaluation.NLP_metrics.ExactMatchTextMetric(y_true, y_predicted, target_actual_text=False, output_text_dir=None)[source]
Bases:
AbstractBaseMetric
Calculate exact match of answered strings
- Parameters:
y_true –
y_predicted –
target_actual_text (bool) –
output_text_dir –
- static normalize_answer(text_str)[source]
Convert to lowercase and remove punctuation, articles and extra whitespace.
All methods below this line are from the official SQuAD 2.0 eval script https://worksheets.codalab.org/rest/bundles/0x6b567e1cf2e041ec80d7098f031c5c9e/contents/blob/
- Parameters:
text_str (str) –
- Returns:
str
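For reference, the normalization in the linked SQuAD 2.0 eval script is essentially the following (a self-contained sketch):

import re
import string

def normalize_answer(text_str):
    # Lowercase, drop punctuation, strip articles, collapse whitespace
    # (mirrors the official SQuAD 2.0 eval script)
    def remove_articles(text):
        return re.sub(r'\b(a|an|the)\b', ' ', text)

    def white_space_fix(text):
        return ' '.join(text.split())

    def remove_punc(text):
        return ''.join(ch for ch in text if ch not in set(string.punctuation))

    return white_space_fix(remove_articles(remove_punc(text_str.lower())))

print(normalize_answer('The  Cat, sat!'))  # 'cat sat'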
- class aitoolbox.nlp.experiment_evaluation.NLP_metrics.F1TextMetric(y_true, y_predicted, target_actual_text=False, output_text_dir=None)[source]
Bases:
AbstractBaseMetric
Calculate F1 score of answered strings
- Parameters:
y_true –
y_predicted –
target_actual_text (bool) –
output_text_dir –
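The token-level F1 in the official SQuAD eval script, which this metric presumably mirrors, works roughly as follows (a sketch under that assumption, reusing normalize_answer from the sketch above):

from collections import Counter

def f1_score(pred_str, true_str):
    # Token overlap between predicted and gold answer strings
    pred_tokens = normalize_answer(pred_str).split()
    true_tokens = normalize_answer(true_str).split()
    common = Counter(pred_tokens) & Counter(true_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(true_tokens)
    return 2 * precision * recall / (precision + recall)

print(f1_score('the cat sat', 'a cat sat on the mat'))  # ~0.667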
- class aitoolbox.nlp.experiment_evaluation.NLP_metrics.BLEUSentenceScoreMetric(y_true, y_predicted, source_sents=None, output_text_dir=None)[source]
Bases:
AbstractBaseMetric
BLEU score calculation
NLTK provides the sentence_bleu() function for evaluating a candidate sentence against one or more reference sentences.
https://machinelearningmastery.com/calculate-bleu-score-for-text-python/
The reference sentences must be provided as a list of sentences where each reference is a list of tokens. The candidate sentence is provided as a list of tokens. For example:
from nltk.translate.bleu_score import sentence_bleu

reference = [['this', 'is', 'a', 'test'], ['this', 'is', 'test']]
candidate = ['this', 'is', 'a', 'test']
score = sentence_bleu(reference, candidate)
- Parameters:
y_true –
y_predicted –
source_sents –
output_text_dir –
- static dump_translation_text_to_disk(source_sents, pred_translations, true_translations, sentence_bleu_results, output_text_dir)[source]
- Parameters:
source_sents –
pred_translations –
true_translations –
sentence_bleu_results –
output_text_dir –
Returns:
- class aitoolbox.nlp.experiment_evaluation.NLP_metrics.BLEUCorpusScoreMetric(y_true, y_predicted, source_sents=None, output_text_dir=None)[source]
Bases:
AbstractBaseMetric
BLEU corpus score calculation
NLTK provides the corpus_bleu() function for calculating the BLEU score across multiple sentences, such as a paragraph or a document.
https://machinelearningmastery.com/calculate-bleu-score-for-text-python/
The references must be specified as a list of documents where each document is a list of references and each alternative reference is a list of tokens, e.g. a list of lists of lists of tokens. The candidate documents must be specified as a list where each document is a list of tokens, e.g. a list of lists of tokens.
from nltk.translate.bleu_score import corpus_bleu

references = [[['this', 'is', 'a', 'test'], ['this', 'is', 'test']]]
candidates = [['this', 'is', 'a', 'test']]
score = corpus_bleu(references, candidates)
- Parameters:
y_true –
y_predicted –
source_sents –
output_text_dir –
- class aitoolbox.nlp.experiment_evaluation.NLP_metrics.BLEUScoreStrTorchNLPMetric(y_true, y_predicted, lowercase=False, source_sents=None, output_text_dir=None)[source]
Bases:
AbstractBaseMetric
BLEU score calculation using the TorchNLP implementation
Example:

from torchnlp.metrics import get_moses_multi_bleu

hypotheses = ["The brown fox jumps over the dog 笑",
              "The brown fox jumps over the dog 2 笑"]
references = ["The quick brown fox jumps over the lazy dog 笑",
              "The quick brown fox jumps over the lazy dog 笑"]
get_moses_multi_bleu(hypotheses, references, lowercase=True)  # 46.51
- Parameters:
y_true –
y_predicted –
lowercase (bool) –
source_sents –
output_text_dir –
- class aitoolbox.nlp.experiment_evaluation.NLP_metrics.PerplexityMetric(batch_losses)[source]
Bases:
AbstractBaseMetric
Perplexity metric used in machine translation (MT)
- Parameters:
batch_losses (numpy.array or list) –
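A minimal sketch of the usual computation, assuming batch_losses holds per-batch mean cross-entropy values (whether the class averages exactly this way is not confirmed here):

import numpy as np

def perplexity(batch_losses):
    # Perplexity as the exponential of the mean cross-entropy loss
    return float(np.exp(np.mean(batch_losses)))

print(perplexity([2.1, 1.9, 2.0]))  # ~7.39 (= e^2.0)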
- class aitoolbox.nlp.experiment_evaluation.NLP_metrics.GLUEMetric(y_true, y_predicted, task_name)[source]
Bases:
AbstractBaseMetric
GLUE evaluation metrics
Wrapper around the HF Transformers glue_compute_metrics() function
- Parameters:
y_true –
y_predicted –
task_name (str) – name of the GLUE task
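For orientation, a sketch of how the wrapped Transformers helper is typically invoked; glue_compute_metrics lived in transformers.data.metrics in older releases (it requires scikit-learn and has since been deprecated):

import numpy as np
from transformers.data.metrics import glue_compute_metrics

preds = np.array([1, 0, 1, 1])
labels = np.array([1, 0, 0, 1])

# For MRPC the helper reports accuracy, F1 and their average
print(glue_compute_metrics('mrpc', preds, labels))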
- class aitoolbox.nlp.experiment_evaluation.NLP_metrics.XNLIMetric(y_true, y_predicted)[source]
Bases:
AbstractBaseMetric
XNLI evaluation metrics
Wrapper around the HF Transformers xnli_compute_metrics() function
- Parameters:
y_true –
y_predicted –
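Similarly, a sketch of the wrapped helper call; xnli_compute_metrics also came from transformers.data.metrics in older releases and reports simple accuracy for the "xnli" task:

import numpy as np
from transformers.data.metrics import xnli_compute_metrics

preds = np.array([0, 2, 1, 1])
labels = np.array([0, 2, 1, 0])

print(xnli_compute_metrics('xnli', preds, labels))  # {'acc': 0.75}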