SQuAD2DataReader

aitoolbox.nlp.dataset.SQuAD2.SQuAD2DataReader.get_dataset_local_copy(local_dataset_folder_path, protect_local_folder=True)[source]

Interface method for getting a local copy of the SQuAD2 dataset.

If no local copy is found, the dataset is downloaded automatically from S3.

Parameters
  • local_dataset_folder_path (str) –

  • protect_local_folder (bool) –

Returns

None
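
The lookup-then-download behaviour described above can be sketched roughly as follows. This is not the aitoolbox implementation: `ensure_local_copy`, `download_fn`, the `SQuAD2` subfolder and the marker file name are all hypothetical, and the S3 transfer is replaced by a caller-supplied function. Unlike the documented method, which returns None, the sketch returns the folder path so the result is easy to inspect.

```python
import os


def ensure_local_copy(local_dataset_folder_path, download_fn,
                      marker_file="train-v2.0.json"):
    """Make sure a dataset folder exists locally; download only if it is missing.

    All names here are illustrative assumptions, not the aitoolbox API.
    download_fn(target_dir) stands in for whatever fetches the files from S3.
    """
    dataset_path = os.path.join(
        os.path.expanduser(local_dataset_folder_path), "SQuAD2"
    )
    # Treat the presence of the marker file as "a local copy was found".
    if not os.path.exists(os.path.join(dataset_path, marker_file)):
        os.makedirs(dataset_path, exist_ok=True)
        download_fn(dataset_path)
    return dataset_path
```

Calling the function twice with the same folder should trigger the download only on the first call, which is the property the description above relies on.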

class aitoolbox.nlp.dataset.SQuAD2.SQuAD2DataReader.SQuAD2ConcatContextDatasetReader(file_path, tokenizer=None, is_train=True, dev_mode_size=None)[source]

Bases: object

Parameters
  • file_path (str) –

  • tokenizer

  • is_train (bool) –

  • dev_mode_size

read()[source]

Read the SQuAD data. This method has been tested and works.

Returns

Return type

list, aitoolbox.nlp.core.vocabulary.Vocabulary
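
For orientation, the sketch below walks the nested layout of a SQuAD 2.0 JSON file (`data` → `paragraphs` → `qas`) and collects the raw fields a reader of this kind would need. It is a minimal illustration, not the body of `read()`: `iter_squad2_examples` and `max_articles` are hypothetical names, and treating `max_articles` as something like `dev_mode_size` is only an assumption. No tokenization or vocabulary building is shown.

```python
def iter_squad2_examples(squad_json, max_articles=None):
    """Yield (context, question, answer_texts, char_spans, is_impossible).

    squad_json is the parsed content of a SQuAD 2.0 file, e.g. the result of
    json.load(). max_articles is a hypothetical cap on the number of articles.
    """
    # Slicing with None keeps the full list, so the cap is optional.
    for article in squad_json["data"][:max_articles]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                answers = qa.get("answers", [])
                answer_texts = [a["text"] for a in answers]
                # Character spans are half-open: [start, end).
                char_spans = [
                    (a["answer_start"], a["answer_start"] + len(a["text"]))
                    for a in answers
                ]
                yield (context, qa["question"], answer_texts, char_spans,
                       qa.get("is_impossible", False))


sample = {
    "version": "v2.0",
    "data": [{
        "title": "Normans",
        "paragraphs": [{
            "context": "The Normans settled in Normandy.",
            "qas": [
                {"id": "q1", "question": "Where did the Normans settle?",
                 "answers": [{"text": "Normandy", "answer_start": 23}],
                 "is_impossible": False},
                # Unanswerable questions in SQuAD 2.0 carry an empty answer list.
                {"id": "q2", "question": "Who ruled Mars?",
                 "answers": [], "is_impossible": True},
            ],
        }],
    }],
}

examples = list(iter_squad2_examples(sample))
```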

process_example(paragraph_tokens, question_tokens, char_spans, answer_texts)[source]
Parameters
  • paragraph_tokens

  • question_tokens

  • char_spans

  • answer_texts

Returns:
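
The parameter names suggest that answer character spans are matched against the tokenized paragraph. One common way to do that, shown here purely as an assumption about what such a step might involve, is to record each token's character offsets and pick the tokens that overlap the answer span. Both helper functions are hypothetical, and whitespace splitting stands in for whatever tokenizer is actually passed to the reader.

```python
def whitespace_tokenize_with_offsets(text):
    """Split on whitespace and return tokens plus their (start, end) offsets."""
    tokens, offsets = [], []
    pos = 0
    for tok in text.split():
        start = text.index(tok, pos)  # search from the end of the previous token
        end = start + len(tok)
        tokens.append(tok)
        offsets.append((start, end))
        pos = end
    return tokens, offsets


def char_span_to_token_span(token_offsets, char_span):
    """Return inclusive (first, last) token indices overlapping char_span.

    Returns None when no token overlaps the half-open span [start, end).
    """
    char_start, char_end = char_span
    overlapping = [
        i for i, (tok_start, tok_end) in enumerate(token_offsets)
        if tok_start < char_end and tok_end > char_start
    ]
    if not overlapping:
        return None
    return overlapping[0], overlapping[-1]


tokens, offsets = whitespace_tokenize_with_offsets(
    "The Normans settled in Normandy."
)
answer_token_span = char_span_to_token_span(offsets, (23, 31))
```

With whitespace tokenization the answer "Normandy" falls inside the token "Normandy.", so the token span covers slightly more text than the answer itself; a finer-grained tokenizer would reduce that mismatch.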

tokenize_process_paragraph(paragraph_text)[source]
Parameters

paragraph_text

Returns:

tokenize_process_question(question_text)[source]
Parameters

question_text

Returns:

class aitoolbox.nlp.dataset.SQuAD2.SQuAD2DataReader.GeneratorSQuAD2ConcatContextDatasetReader(file_path, tokenizer=None, is_train=True, dev_mode_size=None)[source]

Bases: aitoolbox.nlp.dataset.SQuAD2.SQuAD2DataReader.SQuAD2ConcatContextDatasetReader

This generator-based implementation has not been tested yet.

In particular, verify that calling list(self.read_generator()) inside read() also populates self.vocab, so that the resulting vocabulary is complete.
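
The concern about self.vocab comes down to how Python generators behave: the generator body runs only as items are requested, so any side effect inside it happens lazily, and `list(...)` exhausts the generator and therefore triggers all of them. The toy class below demonstrates this in isolation. It is a hypothetical stand-in, not the aitoolbox reader, and uses a plain `set` in place of a Vocabulary object.

```python
class ToyGeneratorReader:
    """Toy stand-in for a reader whose generator updates a vocabulary."""

    def __init__(self, sentences):
        self.sentences = sentences
        self.vocab = set()

    def read_generator(self):
        for sentence in self.sentences:
            tokens = sentence.split()
            # Side effect: runs only when this item is actually produced.
            self.vocab.update(tokens)
            yield tokens

    def read(self):
        # list() consumes the whole generator, so every update above runs.
        return list(self.read_generator())


reader = ToyGeneratorReader(["a b", "b c"])

gen = reader.read_generator()
vocab_before = set(reader.vocab)      # nothing has run yet
next(gen)
vocab_after_one = set(reader.vocab)   # only the first sentence was processed

data = reader.read()
vocab_after_read = set(reader.vocab)  # all sentences were processed
```

In this toy setting the vocabulary is complete after `read()` returns. Whether the same holds for the real class depends on where its vocabulary is updated, which is what the note above asks to be checked.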

Parameters
  • file_path

  • tokenizer

  • is_train

  • dev_mode_size

read()[source]
Returns

Return type

list

read_generator()[source]
Yields

list