model_save

class aitoolbox.torchtrain.callbacks.model_save.ModelCheckpoint(project_name, experiment_name, local_model_result_folder_path, hyperparams, cloud_save_mode='s3', bucket_name='model-result', cloud_dir_prefix='', rm_subopt_local_models=False, num_best_checkpoints_kept=2)[source]

Bases: AbstractCallback

Checkpoint the model during training, saving it to local disk and optionally also to S3 / GCS cloud storage

Parameters:
  • project_name (str) – root name of the project

  • experiment_name (str) – name of the particular experiment

  • local_model_result_folder_path (str) – root local path where project folder will be created

  • hyperparams (dict) – hyper-parameters used in the experiment. When running the TrainLoop from a Jupyter notebook, the user must manually set the python experiment file path as the value of the experiment_file_path key so that the file can be copied into the experiment folder. When training is started directly from the terminal, the path is deduced automatically.

  • cloud_save_mode (str or None) – storage destination selector. For AWS S3: ‘s3’ / ‘aws_s3’ / ‘aws’. For Google Cloud Storage: ‘gcs’ / ‘google_storage’ / ‘google storage’. Any other value results in local-only storage to disk.

  • bucket_name (str) – name of the bucket in the cloud storage

  • cloud_dir_prefix (str) – path to the folder inside the bucket where the experiments are going to be saved

  • rm_subopt_local_models (bool or str) – if True, the deciding metric is set to ‘loss’. Pass a string metric name to use that metric as the deciding metric for suboptimal model removal instead. If the metric name contains the substring ‘loss’, the metric is minimized; otherwise it is maximized.

  • num_best_checkpoints_kept (int) – number of best-performing model checkpoints kept when suboptimal checkpoints are removed
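The rm_subopt_local_models rule described above can be sketched in plain Python. The helper below is hypothetical and not part of the aitoolbox API; it only illustrates the documented metric-selection behaviour:

```python
def resolve_subopt_metric(rm_subopt_local_models):
    """Illustrative helper (not part of aitoolbox): map the
    rm_subopt_local_models argument to the deciding metric name and
    the optimization direction used for suboptimal model removal."""
    if rm_subopt_local_models is False:
        return None  # suboptimal checkpoint removal is disabled
    # True falls back to 'loss'; a string names the deciding metric
    metric = 'loss' if rm_subopt_local_models is True else rm_subopt_local_models
    # metric names containing 'loss' are minimized, all others maximized
    direction = 'min' if 'loss' in metric else 'max'
    return metric, direction
```

For example, resolve_subopt_metric('val_loss') yields ('val_loss', 'min'), while resolve_subopt_metric('accuracy') yields ('accuracy', 'max').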

on_epoch_end()[source]

Logic executed at the end of the epoch

Returns:

None

on_train_loop_registration()[source]

Execute callback initialization / preparation after the train_loop_object becomes available

Returns:

None

save_hyperparams()[source]

class aitoolbox.torchtrain.callbacks.model_save.ModelIterationCheckpoint(save_frequency, project_name, experiment_name, local_model_result_folder_path, hyperparams, cloud_save_mode='s3', bucket_name='model-result', cloud_dir_prefix='', rm_subopt_local_models=False, num_best_checkpoints_kept=2)[source]

Bases: ModelCheckpoint

Checkpoint the model during training, saving it to local disk and optionally also to S3 / GCS cloud storage

Parameters:
  • save_frequency (int) – save a model checkpoint every save_frequency training iterations

  • project_name (str) – root name of the project

  • experiment_name (str) – name of the particular experiment

  • local_model_result_folder_path (str) – root local path where project folder will be created

  • hyperparams (dict) – hyper-parameters used in the experiment. When running the TrainLoop from a Jupyter notebook, the user must manually set the python experiment file path as the value of the experiment_file_path key so that the file can be copied into the experiment folder. When training is started directly from the terminal, the path is deduced automatically.

  • cloud_save_mode (str or None) – storage destination selector. For AWS S3: ‘s3’ / ‘aws_s3’ / ‘aws’. For Google Cloud Storage: ‘gcs’ / ‘google_storage’ / ‘google storage’. Any other value results in local-only storage to disk.

  • bucket_name (str) – name of the bucket in the cloud storage

  • cloud_dir_prefix (str) – path to the folder inside the bucket where the experiments are going to be saved

  • rm_subopt_local_models (bool or str) – if True, the deciding metric is set to ‘loss’. Pass a string metric name to use that metric as the deciding metric for suboptimal model removal instead. If the metric name contains the substring ‘loss’, the metric is minimized; otherwise it is maximized.

  • num_best_checkpoints_kept (int) – number of best-performing model checkpoints kept when suboptimal checkpoints are removed
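The save_frequency behaviour can be sketched as a simple predicate. This is a hypothetical illustration, not the actual aitoolbox implementation, and it assumes zero-based iteration counting:

```python
def should_save_checkpoint(iteration_idx, save_frequency):
    """Illustrative helper (not part of aitoolbox): decide whether
    ModelIterationCheckpoint would save at this training iteration,
    assuming iterations are counted from zero."""
    if save_frequency < 1:
        raise ValueError('save_frequency must be a positive integer')
    # save on every save_frequency-th completed iteration
    return (iteration_idx + 1) % save_frequency == 0
```

With save_frequency=100, a save would occur on iterations 99, 199, 299, and so on, in addition to the per-epoch saves inherited from ModelCheckpoint.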

on_batch_end()[source]

Logic executed after the batch is inserted into the model

Returns:

None

class aitoolbox.torchtrain.callbacks.model_save.ModelTrainEndSave(project_name, experiment_name, local_model_result_folder_path, hyperparams, val_result_package=None, test_result_package=None, cloud_save_mode='s3', bucket_name='model-result', cloud_dir_prefix='')[source]

Bases: AbstractCallback

At the end of training, run the model performance evaluation, build the result package report, and save it together with the final model to local disk and optionally also to S3 / GCS cloud storage

Parameters:
  • project_name (str) – root name of the project

  • experiment_name (str) – name of the particular experiment

  • local_model_result_folder_path (str) – root local path where project folder will be created

  • hyperparams (dict) – hyper-parameters used in the experiment. When running the TrainLoop from a Jupyter notebook, the user must manually set the python experiment file path as the value of the experiment_file_path key so that the file can be copied into the experiment folder. When training is started directly from the terminal, the path is deduced automatically.

  • val_result_package (AbstractResultPackage) – result package to be evaluated on the validation dataset

  • test_result_package (AbstractResultPackage) – result package to be evaluated on the test dataset

  • cloud_save_mode (str or None) – storage destination selector. For AWS S3: ‘s3’ / ‘aws_s3’ / ‘aws’. For Google Cloud Storage: ‘gcs’ / ‘google_storage’ / ‘google storage’. Any other value results in local-only storage to disk.

  • bucket_name (str) – name of the bucket in the cloud storage

  • cloud_dir_prefix (str) – path to the folder inside the bucket where the experiments are going to be saved
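The experiment_file_path requirement for notebook runs, mentioned in the hyperparams description above, can be illustrated with a hypothetical hyper-parameter dict. Only the experiment_file_path key is specifically documented; the other keys are made-up examples:

```python
import os

# Illustrative hyper-parameter dict. 'experiment_file_path' is the key the
# callback looks for; the remaining keys are arbitrary example entries.
hyperparams = {
    'batch_size': 64,
    'lr': 1e-3,
    'num_epochs': 10,
    # required only when running from a Jupyter notebook; when training is
    # started from the terminal the path is deduced automatically
    'experiment_file_path': os.path.abspath('train_experiment.py'),
}
```

Such a dict would then be passed as the hyperparams argument of ModelTrainEndSave (or of the checkpoint callbacks above).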

on_train_end()[source]

Logic executed at the end of the overall training

Returns:

None

on_train_loop_registration()[source]

Execute callback initialization / preparation after the train_loop_object becomes available

Returns:

None

save_hyperparams()[source]

check_result_packages()[source]