abstract

class aitoolbox.torchtrain.callbacks.abstract.AbstractCallback(callback_name, execution_order=0, device_idx_execution=None)[source]

Bases: object

Abstract callback class that all concrete callback classes have to inherit from

In the derived callback classes the callback methods should be overridden to implement the desired callback functionality at specific points of the train loop.

Parameters
  • callback_name (str) – name of the callback

  • execution_order (int) – order of the callback execution. If all the used callbacks have their order set to 0, then the callbacks are executed in the order they were registered.

  • device_idx_execution (int or None) – index of the (CUDA GPU) device, i.e. of the DDP process, inside which the callback should be executed
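
A minimal sketch of how such a callback could be built on top of this class, using only the constructor and hook methods documented below; the PrintProgressCallback name and the epoch counter read from self.train_loop_obj are illustrative assumptions rather than part of this class:

    from aitoolbox.torchtrain.callbacks.abstract import AbstractCallback


    class PrintProgressCallback(AbstractCallback):
        """Hypothetical callback printing basic training progress."""

        def __init__(self):
            super().__init__('Print training progress')

        def on_train_begin(self):
            print('Training started')

        def on_epoch_end(self):
            # self.train_loop_obj is injected by the callback handler at registration time;
            # reading an epoch counter off it is an assumption about the TrainLoop API
            print(f'Epoch {self.train_loop_obj.epoch} finished')

        def on_train_end(self):
            print('Training finished')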

register_train_loop_object(train_loop_obj)[source]

Introduce the reference to the encapsulating trainloop so that the callback has access to the low-level functionality of the trainloop.

The registration is normally handled by the callback handler found inside the train loop. The handler is responsible for orchestrating all the callbacks registered inside the trainloop.

Parameters

train_loop_obj (aitoolbox.torchtrain.train_loop.TrainLoop) – reference to the encapsulating trainloop

Returns

return the reference to the callback after it is registered

Return type

AbstractCallback

on_train_loop_registration()[source]

Execute callback initialization / preparation after the train_loop_object becomes available

Returns

None
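
A sketch of using this hook for preparation work once the TrainLoop reference is available; the LearningRateLoggerCallback name and the optimizer attribute read off the TrainLoop are illustrative assumptions:

    from aitoolbox.torchtrain.callbacks.abstract import AbstractCallback


    class LearningRateLoggerCallback(AbstractCallback):
        """Hypothetical callback caching a TrainLoop internal after registration."""

        def __init__(self):
            super().__init__('Learning rate logger')
            self.optimizer = None

        def on_train_loop_registration(self):
            # Called right after register_train_loop_object() has set self.train_loop_obj;
            # accessing the optimizer through the TrainLoop is an assumption about its API
            self.optimizer = self.train_loop_obj.optimizer

        def on_epoch_end(self):
            for param_group in self.optimizer.param_groups:
                print('Current learning rate:', param_group['lr'])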

on_epoch_begin()[source]

Logic executed at the beginning of the epoch

Returns

None

on_epoch_end()[source]

Logic executed at the end of the epoch

Returns

None

on_train_begin()[source]

Logic executed at the beginning of the overall training

Returns

None

on_train_end()[source]

Logic executed at the end of the overall training

Returns

None

on_batch_begin()[source]

Logic executed before the batch is inserted into the model

Returns

None

on_batch_end()[source]

Logic executed after the batch is inserted into the model

Returns

None
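
A sketch of combining the per-batch and per-epoch hooks above, here to time how long batches take; the BatchTimingCallback name is an illustrative assumption:

    import time

    from aitoolbox.torchtrain.callbacks.abstract import AbstractCallback


    class BatchTimingCallback(AbstractCallback):
        """Hypothetical callback measuring per-batch processing time."""

        def __init__(self):
            super().__init__('Batch timing')
            self.batch_start_time = None
            self.batch_durations = []

        def on_batch_begin(self):
            # Runs just before the batch is inserted into the model
            self.batch_start_time = time.time()

        def on_batch_end(self):
            # Runs just after the batch has gone through the model
            self.batch_durations.append(time.time() - self.batch_start_time)

        def on_epoch_end(self):
            avg_duration = sum(self.batch_durations) / len(self.batch_durations)
            print(f'Average batch duration: {avg_duration:.4f} s')
            self.batch_durations = []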

on_after_gradient_update(optimizer_idx)[source]

Logic executed after the model gradients are updated

To ensure the execution of this callback, enable the self.train_loop_obj.grad_cb_used = True option in on_train_loop_registration(). Otherwise, the logic implemented here will not be executed by the TrainLoop.

Parameters

optimizer_idx (int) – index of the current optimizer. Mostly useful when using multiple optimizers. When only a single optimizer is used this parameter can be ignored.

Returns

None

on_after_optimizer_step()[source]

Logic executed after the optimizer does a new step and updates the model weights

To ensure the execution of this callback, enable the self.train_loop_obj.grad_cb_used = True option in on_train_loop_registration(). Otherwise, the logic implemented here will not be executed by the TrainLoop.

Returns

None
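
A sketch tying the two gradient-related hooks together with the grad_cb_used flag mentioned above; the GradientNormCallback name and accessing the model via self.train_loop_obj.model are assumptions about the TrainLoop API:

    from aitoolbox.torchtrain.callbacks.abstract import AbstractCallback


    class GradientNormCallback(AbstractCallback):
        """Hypothetical callback inspecting gradients during training."""

        def __init__(self):
            super().__init__('Gradient norm logging')

        def on_train_loop_registration(self):
            # Without this flag the TrainLoop skips the two gradient hooks below
            self.train_loop_obj.grad_cb_used = True

        def on_after_gradient_update(self, optimizer_idx):
            # Accessing the model through the TrainLoop is an assumption about its API
            total_norm = sum(
                p.grad.norm().item()
                for p in self.train_loop_obj.model.parameters()
                if p.grad is not None
            )
            print(f'Optimizer {optimizer_idx}: sum of gradient norms {total_norm:.4f}')

        def on_after_optimizer_step(self):
            print('Optimizer step done, model weights updated')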

on_multiprocess_start()[source]

Logic executed at the beginning of every child process, after a new multiprocessing process has been spawned

Returns

None
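
A sketch of per-process initialization in a DDP setting; re-seeding with the process id is only an illustrative choice, not something prescribed by the library:

    import os
    import random

    from aitoolbox.torchtrain.callbacks.abstract import AbstractCallback


    class PerProcessSetupCallback(AbstractCallback):
        """Hypothetical callback re-initializing process-local state in each child process."""

        def __init__(self):
            super().__init__('Per-process setup')

        def on_multiprocess_start(self):
            # Executed at the beginning of every spawned child process; anything that
            # must not be shared across processes can be (re)created here
            random.seed(os.getpid())
            print(f'Child process {os.getpid()} initialized')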

class aitoolbox.torchtrain.callbacks.abstract.AbstractExperimentCallback(callback_name, project_name=None, experiment_name=None, local_model_result_folder_path=None, cloud_save_mode=None, bucket_name=None, cloud_dir_prefix=None, execution_order=0, device_idx_execution=None)[source]

Bases: aitoolbox.torchtrain.callbacks.abstract.AbstractCallback

Extension of the AbstractCallback implementing automatic inference of the experiment details from the TrainLoop

This abstract callback is inherited from when the implemented callbacks intend to save result files into the experiment folder and also potentially upload them to cloud storage (e.g. AWS S3 or Google Cloud Storage).

Parameters
  • callback_name (str) – name of the callback

  • project_name (str or None) – root name of the project

  • experiment_name (str or None) – name of the particular experiment

  • local_model_result_folder_path (str or None) – root local path where project folder will be created

  • cloud_save_mode (str or None) – Storage destination selector. For AWS S3: ‘s3’ / ‘aws_s3’ / ‘aws’. For Google Cloud Storage: ‘gcs’ / ‘google_storage’ / ‘google storage’. Any other value results in local storage to disk only.

  • bucket_name (str or None) – name of the bucket in the cloud storage

  • cloud_dir_prefix (str or None) – path to the folder inside the bucket where the experiments are going to be saved

  • execution_order (int) – order of the callback execution. If all the used callbacks have their order set to 0, then the callbacks are executed in the order they were registered.

  • device_idx_execution (int or None) – index of the (CUDA GPU) device, i.e. of the DDP process, inside which the callback should be executed

try_infer_experiment_details(infer_cloud_details)[source]

Infer the paths where experiment-related files should be saved, based on the running TrainLoop.

This details inference function should only be called after the callback has already been registered in the TrainLoop, e.g. in on_train_loop_registration().

General rule: to have the experiment details inferred from the TrainLoop, all of cloud_save_mode, bucket_name and cloud_dir_prefix should be set to None.

Based on self.cloud_save_mode the inference decision is made as follows:
  • [‘s3’, ‘aws_s3’, ‘aws’] –> AWS S3

  • [‘gcs’, ‘google_storage’, ‘google storage’] –> Google Cloud Storage

  • ‘local’ or any other value –> local storage only

Parameters

infer_cloud_details (bool) – whether to infer only the local project folder details or also the cloud project destination

Raises

AttributeError

Returns

None
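
A sketch of a results-saving callback built on this class: with the experiment arguments left as None and try_infer_experiment_details() called in on_train_loop_registration(), the details are picked up from the TrainLoop as described above. The ResultsSnapshotCallback name and the assumption that the inference fills in attributes named after the constructor parameters are illustrative only:

    from aitoolbox.torchtrain.callbacks.abstract import AbstractExperimentCallback


    class ResultsSnapshotCallback(AbstractExperimentCallback):
        """Hypothetical callback saving result files into the experiment folder."""

        def __init__(self, project_name=None, experiment_name=None,
                     local_model_result_folder_path=None,
                     cloud_save_mode=None, bucket_name=None, cloud_dir_prefix=None):
            super().__init__('Results snapshot',
                             project_name, experiment_name, local_model_result_folder_path,
                             cloud_save_mode, bucket_name, cloud_dir_prefix)

        def on_train_loop_registration(self):
            # With all experiment arguments left as None the details are inferred from the
            # encapsulating TrainLoop; infer_cloud_details=True also infers the cloud
            # storage destination, not only the local project folder
            self.try_infer_experiment_details(infer_cloud_details=True)

        def on_epoch_end(self):
            # After inference the experiment attributes are expected to be filled in
            # (attribute names follow the constructor parameters and are assumptions)
            print('Saving results for', self.project_name, '/', self.experiment_name)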