Callbacks

Callbacks are a way to interact with the optimization process. For instance, the ModelCheckpoint callback allows saving the weights of the epoch with the best “score”, the EarlyStopping callback allows stopping the training when the “score” has not improved for a while, etc. The following presents the callbacks available in Poutyne, but first, the documentation of the Callback class shows which methods are available in a callback and which arguments they are provided with.

Callback class

class poutyne.Callback[source]
params

Contains the 'epoch' and 'steps_per_epoch' keys, which are passed to the fit function when training. Contains 'steps' when evaluating. May contain other keys.

Type:

dict

model

A reference to the Model object which is using the callback.

Type:

Model

on_epoch_begin(epoch_number: int, logs: Dict)[source]

Is called before the beginning of each epoch.

Parameters:
  • epoch_number (int) – The epoch number.

  • logs (dict) – Usually an empty dict.

on_epoch_end(epoch_number: int, logs: Dict)[source]

Is called before the end of each epoch.

Parameters:
  • epoch_number (int) – The epoch number.

  • logs (dict) –

    Contains the following keys:

    • 'epoch': The epoch number.

    • 'time': The computation time of the epoch.

    • 'loss': The average loss of the batches.

    • Values of training metrics: One key for each type of metrics. The metrics are also averaged.

    • 'val_loss': The average loss of the batches on the validation set.

    • Values of validation metrics: One key for each type of metrics on the validation set. Each key is prefixed by 'val_'. The metrics are also averaged.

Example:

logs = {'epoch': 2, 'time': 6.08248, 'loss': 0.40161, 'acc': 89.052, 'fscore_micro': 0.89051,
        'val_loss': 0.36814, 'val_acc': 89.52, 'val_fscore_micro': 0.89520}
on_train_batch_begin(batch_number: int, logs: Dict)[source]

Is called before the beginning of the training batch.

Parameters:
  • batch_number (int) – The batch number.

  • logs (dict) – Usually an empty dict.

on_train_batch_end(batch_number: int, logs: Dict)[source]

Is called before the end of the training batch.

Parameters:
  • batch_number (int) – The batch number.

  • logs (dict) –

    Contains the following keys:

    • 'batch': The batch number.

    • 'size': The size of the batch as inferred by get_batch_size().

    • 'time': The computation time of the batch.

    • 'loss': The loss of the batch.

    • Values of the batch metrics for the specific batch: One key for each type of metrics.

Example:

logs = {'batch': 171, 'size': 32, 'time': 0.00310, 'loss': 1.95204, 'acc': 43.75}
on_valid_batch_begin(batch_number: int, logs: Dict)[source]

Is called before the beginning of the validation batch.

Parameters:
  • batch_number (int) – The batch number.

  • logs (dict) – Usually an empty dict.

on_valid_batch_end(batch_number: int, logs: Dict)[source]

Is called before the end of the validation batch.

Parameters:
  • batch_number (int) – The batch number.

  • logs (dict) –

    Contains the following keys:

    • 'batch': The batch number.

    • 'size': The size of the batch as inferred by get_batch_size().

    • 'time': The computation time of the batch.

    • 'val_loss': The loss of the batch.

    • Values of the batch metrics for the specific batch: One key for each type of metrics. Each key is prefixed by 'val_'.

Example:

logs = {'batch': 171, 'size': 32, 'time': 0.00310, 'val_loss': 1.95204, 'val_acc': 43.75}
on_test_batch_begin(batch_number: int, logs: Dict)[source]

Is called before the beginning of the testing batch.

Parameters:
  • batch_number (int) – The batch number.

  • logs (dict) – Usually an empty dict.

on_test_batch_end(batch_number: int, logs: Dict)[source]

Is called before the end of the testing batch.

Parameters:
  • batch_number (int) – The batch number.

  • logs (dict) –

    Contains the following keys:

    • 'batch': The batch number.

    • 'size': The size of the batch as inferred by get_batch_size().

    • 'time': The computation time of the batch.

    • 'test_loss': The loss of the batch.

    • Values of the batch metrics for the specific batch: One key for each type of metrics. Each key is prefixed by 'test_'.

Example:

logs = {'batch': 171, 'size': 32, 'time': 0.00310, 'test_loss': 1.95204, 'test_acc': 43.75}
on_predict_batch_begin(batch_number: int, logs: Dict)[source]

Is called before the beginning of the predict batch.

Parameters:
  • batch_number (int) – The batch number.

  • logs (dict) – Usually an empty dict.

on_predict_batch_end(batch_number: int, logs: Dict)[source]

Is called before the end of the predict batch.

Parameters:
  • batch_number (int) – The batch number.

  • logs (dict) –

    Contains the following keys:

    • 'batch': The batch number.

    • 'time': The computation time of the batch.

Example:

logs = {'batch': 171, 'time': 0.00310}
on_train_begin(logs: Dict)[source]

Is called before the beginning of the training.

Parameters:

logs (dict) – Usually an empty dict.

on_train_end(logs: Dict)[source]

Is called before the end of the training.

Parameters:

logs (dict) – Usually an empty dict.

on_valid_begin(logs: Dict)[source]

Is called before the beginning of the validation.

Parameters:

logs (dict) – Usually an empty dict.

on_valid_end(logs: Dict)[source]

Is called before the end of the validation.

Parameters:

logs (dict) –

Contains the following keys:

  • 'time': The total computation time of the validation.

  • 'val_loss': The average loss of the batches on the validation set.

  • Values of validation metrics: One key for each type of metrics on the validation set. Each key is prefixed by 'val_'. The metrics are also averaged.

Example:

logs = {'time': 6.08248, 'val_loss': 0.40161, 'val_acc': 89.052, 'val_fscore_micro': 0.89051}
on_test_begin(logs: Dict)[source]

Is called before the beginning of the testing.

Parameters:

logs (dict) – Usually an empty dict.

on_test_end(logs: Dict)[source]

Is called before the end of the testing.

Parameters:

logs (dict) –

Contains the following keys:

  • 'time': The total computation time of the test.

  • 'test_loss': The average loss of the batches on the test set.

  • Values of testing metrics: One key for each type of metrics. Each key is prefixed by 'test_'. The metrics are also averaged.

Example:

logs = {'time': 6.08248, 'test_loss': 0.40161, 'test_acc': 89.052, 'test_fscore_micro': 0.89051}
on_predict_begin(logs: Dict)[source]

Is called before the beginning of the prediction.

Parameters:

logs (dict) – Usually an empty dict.

on_predict_end(logs: Dict)[source]

Is called before the end of the prediction.

Parameters:

logs (dict) –

Contains the following keys:

  • 'time': The total computation time of the prediction.

Example:

logs = {'time': 6.08248}
on_backward_end(batch_number: int)[source]

Is called after the backpropagation but before the optimization step.

Parameters:

batch_number (int) – The batch number.
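
As an illustration, here is a minimal sketch of a custom callback obtained by subclassing Callback; the method names and the 'loss' and 'val_loss' log keys come from the documentation above, while the printing logic itself is only an example.

from poutyne import Model, Callback

class PrintLossCallback(Callback):
    def on_train_begin(self, logs):
        print("Training is starting.")

    def on_epoch_end(self, epoch_number, logs):
        # 'loss' and 'val_loss' are part of the epoch logs documented above.
        print(f"Epoch {epoch_number}: loss={logs['loss']:.5f}, val_loss={logs['val_loss']:.5f}")

model = Model(...)
model.fit_generator(..., callbacks=[PrintLossCallback()])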

class poutyne.LambdaCallback(*, on_epoch_begin=None, on_epoch_end=None, on_train_batch_begin=None, on_train_batch_end=None, on_valid_batch_begin=None, on_valid_batch_end=None, on_test_batch_begin=None, on_test_batch_end=None, on_predict_batch_begin=None, on_predict_batch_end=None, on_train_begin=None, on_train_end=None, on_valid_begin=None, on_valid_end=None, on_test_begin=None, on_test_end=None, on_predict_begin=None, on_predict_end=None, on_backward_end=None)[source]

Provides an interface to easily define a callback from lambdas or functions.

Parameters:

kwargs – The arguments of this class are keyword arguments with the same names as the methods in the Callback class. The values are lambdas or functions taking the same arguments as the corresponding methods in Callback.

See:

Callback

Example

from poutyne import LambdaCallback
callbacks = [LambdaCallback(
    on_epoch_end=lambda epoch_number, logs: print(f"Epoch {epoch_number} end"),
    on_train_end=lambda logs: print("Training ended")
)]
model.fit(..., callbacks=callbacks)

Poutyne’s Callbacks

class poutyne.TerminateOnNaN[source]

Stops the training when the loss is either NaN or inf.

class poutyne.BestModelRestore(*, monitor: str = 'val_loss', mode: str = 'min', verbose: bool = False)[source]

Restore the weights of the best model at the end of the training depending on a monitored quantity.

Parameters:
  • monitor (str) – Quantity to monitor. (Default value = ‘val_loss’)

  • mode (str) – One of {‘min’, ‘max’}. Whether the monitored quantity has to be maximized or minimized. For instance, for val_accuracy, this should be max, and for val_loss, this should be min, etc. (Default value = ‘min’)

  • verbose (bool) – Whether to display a message when the model has improved or when restoring the best model. (Default value = False)

class poutyne.EarlyStopping(*, monitor: str = 'val_loss', min_delta: float = 0.0, patience: int = 0, verbose: bool = False, mode: str = 'min')[source]

The source code of this class is under the MIT License and was copied from the Keras project, and has been modified.

Stop training when a monitored quantity has stopped improving.

Parameters:
  • monitor (str) – Quantity to be monitored.

  • min_delta (float) – Minimum change in the monitored quantity to qualify as an improvement, i.e. an absolute change of less than min_delta will count as no improvement. (Default value = 0)

  • patience (int) – Number of epochs with no improvement after which training will be stopped. (Default value = 0)

  • verbose (bool) – Whether to print when early stopping is done. (Default value = False)

  • mode (str) – One of {‘min’, ‘max’}. In min mode, training will stop when the quantity monitored has stopped decreasing; in max mode it will stop when the quantity monitored has stopped increasing. (Default value = ‘min’)
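
A minimal usage sketch combining EarlyStopping with BestModelRestore, assuming train_loader and valid_loader are data loaders; the monitored quantity, mode and patience below are arbitrary examples.

from poutyne import Model, EarlyStopping, BestModelRestore

callbacks = [
    # Stop training if the validation accuracy has not improved for 5 epochs.
    EarlyStopping(monitor='val_acc', mode='max', patience=5),
    # Restore the weights of the best epoch at the end of training.
    BestModelRestore(monitor='val_acc', mode='max'),
]

model = Model(...)
model.fit_generator(train_loader, valid_loader, epochs=100, callbacks=callbacks)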

class poutyne.DelayCallback(callbacks: Callback | Sequence, *, epoch_delay: int | None = None, batch_delay: int | None = None)[source]

Delays one or many callbacks for a certain number of epochs or number of batches. If both epoch_delay and batch_delay are provided, the biggest has precedence.

Parameters:
  • callbacks (Callback, Sequence[Callback]) – A callback or a sequence of callbacks to delay.

  • epoch_delay (int, optional) – Number of epochs to delay.

  • batch_delay (int, optional) – Number of batches to delay. The number of batches can span many epochs. When the batch delay expires (i.e. more than batch_delay batches have been done), the on_epoch_begin() method is called on the callback(s) before the on_train_batch_begin() method.
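
For instance, a minimal sketch delaying a checkpoint callback so that it only starts saving after a few warm-up epochs; the 5-epoch delay and the filename are arbitrary examples.

from poutyne import Model, DelayCallback, ModelCheckpoint

# Only start checkpointing after the first 5 epochs.
delayed_checkpoint = DelayCallback(ModelCheckpoint('weights.ckpt'), epoch_delay=5)

model = Model(...)
model.fit_generator(..., callbacks=[delayed_checkpoint])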

class poutyne.ClipNorm(parameters: Tensor | Iterable[Tensor], max_norm: float, norm_type: float = 2.0)[source]

Uses PyTorch’s clip_grad_norm_() method to clip gradients.

See:

torch.nn.utils.clip_grad_norm_()

class poutyne.ClipValue(parameters: Tensor | Iterable[Tensor], clip_value: float)[source]

Uses PyTorch’s clip_grad_value_() method to clip gradients.

See:

torch.nn.utils.clip_grad_value_()
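
A minimal gradient clipping sketch, assuming network is the PyTorch module wrapped by the Model; the max_norm value is an arbitrary example.

from poutyne import Model, ClipNorm

model = Model(network, optimizer, loss_function)
# Clip the gradient norm of the network's parameters to at most 1.0
# after each backward pass.
clip = ClipNorm(network.parameters(), max_norm=1.0)
model.fit_generator(..., callbacks=[clip])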

Logging

Training Results

class poutyne.CSVLogger(filename: str, *, batch_granularity: bool = False, separator: str = ',', append: bool = False)[source]

Callback that outputs the result of each epoch or batch into a CSV file.

Parameters:
  • filename (str) – The filename of the CSV.

  • batch_granularity (bool) – Whether to also output the result of each batch in addition to the epochs. (Default value = False)

  • separator (str) – The separator to use in the CSV. (Default value = ‘,’)

  • append (bool) – Whether to append to an existing file.
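
A minimal usage sketch; the filename is an arbitrary example.

from poutyne import Model, CSVLogger

csv_logger = CSVLogger('train_log.csv')

model = Model(...)
model.fit_generator(..., callbacks=[csv_logger])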

class poutyne.AtomicCSVLogger(filename, *, batch_granularity: bool = False, separator: str = ',', append: bool = False, temporary_filename: str | None = None)[source]

Callback that outputs the result of each epoch or batch into a CSV file in an atomic manner.

Parameters:
  • filename (str) – The filename of the CSV.

  • temporary_filename (str, optional) – Temporary filename for the CSV file so that it can be written atomically.

  • batch_granularity (bool) – Whether to also output the result of each batch in addition to the epochs. (Default value = False)

  • separator (str) – The separator to use in the CSV. (Default value = ‘,’)

  • append (bool) – Whether to append to an existing file.

class poutyne.TensorBoardLogger(writer, split_train_val: bool = False)[source]

Callback that outputs the result of each epoch or batch into a TensorBoard experiment folder.

Parameters:
  • writer (SummaryWriter) – The tensorboard writer.

  • split_train_val (bool) – Whether to put each training and validation metric in the same graphs. (Default = False)

Example

Using TensorBoardLogger:

from torch.utils.tensorboard import SummaryWriter
from poutyne import Model, TensorBoardLogger

writer = SummaryWriter('runs')
tb_logger = TensorBoardLogger(writer)

model = Model(...)
model.fit_generator(..., callbacks=[tb_logger])
class poutyne.MLFlowLogger(deprecated_experiment_name: str | None = None, *, experiment_name: str | None = None, experiment_id: str | None = None, run_id: str | None = None, tracking_uri: str | None = None, batch_granularity: bool = False, terminate_on_end=True)[source]

MLflow logger to manage the logging of experiment parameters, metric updates, model logs and other information. The logger will log all runs into the same experiment.

Parameters:
  • experiment_name (Optional[str]) – The name of the experiment. The name is case-sensitive. An experiment_id must not be passed if this is passed.

  • experiment_id (Optional[str]) – The id of the experiment. An experiment_name must not be passed if this is passed.

  • run_id (Optional[str]) – The id of the run. An experiment name/id must not be passed if this is passed.

  • tracking_uri (Optional[str]) – Either the URI tracking path (for server tracking) or the absolute path to the directory in which to save the files (for file store). For example: http://<ip address>:<port> (remote server) or /home/<user>/mlflow-server (local server). If None, will use the default MLflow file tracking URI "./mlruns".

  • batch_granularity (bool) – Whether to also output the result of each batch in addition to the epochs. (Default value = False)

  • terminate_on_end (bool) – Whether to end the run at the end of the training or testing. (Default value = True)

Example

Using file store:

mlflow_logger = MLFlowLogger(experiment_name="experiment", tracking_uri="/absolute/path/to/directory")
mlflow_logger.log_config_params(config_params=cfg_dict) # logging the config dictionary

# our Poutyne model bundle
model_bundle = ModelBundle.from_network(directory=saving_directory, network=network, optimizer=optimizer,
                                        loss_function=cross_entropy_loss, batch_metrics=[accuracy],
                                        device=device)

# Using the MLflow logger callback during training
model_bundle.train(train_generator=train_loader, valid_generator=valid_loader, epochs=1,
                   seed=42, callbacks=[mlflow_logger])

Using server tracking:

mlflow_logger = MLFlowLogger(experiment_name="experiment", tracking_uri="http://IP_ADDRESS:PORT")
mlflow_logger.log_config_params(config_params=cfg_dict) # logging the config dictionary

# our Poutyne model bundle
model_bundle = ModelBundle.from_network(directory=saving_directory, network=network, optimizer=optimizer,
                                        loss_function=cross_entropy_loss, batch_metrics=[accuracy],
                                        device=device)

# Using the MLflow logger callback during training
model_bundle.train(train_generator=train_loader, valid_generator=valid_loader, epochs=1,
                   seed=42, callbacks=[mlflow_logger])
log_config_params(config_params: Mapping, **kwargs: Any) → None[source]
Parameters:

config_params (Mapping) – The config parameters of the training to log, such as the number of epochs, the loss function, the optimizer, etc.

log_param(param_name: str, value: str | float, **kwargs: Any) → None[source]

Log the value of a parameter into the experiment.

Parameters:
  • param_name (str) – The name of the parameter.

  • value (Union[str, float]) – The value of the parameter.

log_metric(metric_name: str, value: float, **kwargs: Any) → None[source]

Log the value of a metric into the experiment.

Parameters:
  • metric_name (str) – The name of the metric.

  • value (float) – The value of the metric.

  • step (Union[int, None]) – The step when the metric was computed (Default = None).

class poutyne.WandBLogger(*, name: str | None = None, group: str | None = None, config: Dict | None = None, save_dir: str | None = None, offline: bool | None = False, run_id: str | None = None, anonymous: bool | None = None, version: str | None = None, project: str | None = None, experiment=None, batch_granularity: bool | None = False, checkpoints_path: str | None = None, initial_artifacts_paths: List[str] | None = None, log_gradient_frequency: int | None = None, training_batch_shape: tuple | None = None)[source]

WandB logger to manage the logging of experiment parameters, metric updates, model logs, gradient values and other information. The logger will log all runs into the same experiment.

Parameters:
  • name (str) – Display name for the run.

  • group (Optional[str]) – Specify a group to organize individual runs into a larger experiment

  • config (Optional[Dict]) – A dictionary summarizing the configuration related to the current run.

  • save_dir (str) – Path where data is saved (wandb dir by default).

  • offline (bool) – Run logger offline to later stream data to a remote server.

  • run_id (str) – Sets the version, mainly used to resume a previous run.

  • version (str) – Same as run_id.

  • anonymous (bool) – Enables or explicitly disables anonymous logging.

  • project (str) – The project’s name to which this run will belong.

  • experiment – WandB run to use instead of creating a new one. The other WandB’s configuration parameters will be ignored.

  • batch_granularity (bool) – Whether to also output the result of each batch in addition to the epochs. (Default value = False).

  • checkpoints_path (Optional[str]) – A path leading to the checkpoint saving directory. You need to specify this argument to log the model checkpoints at the end of the training phase.

  • initial_artifacts_paths (Optional[List[str]]) – a list of paths leading to artifacts to be logged before the start of the training.

  • log_gradient_frequency (int) – log gradients and parameters every N batches (Default value = None).

  • training_batch_shape (tuple) – Shape of a training batch. Used for logging the architecture on wandb.

Example

import wandb
import poutyne as pt

wandb_logger = pt.WandBLogger(name="A run", project="A project")
config_dict = {"Optimizer": "sgd", "Loss": "Cross-Entropy", "lr": 0.01}
wandb_logger.log_config_params(config_params=config_dict)  # logging the config dictionary
# our Poutyne experiment
experiment = pt.Experiment(
    directory="a/path",
    network=network,
    device="cpu",
    optimizer="sgd",
    loss_function="cross_entropy",
    batch_metrics=["accuracy"],
)
# Using the WandB logger callback during training
experiment.train(
    train_generator=train_loader, valid_generator=valid_loader, epochs=2, seed=42, callbacks=[wandb_logger]
)
# You can access the wandb run via the attribute .run if you want to use other wandb features
image = wandb.Image('a/image.png', caption="a caption")
wandb_logger.run.log({"My image": image})

wandb.finish()  # Call once you are finished with your experiment.
log_config_params(config_params: Dict) → None[source]
Parameters:

config_params (Dict) – Dictionary of config parameters of the training to log, such as the number of epochs, the loss function, the optimizer, etc.

Training Progress

class poutyne.ProgressionCallback(*, coloring=True, progress_bar=True, equal_weights=False, show_on_valid=True, show_every_n_train_steps='all', show_every_n_valid_steps='all', show_every_n_test_steps='all')[source]

Default progression callback used in Model. You can use the progress_options in Model instead of instantiating this callback. If you choose to use this callback anyway, make sure to pass verbose=False to fit() or fit_generator().

Parameters:
  • coloring (Union[bool, Dict], optional) – If a bool, whether to display the progress of the training with default color highlighting. If a Dict, the fields and the colors to use, given as colorama colors. The fields are text_color, ratio_color, metric_value_color, time_color and progress_bar_color. In both cases, this is ignored if verbose is set to False. (Default value = True)

  • progress_bar (bool) – Whether or not to display a progress bar showing the epoch progress. Note that if the size of the output text with the progress bar is larger than the shell output size, the formatting could be impacted (a line for every step). (Default value = True)

  • equal_weights (bool) – Whether or not the duration of each step is weighted equally when computing the average time of the steps and, thus, the ETA. By default, newer step times receive more weight than older step times. Set this to true to have equal weighting instead.

  • show_on_valid (bool) – Whether or not to display the progression during the validation phase. (Default value = True)

  • show_every_n_train_steps (Union[str, int]) – Show a subset of the training steps. If 'all', show all steps. If 'none', do not show the steps (i.e. only show the stats at the end of the epoch). If an integer n, only show every n-th step. (Default value = ‘all’).

  • show_every_n_valid_steps (Union[str, int]) – Show a subset of the validation steps. If 'all', show all steps. If 'none', do not show the steps (i.e. only show the stats at the end of the epoch). If an integer n, only show every n-th step. (Default value = ‘all’).

  • show_every_n_test_steps (Union[str, int]) – Show a subset of the testing steps. If 'all', show all steps. If 'none', do not show the steps (i.e. only show the stats at the end of the testing). If an integer n, only show every n-th step. (Default value = ‘all’).
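
A minimal sketch instantiating the callback manually; remember to pass verbose=False to fit_generator() so that the default progression callback is not added as well. The coloring fields and values below are arbitrary examples.

from poutyne import Model, ProgressionCallback

progression = ProgressionCallback(
    coloring={'text_color': 'MAGENTA', 'progress_bar_color': 'LIGHTGREEN_EX'},
    show_every_n_train_steps=25,
)

model = Model(...)
model.fit_generator(..., callbacks=[progression], verbose=False)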

Tracking

class poutyne.TensorBoardGradientTracker(writer, keep_bias: bool = False)[source]

Wrapper to track the statistics of the weights’ gradient per layer and log them in TensorBoard per epoch.

Parameters:
  • writer (SummaryWriter) – The TensorBoard writer.

  • keep_bias (bool) – Whether or not to log the biases of the network.

Example

Using TensorBoardGradientTracker:

from torch.utils.tensorboard import SummaryWriter
from poutyne import Model, TensorBoardGradientTracker

writer = SummaryWriter('runs')
tb_tracker = TensorBoardGradientTracker(writer)

model = Model(...)
model.fit_generator(..., callbacks=[tb_tracker])

Notification

class poutyne.Notificator[source]

The interface of the Notificator. It must at least implement a send_notification method. The interface is similar to the notif package.

abstract send_notification(message: str, *, subject: str | None = None) → None[source]

Abstract method to send a notification.

Parameters:
  • message (str) – The message to send as a notification message through the notificator.

  • subject (str) – The subject of the notification. If None, the default message is used. By default, None. Also, we recommend formatting the subject for better readability, e.g. by bolding it using Markdown and appending a new line.
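
As an illustration, a hypothetical Notificator that simply prints its notifications to standard output could look like the following sketch.

from poutyne import Notificator

class PrintNotificator(Notificator):
    # Hypothetical notificator: "sends" notifications to standard output.
    def send_notification(self, message, *, subject=None):
        if subject is not None:
            print(subject)
        print(message)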

class poutyne.NotificationCallback(notificator: Notificator, alert_frequency: int = 1, experiment_name: None | str = None)[source]

Send a notification to a channel at the beginning/ending of the training/testing and at a constant frequency (alert_frequency) during the training.

Parameters:
  • notificator (Notificator) –

    The notification channel to send the message. The expected interface needs to implement a send_notification method to send the message. You can see the notif package, which implements some Notificators respecting the interface.

  • alert_frequency (int) – The frequency (in epochs), during training, at which to send an update. By default, 1.

  • experiment_name (Union[str, None]) – The name of the experiment to add to the message. By default, None.

Example

from notif.notificator import SlackNotificator
from poutyne.framework.callbacks.notification import NotificationCallback

webhook_url = "a_link"
slack_notif = SlackNotificator(webhook_url=webhook_url)

notif_callback = NotificationCallback(notificator=slack_notif)

model = Model(...)
model.fit_generator(..., callbacks=[notif_callback])

Checkpointing

Poutyne provides callbacks for checkpointing the state of the optimization so that it can be stopped and restarted at a later point. All the checkpointing classes inherit from the PeriodicSaveCallback class and, thus, have the same arguments in their constructors. They may have other arguments specific to their purpose.

class poutyne.PeriodicSaveCallback(filename: str, *, monitor: str = 'val_loss', mode: str = 'min', save_best_only: bool = False, keep_only_last_best: bool = False, restore_best: bool = False, period: int = 1, verbose: bool = False, temporary_filename: str | None = None, atomic_write: bool = True, open_mode: str = 'wb', read_mode: str = 'rb')[source]

Write a file (or checkpoint) after every epoch. filename can contain named formatting options, which will be filled with the value of epoch and the keys in logs (passed in on_epoch_end). For example: if filename is weights.{epoch:02d}-{val_loss:.2f}.txt, then save_file() will be called with a file descriptor for a file with the epoch number and the validation loss in the filename.

By default, the file is written atomically to the specified filename so that the training can be killed and restarted later using the same filename for periodic file saving. To do so, a temporary file is created with the name of filename + '.tmp' and is then moved to the final destination after the checkpoint is done. The temporary_filename argument allows changing the path of this temporary file.

Parameters:
  • filename (str) – Path to save the model file.

  • monitor (str) – Quantity to monitor. (Default value = ‘val_loss’)

  • verbose (bool) – Whether to display a message when saving and restoring a checkpoint. (Default value = False)

  • save_best_only (bool) – If save_best_only is true, the latest best model according to the quantity monitored will not be overwritten. (Default value = False)

  • keep_only_last_best (bool) – Whether only the last saved best checkpoint is kept. Applies only when save_best_only is true. (Default value = False)

  • restore_best (bool) – If restore_best is true, the model will be reset to the last best checkpoint done. This option only works when save_best_only is also true. (Default value = False)

  • mode (str) – One of {‘min’, ‘max’}. If save_best_only is true, the decision to overwrite the current save file is made based on either the maximization or the minimization of the monitored quantity. For val_accuracy, this should be max, for val_loss this should be min, etc. (Default value = ‘min’)

  • period (int) – Interval (number of epochs) between checkpoints. (Default value = 1)

  • temporary_filename (str, optional) – Temporary filename for the checkpoint so that the last checkpoint can be written atomically. See the atomic_write argument.

  • atomic_write (bool) – Whether to write the checkpoint atomically. See the description above for details. (Default value = True)

  • open_mode (str) – mode option passed to open(). (Default value = ‘wb’)

abstract save_file(fd: IO, epoch_number: int, logs: Dict) → None[source]

Abstract method that is called every time a save needs to be done.

Parameters:
  • fd (IO) – The descriptor of the file in which to write.

  • epoch_number (int) – The epoch number.

  • logs (Dict) – Dictionary passed on epoch end.

abstract restore(fd: IO) → None[source]

Abstract method that is called when a save needs to be restored. This happens at the end of the training when restore_best is true.

Parameters:

fd (IO) – The descriptor of the file to read.

class poutyne.ModelCheckpoint(filename: str, *, monitor: str = 'val_loss', mode: str = 'min', save_best_only: bool = False, keep_only_last_best: bool = False, restore_best: bool = False, period: int = 1, verbose: bool = False, temporary_filename: str | None = None, atomic_write: bool = True, open_mode: str = 'wb', read_mode: str = 'rb')[source]

Save the model after every epoch. See PeriodicSaveCallback for the arguments’ descriptions.

See:

PeriodicSaveCallback
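
A minimal usage sketch that keeps only the best checkpoint according to the validation loss and restores it at the end of training; the filename pattern is an example.

from poutyne import Model, ModelCheckpoint

checkpoint = ModelCheckpoint(
    'weights.{epoch:02d}-{val_loss:.2f}.ckpt',
    monitor='val_loss',
    mode='min',
    save_best_only=True,
    keep_only_last_best=True,
    restore_best=True,
    verbose=True,
)

model = Model(...)
model.fit_generator(..., callbacks=[checkpoint])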

class poutyne.OptimizerCheckpoint(filename: str, *, monitor: str = 'val_loss', mode: str = 'min', save_best_only: bool = False, keep_only_last_best: bool = False, restore_best: bool = False, period: int = 1, verbose: bool = False, temporary_filename: str | None = None, atomic_write: bool = True, open_mode: str = 'wb', read_mode: str = 'rb')[source]

Save the state of the optimizer after every epoch. The optimizer can be reloaded as follows.

model = Model(network, optimizer, loss_function)
model.load_optimizer_state(filename)

See PeriodicSaveCallback for the arguments’ descriptions.

See:

PeriodicSaveCallback

class poutyne.RandomStatesCheckpoint(filename: str, *, monitor: str = 'val_loss', mode: str = 'min', save_best_only: bool = False, keep_only_last_best: bool = False, restore_best: bool = False, period: int = 1, verbose: bool = False, temporary_filename: str | None = None, atomic_write: bool = True, open_mode: str = 'wb', read_mode: str = 'rb')[source]

Save Python's, NumPy's and PyTorch's (both CPU and GPU) random states after every epoch. The random states can be reloaded using load_random_states().

See PeriodicSaveCallback for the arguments’ descriptions.

See:

PeriodicSaveCallback

class poutyne.LRSchedulerCheckpoint(lr_scheduler: _PyTorchLRSchedulerWrapper, *args, **kwargs)[source]

Save the state of an LR scheduler callback after every epoch. The LR scheduler callback should not be passed to the fit*() methods since it is called by this callback instead. The LR scheduler can be reloaded as follows.

lr_scheduler = AnLRSchedulerCallback(...)
lr_scheduler.load_state(filename)

See PeriodicSaveCallback for the arguments’ descriptions.

Parameters:

lr_scheduler – An LR scheduler callback.

See:

PeriodicSaveCallback

class poutyne.PeriodicSaveLambda(func: Callable, *args, restore: Callable | None = None, **kwargs)[source]

Call a lambda with a file descriptor after every epoch. See PeriodicSaveCallback for the arguments’ descriptions.

Parameters:
  • func (Callable[[fd, int, dict], None]) – The lambda that will be called with a file descriptor, the epoch number and the epoch logs.

  • restore (Callable[[fd], None]) – The lambda that will be called with a file descriptor to restore the state if necessary.

See:

PeriodicSaveCallback
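
A minimal sketch that pickles the epoch logs into a file after every epoch, assuming the extra keyword arguments are forwarded to PeriodicSaveCallback as the signature suggests; the filename pattern and the use of pickle are only examples (the file is opened in binary mode by default, so pickle works).

import pickle

from poutyne import Model, PeriodicSaveLambda

save_logs = PeriodicSaveLambda(
    lambda fd, epoch_number, logs: pickle.dump(logs, fd),
    filename='logs_epoch_{epoch}.pkl',
)

model = Model(...)
model.fit_generator(..., callbacks=[save_logs])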

LR Schedulers

Poutyne’s callbacks for learning rate schedulers are just wrappers around PyTorch’s learning rate schedulers and thus have the same arguments, except for the optimizer, which has to be omitted.
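
For example, PyTorch's StepLR normally takes (optimizer, step_size, gamma, ...); the corresponding Poutyne callback takes only the remaining arguments. A minimal sketch; the step size and gamma values are arbitrary examples.

from poutyne import Model, StepLR

# Decay the learning rate by a factor of 0.1 every 10 epochs.
step_lr = StepLR(step_size=10, gamma=0.1)

model = Model(...)
model.fit_generator(..., callbacks=[step_lr])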

class poutyne.LambdaLR(*args, **kwargs)
See:

LambdaLR

class poutyne.MultiplicativeLR(*args, **kwargs)
See:

MultiplicativeLR

class poutyne.StepLR(*args, **kwargs)
See:

StepLR

class poutyne.MultiStepLR(*args, **kwargs)
See:

MultiStepLR

class poutyne.ConstantLR(*args, **kwargs)
See:

ConstantLR

class poutyne.LinearLR(*args, **kwargs)
See:

LinearLR

class poutyne.ExponentialLR(*args, **kwargs)
See:

ExponentialLR

class poutyne.CosineAnnealingLR(*args, **kwargs)
See:

CosineAnnealingLR

class poutyne.CyclicLR(*args, **kwargs)
See:

CyclicLR

class poutyne.OneCycleLR(*args, **kwargs)
See:

OneCycleLR

class poutyne.CosineAnnealingWarmRestarts(*args, **kwargs)
See:

CosineAnnealingWarmRestarts

class poutyne.ReduceLROnPlateau(*args, monitor: str = 'val_loss', **kwargs)[source]
Parameters:

monitor (str) – The quantity to monitor. (Default value = ‘val_loss’)

See:

ReduceLROnPlateau

Policies

The policies module is an alternative way to configure your training process. It gives you fine-grained control over the process.

The training is divided into phases with the Phase class. A Phase contains parameter spaces (e.g. learning rate, or momentum, or both) for the optimizer. You chain Phase instances by passing them to the OptimizerPolicy. OptimizerPolicy is a Callback that uses the phases, steps through them, and sets the parameters of the optimizer.
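
A minimal sketch chaining two phases, assuming train_loader is a training data loader; the learning rate values and the number of epochs are arbitrary examples.

from poutyne import Model, OptimizerPolicy, Phase, linspace, cosinespace

steps_per_epoch = len(train_loader)
policy = OptimizerPolicy([
    # Warm the learning rate up linearly for 3 epochs...
    Phase(lr=linspace(0.001, 0.1, steps_per_epoch * 3)),
    # ...then anneal it with a cosine schedule for 7 epochs.
    Phase(lr=cosinespace(0.1, 0.001, steps_per_epoch * 7)),
])

model = Model(...)
model.fit_generator(train_loader, epochs=10, callbacks=[policy])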

class poutyne.Phase(*, lr: float | None = None, momentum: float | None = None)[source]

Defines how to configure an optimizer.

For each training step, it returns a dictionary that contains the configuration for the optimizer.

Parameters:
  • lr (List[float], optional) – a configuration space for the learning rate.

  • momentum (List[float], optional) – a configuration space for the momentum.

class poutyne.OptimizerPolicy(phases: List, *, initial_step: int = 0)[source]

Combine different Phase instances in an OptimizerPolicy and execute the phases in sequence.

Parameters:
  • phases (List[Phase]) – A list of Phase instances.

  • initial_step (int) – The step to start the policy in. Used for restarting.

poutyne.linspace(start: int, end: int, steps: int)[source]

A lazy linear parameter space that goes from start to end in steps steps.

Parameters:
  • start (int) – the start point.

  • end (int) – the end point.

  • steps (int) – the number of steps between start and end.

Example

>>> list(linspace(0, 1, 3))
[0.0, 0.5, 1.0]
poutyne.cosinespace(start: int, end: int, steps: int)[source]

A lazy cosine parameter space that goes from start to end in steps steps.

Parameters:
  • start (int) – the start point.

  • end (int) – the end point.

  • steps (int) – the number of steps between start and end.

Example

>>> list(cosinespace(0, 1, 3))
[0.0, 0.5, 1.0]

High Level Policies

Ready to use policies.

poutyne.one_cycle_phases(steps: int, lr: Tuple[float, float] = (0.1, 1), momentum: Tuple[float, float] = (0.95, 0.85), finetune_lr: float = 0.01, finetune_fraction: float = 0.1) → List[Phase][source]

The “one-cycle” policy as described in the paper Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates.

You might want to read the paper and adjust the parameters.

Parameters:
  • steps (int) – the total number of steps to take.

  • lr (Tuple[float, float]) – tuple for the triangular learning rate (start, middle).

  • momentum (Tuple[float, float]) – tuple for the triangular momentum (start, middle).

  • finetune_lr (float) – target learning rate for the final fine tuning. Should be smaller than min(lr).

  • finetune_fraction (float) – fraction of steps used for the fine tuning. Must be between 0 and 1.

Returns:

A list of configured Phase instances.

References

Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates
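
A minimal usage sketch, assuming train_loader is a training data loader; the returned phases are passed to an OptimizerPolicy as described in the Policies section.

from poutyne import Model, OptimizerPolicy, one_cycle_phases

epochs = 10
steps_per_epoch = len(train_loader)
policy = OptimizerPolicy(one_cycle_phases(steps=epochs * steps_per_epoch))

model = Model(...)
model.fit_generator(train_loader, epochs=epochs, callbacks=[policy])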

poutyne.sgdr_phases(base_cycle_length: int, cycles: int, lr: Tuple[float, float] = (1.0, 0.1), cycle_mult: int = 2) → List[Phase][source]

The “SGDR” policy as described in the paper SGDR: Stochastic Gradient Descent with Warm Restarts.

Note the total number of steps is calculated like this: total_steps = sum(base_cycle_length * (cycle_mult ** i) for i in range(cycles))

You might want to read the paper and adjust the parameters.

Parameters:
  • base_cycle_length (int) – number of steps for the first cycle.

  • cycles (int) – the number of repetitions.

  • lr (Tuple[float, float]) – tuple for the learning rate for one cycle: (start, end).

  • cycle_mult (float) – multiply the length of the last cycle by this factor every cycle. The length of a cycle thus grows exponentially.

Returns:

A list of configured Phase instances.

References

SGDR: Stochastic Gradient Descent with Warm Restarts
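
A minimal usage sketch, assuming train_loader is a training data loader. With cycles=3 and cycle_mult=2, the cycles last 1, 2 and 4 times base_cycle_length, so the formula above gives a total of 7 * base_cycle_length steps.

from poutyne import Model, OptimizerPolicy, sgdr_phases

steps_per_epoch = len(train_loader)
# 1 + 2 + 4 = 7 epochs' worth of steps in total.
phases = sgdr_phases(base_cycle_length=steps_per_epoch, cycles=3, cycle_mult=2)
policy = OptimizerPolicy(phases)

model = Model(...)
model.fit_generator(train_loader, epochs=7, callbacks=[policy])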