Metrics

Poutyne offers two kinds of metrics: batch metrics and epoch metrics. The main difference between them is that batch metrics are computed at each batch and averaged at the end of an epoch, whereas epoch metrics accumulate statistics over the batches and compute the metric once at the end of the epoch.

Epoch metrics offer a way to compute metrics that are not decomposable as an average. For instance, the predefined F1 epoch metric implements the F1 score. Anyone familiar with the F1 score knows that an average of per-batch F1 scores is not equivalent to the F1 score computed over the whole epoch.
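
To see why, here is a quick numerical check (a minimal sketch using scikit-learn, which is not required by Poutyne): the mean of two per-batch F1 scores differs from the F1 score over the combined batches.

from sklearn.metrics import f1_score

# Two small "batches" of binary targets and predictions (illustrative values).
batch1_true, batch1_pred = [1, 1, 0, 0], [1, 0, 0, 0]
batch2_true, batch2_pred = [1, 0, 0, 0], [1, 1, 1, 0]

# Averaging per-batch F1 scores: (0.667 + 0.5) / 2 ~= 0.583
mean_of_batch_f1 = (f1_score(batch1_true, batch1_pred)
                    + f1_score(batch2_true, batch2_pred)) / 2

# F1 over the whole "epoch": ~0.571, a different number.
overall_f1 = f1_score(batch1_true + batch2_true, batch1_pred + batch2_pred)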

Batch Metrics

Batch metrics are computed at each batch and averaged at the end of an epoch. The interface is the same as that of PyTorch loss functions, i.e. metric(y_pred, y_true).

In addition to the predefined batch metrics below, all PyTorch loss functions can be used by string name in the batch_metrics argument under their functional name. The key in callback logs associated with each of them is the same as its name but without the _loss suffix. For example, the loss function mse_loss() can be passed as a batch metric with the name 'mse_loss' or simply 'mse', and the keys will be 'mse' and 'val_mse' for the training and validation MSE, respectively. Note that you can also pass PyTorch loss functions as the loss function of Model in the same way.
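
For instance, a minimal sketch (the network and optimizer here are placeholders, not recommendations):

import torch.nn as nn
from poutyne import Model

network = nn.Linear(10, 1)  # placeholder network
# 'mse' resolves to torch.nn.functional.mse_loss; 'l1' to l1_loss.
model = Model(network, 'sgd', 'mse', batch_metrics=['l1'])
# The callback logs would then contain the keys 'l1' and 'val_l1'.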

Object-Oriented API

Below are classes for predefined batch metrics available in Poutyne.

class poutyne.Accuracy(*, ignore_index: int = -100, reduction: str = 'mean')[source]

This metric computes the accuracy using a similar interface to CrossEntropyLoss.

Parameters
  • ignore_index (int) – Specifies a target value that is ignored and does not contribute to the accuracy. (Default value = -100)

  • reduction (string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed.

Possible string name in batch_metrics argument:
  • 'acc'

  • 'accuracy'

Keys in logs dictionary of callbacks:
  • Train: 'acc'

  • Validation: 'val_acc'

Shape:
  • Input: \((N, C)\) where C = number of classes, or \((N, C, d_1, d_2, ..., d_K)\) with \(K \geq 1\) in the case of K-dimensional accuracy.

  • Target: \((N)\) where each value is \(0 \leq \text{targets}[i] \leq C-1\), or \((N, d_1, d_2, ..., d_K)\) with \(K \geq 1\) in the case of K-dimensional accuracy.

  • Output: The accuracy.
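
As an illustration, a minimal sketch of calling Accuracy directly on tensors (the values are made up, and the expected output assumes Poutyne's percentage scale for accuracies):

import torch
from poutyne import Accuracy

metric = Accuracy(ignore_index=-100)
y_pred = torch.tensor([[2.0, 0.5], [0.1, 1.5], [1.0, 0.2]])  # (N=3, C=2) scores
y_true = torch.tensor([0, 1, -100])  # the last sample is ignored
accuracy = metric(y_pred, y_true)  # both remaining samples are correct -> 100.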

class poutyne.BinaryAccuracy(*, threshold: float = 0.0, reduction: str = 'mean')[source]

This metric computes the accuracy using a similar interface to BCEWithLogitsLoss.

Parameters
  • threshold (float) – The threshold for class \(1\). The default value is 0., which corresponds to a probability of sigmoid(0.) = 0.5.

  • reduction (string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed.

Possible string name in batch_metrics argument:
  • 'bin_acc'

  • 'binary_acc'

  • 'binary_accuracy'

Keys in logs dictionary of callbacks:
  • Train: 'bin_acc'

  • Validation: 'val_bin_acc'

Shape:
  • Input: \((N, *)\) where \(*\) means any number of additional dimensions

  • Target: \((N, *)\), same shape as the input

  • Output: The binary accuracy.
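
A minimal sketch on raw logits (illustrative values, percentage scale as above):

import torch
from poutyne import BinaryAccuracy

metric = BinaryAccuracy()  # threshold=0. on logits, i.e. probability 0.5
y_pred = torch.tensor([1.2, -0.3, 0.8, -2.0])  # raw logits of shape (N,)
y_true = torch.tensor([1.0, 0.0, 0.0, 0.0])
accuracy = metric(y_pred, y_true)  # 3 of the 4 thresholded predictions match -> 75.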

class poutyne.TopKAccuracy(k: int, *, ignore_index: int = -100, reduction: str = 'mean')[source]

This metric computes the top-k accuracy using a similar interface to CrossEntropyLoss.

Parameters
  • k (int) – Specifies the value of k in the top-k accuracy.

  • ignore_index (int) – Specifies a target value that is ignored and does not contribute to the top-k accuracy. (Default value = -100)

  • reduction (string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed.

Possible string name in batch_metrics argument:
  • 'top{k}'

  • 'top{k}_acc'

  • 'top{k}_accuracy'

for {k} from 1 to 10 and from 20 to 100 in steps of 10.

Keys in logs dictionary of callbacks:
  • Train: 'top{k}'

  • Validation: 'val_top{k}'

where {k} is replaced by the value of parameter k.

Shape:
  • Input: \((N, C)\) where C = number of classes, or \((N, C, d_1, d_2, ..., d_K)\) with \(K \geq 1\) in the case of K-dimensional top-k accuracy.

  • Target: \((N)\) where each value is \(0 \leq \text{targets}[i] \leq C-1\), or \((N, d_1, d_2, ..., d_K)\) with \(K \geq 1\) in the case of K-dimensional top-k accuracy.

  • Output: The top-k accuracy.
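
A minimal sketch with k=2 (illustrative values, percentage scale as above):

import torch
from poutyne import TopKAccuracy

metric = TopKAccuracy(k=2)
y_pred = torch.tensor([[0.1, 0.9, 0.5], [0.8, 0.1, 0.6]])  # (N=2, C=3) scores
y_true = torch.tensor([2, 0])
# Sample 1: top-2 classes are {1, 2}, which contain target 2.
# Sample 2: top-2 classes are {0, 2}, which contain target 0.
accuracy = metric(y_pred, y_true)  # both samples count as correct -> 100.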

Functional

Below are the functional versions of the classes in the Object-Oriented API section.

poutyne.acc(y_pred, y_true, *, ignore_index=-100, reduction='mean')[source]

Computes the accuracy.

This is a functional version of Accuracy.

See Accuracy for details.

poutyne.bin_acc(y_pred, y_true, *, threshold=0.0, reduction='mean')[source]

Computes the binary accuracy.

This is a functional version of BinaryAccuracy.

See BinaryAccuracy for details.

poutyne.topk(y_pred, y_true, k, *, ignore_index=-100, reduction='mean')[source]

Computes the top-k accuracy.

This is a functional version of TopKAccuracy.

See TopKAccuracy for details.
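
The functional forms take the same tensors as their class counterparts; a minimal sketch (illustrative values):

import torch
from poutyne import acc, topk

y_pred = torch.tensor([[2.0, 0.5], [0.1, 1.5]])
y_true = torch.tensor([0, 0])
accuracy = acc(y_pred, y_true)     # one of two samples correct -> 50.
top2 = topk(y_pred, y_true, k=2)   # both targets are in the top 2 -> 100.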

Epoch Metrics

Epoch metrics are metrics calculated only at the end of every epoch. They need to be implemented following the interface class below, but we provide a few predefined epoch metrics.

Interface

class poutyne.EpochMetric[source]

The abstract class representing an epoch metric, which can be accumulated at each batch and calculated at the end of the epoch.

abstract forward(y_pred, y_true) → None[source]

To define the behavior of the metric when called.

Parameters
  • y_pred – The prediction of the model.

  • y_true – Target to evaluate the model.

abstract get_metric()[source]

Compute and return the metric. Should not modify the state of the epoch metric.

abstract reset() → None[source]

The information kept for the computation of the metric is cleaned so that a new epoch can be done.
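
As an example of this interface, here is a minimal sketch of a custom epoch metric (the class name EpochMAE is hypothetical, not part of Poutyne): it accumulates error sums per batch and divides only once at the end, so the result is exact over the epoch.

import torch
from poutyne import EpochMetric

class EpochMAE(EpochMetric):  # hypothetical example, not part of Poutyne
    def __init__(self):
        super().__init__()
        self.reset()

    def forward(self, y_pred, y_true) -> None:
        # Accumulate the running sums; no metric value is returned here.
        self.absolute_error_sum += (y_pred - y_true).abs().sum().item()
        self.count += y_true.numel()

    def get_metric(self):
        return self.absolute_error_sum / max(self.count, 1)

    def reset(self) -> None:
        self.absolute_error_sum = 0.0
        self.count = 0

An instance of such a class can then be passed in the epoch_metrics argument of Model.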

Predefined Epoch Metrics

class poutyne.FBeta(*, metric: Optional[str] = None, average: Union[str, int] = 'micro', beta: float = 1.0, pos_label: int = 1, ignore_index: int = -100, names: Optional[Union[str, List[str]]] = None)[source]

The source code of this class is under the Apache v2 License; it was copied from the AllenNLP project and has been modified.

Compute precision, recall, F-measure and support for each class.

The precision is the ratio tp / (tp + fp) where tp is the number of true positives and fp the number of false positives. The precision is intuitively the ability of the classifier not to label as positive a sample that is negative.

The recall is the ratio tp / (tp + fn) where tp is the number of true positives and fn the number of false negatives. The recall is intuitively the ability of the classifier to find all the positive samples.

The F-beta score can be interpreted as a weighted harmonic mean of the precision and recall, where an F-beta score reaches its best value at 1 and worst score at 0.

If we have precision and recall, the F-beta score is simply: \(F_\beta = (1 + \beta^2) \cdot \frac{\text{precision} \cdot \text{recall}}{\beta^2 \cdot \text{precision} + \text{recall}}\)

The F-beta score weights recall more than precision when beta > 1 and precision more than recall when beta < 1. beta == 1.0 means recall and precision are equally important.

The support is the number of occurrences of each class in y_true.

Keys in logs dictionary of callbacks:
  • Train: '{metric}_{average}'

  • Validation: 'val_{metric}_{average}'

where {metric} and {average} are replaced by the value of their respective parameters.

Parameters
  • metric (Optional[str]) – One of {'fscore', 'precision', 'recall'}. Whether to return the F-score, the precision or the recall. When not provided, all three metrics are returned. (Default value = None)

  • average (Union[str, int]) – One of {'micro' (default), 'macro', 'binary', label_number}. If the argument is an integer, the score for this class (the label number) is calculated. Otherwise, this determines the type of averaging performed on the data:

    'binary': Calculate metrics with regard to a single class identified by the pos_label argument. This is equivalent to average=pos_label except that the binary mode is enforced, i.e. an exception will be raised if there are more than two prediction scores.

    'micro': Calculate metrics globally by counting the total true positives, false negatives and false positives.

    'macro': Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.

    (Default value = 'micro')

  • beta (float) – The strength of recall versus precision in the F-score. (Default value = 1.0)

  • pos_label (int) – The class with respect to which the metric is computed when average == 'binary'. Otherwise, this argument has no effect. (Default value = 1)

  • ignore_index (int) – Specifies a target value that is ignored. This also works in combination with a mask if provided. (Default value = -100)

  • names (Optional[Union[str, List[str]]]) – The names associated with the metrics. It is a string when a single metric is requested. It is a list of 3 strings if all metrics are requested. (Default value = None)

forward(y_pred: torch.Tensor, y_true: Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]) → None[source]

Update the confusion matrix for calculating the F-score.

Parameters
  • y_pred (torch.Tensor) – A tensor of predictions of shape (batch_size, num_classes, …).

  • y_true (Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]) – Ground truths. A tensor of the integer class label of shape (batch_size, …). It must be the same shape as the y_pred tensor without the num_classes dimension. It can also be a tuple with two tensors of the same shape, the first being the ground truths and the second being a mask.

get_metric() → Union[float, List[float]][source]

Returns either a float if a single metric is set in the __init__ or a list of floats [f-score, precision, recall] if all metrics are requested.

reset() → None[source]

The information kept for the computation of the metric is cleaned so that a new epoch can be done.
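
A minimal sketch of requesting a macro-averaged F-score (the network, optimizer and loss choices are placeholders):

import torch.nn as nn
from poutyne import FBeta, Model

network = nn.Linear(10, 3)  # placeholder network
fscore_macro = FBeta(metric='fscore', average='macro')
model = Model(network, 'sgd', 'cross_entropy', epoch_metrics=[fscore_macro])
# The callback logs would then contain 'fscore_macro' and 'val_fscore_macro'.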

class poutyne.F1(average='micro')[source]

Alias class for FBeta where metric == 'fscore' and beta == 1.

Possible string name in epoch_metrics argument:
  • 'f1'

Keys in logs dictionary of callbacks:
  • Train: 'fscore_{average}'

  • Validation: 'val_fscore_{average}'

where {average} is replaced by the value of the respective parameter.

class poutyne.Precision(average='micro')[source]

Alias class for FBeta where metric == 'precision' and beta == 1.

Possible string name in epoch_metrics argument:
  • 'precision'

Keys in logs dictionary of callbacks:
  • Train: 'precision_{average}'

  • Validation: 'val_precision_{average}'

where {average} is replaced by the value of the respective parameter.

class poutyne.Recall(average='micro')[source]

Alias class for FBeta where metric == 'recall' and beta == 1.

Possible string name in epoch_metrics argument:
  • 'recall'

Keys in logs dictionary of callbacks:
  • Train: 'recall_{average}'

  • Validation: 'val_recall_{average}'

where {average} is replaced by the value of the respective parameter.
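
The aliases can be passed by string name or as configured instances; a minimal sketch (placeholder network as before):

import torch.nn as nn
from poutyne import Model, Precision, Recall

network = nn.Linear(10, 3)  # placeholder network
model = Model(network, 'sgd', 'cross_entropy',
              epoch_metrics=['f1', Precision(average='macro'), Recall()])
# Expected log keys: 'fscore_micro', 'precision_macro' and 'recall_micro',
# plus their 'val_'-prefixed counterparts.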

class poutyne.SKLearnMetrics(funcs: Union[Callable, List[Callable]], kwargs: Optional[Union[dict, List[dict]]] = None, names: Optional[Union[str, List[str]]] = None)[source]

Wrap metrics with a scikit-learn-like interface (metric(y_true, y_pred, sample_weight=sample_weight, **kwargs)). The SKLearnMetrics object has to keep the ground truths and predictions in memory so that it can compute the metric at the end of the epoch.

Example

from sklearn.metrics import roc_auc_score, average_precision_score
from poutyne import SKLearnMetrics
my_epoch_metric = SKLearnMetrics([roc_auc_score, average_precision_score])

Parameters
  • funcs (Union[Callable, List[Callable]]) – A metric or a list of metrics with a scikit-learn-like interface.

  • kwargs (Optional[Union[dict, List[dict]]]) – Optional dictionary or list of dictionaries corresponding to keyword arguments to pass to each corresponding metric. (Default value = None)

  • names (Optional[Union[str, List[str]]]) – Optional string or list of strings corresponding to the names given to the metrics. By default, the names are the names of the functions.

forward(y_pred: torch.Tensor, y_true: Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]) → None[source]

Accumulate the predictions, ground truths and sample weights if any.

Parameters
  • y_pred (torch.Tensor) – A tensor of predictions of the shape expected by the metric functions passed to the class.

  • y_true (Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]) – Ground truths. A tensor of ground truths of the shape expected by the metric functions passed to the class. It can also be a tuple with two tensors, the first being the ground truths and the second corresponding to the sample_weight argument passed to the metric functions in Scikit-Learn.

get_metric() → Dict[source]

Returns the metrics as a dictionary with the names as keys. Note: This will reset the epoch metric value.

reset() → None[source]

The information kept for the computation of the metric is cleaned so that a new epoch can be done.
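
To close, a minimal sketch of SKLearnMetrics with per-metric keyword arguments and custom names (the names chosen here are arbitrary):

from sklearn.metrics import average_precision_score, roc_auc_score
from poutyne import SKLearnMetrics

my_epoch_metric = SKLearnMetrics(
    [roc_auc_score, average_precision_score],
    kwargs=[{'average': 'macro'}, {}],   # one dict per function, in order
    names=['roc_auc', 'avg_precision'],  # keys used in the logs dictionary
)
# At the end of an epoch, get_metric() would return a dictionary such as
# {'roc_auc': ..., 'avg_precision': ...}.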