Metrics
Poutyne offers two kinds of metrics: batch metrics and epoch metrics. Batch metrics are computed at each batch and averaged at the end of an epoch, whereas epoch metrics accumulate statistics at each batch and compute the metric once at the end of the epoch.
Epoch metrics offer a way to compute metrics that are not decomposable as an average.
For instance, the predefined F1 epoch metric implements the F1 score. The average of several per-batch F1 scores is generally not equal to the F1 score computed over the whole epoch.
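The difference can be checked with a small worked example. The sketch below uses plain Python and made-up (tp, fp, fn) counts for two batches; the helper name f1_from_counts is hypothetical and is not part of Poutyne.

```python
def f1_from_counts(tp, fp, fn):
    """F1 score from true positives, false positives and false negatives."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator > 0 else 0.0


# Hypothetical (tp, fp, fn) counts for the two batches of one epoch.
batches = [(1, 0, 3), (9, 1, 0)]

# Batch-metric style: compute F1 for each batch, then average the scores.
mean_of_batch_f1 = sum(f1_from_counts(*b) for b in batches) / len(batches)

# Epoch-metric style: accumulate the counts, then compute F1 once.
tp, fp, fn = (sum(counts) for counts in zip(*batches))
overall_f1 = f1_from_counts(tp, fp, fn)

print(f"mean of batch F1: {mean_of_batch_f1:.4f}")
print(f"overall F1:       {overall_f1:.4f}")
```

With these counts the two values differ noticeably, which is why F1 is provided as an epoch metric rather than as a batch metric.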
Batch Metrics
Batch metrics are computed at each batch and averaged at the end of an epoch.
The interface is the same as that of PyTorch loss functions, that is, metric(y_pred, y_true).
In addition to the predefined batch metrics below, all PyTorch loss functions can be used by string name in the batch_metrics argument under their functional name.
The key in callback logs associated with each of them is the same as its name but without the _loss suffix. For example, the loss function mse_loss() can be passed as a batch metric with the name 'mse_loss' or simply 'mse', and the keys are going to be 'mse' and 'val_mse' for the training and validation MSE, respectively.
Note that you can also pass the PyTorch loss functions by name as the loss function in Model in the same way.
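The naming rule for PyTorch loss functions can be sketched as follows. The function log_keys is a hypothetical helper written for illustration only; it is not part of Poutyne and does not cover the predefined metrics, whose keys are listed with each class.

```python
def log_keys(metric_name):
    """Return the (training, validation) log keys for a PyTorch loss name.

    Hypothetical helper illustrating the rule described above; it is not
    part of Poutyne.
    """
    suffix = "_loss"
    if metric_name.endswith(suffix):
        metric_name = metric_name[: -len(suffix)]
    return metric_name, "val_" + metric_name


print(log_keys("mse_loss"))  # ('mse', 'val_mse')
print(log_keys("mse"))       # ('mse', 'val_mse')
```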
Object-Oriented API
Below are classes for predefined batch metrics available in Poutyne.
- class poutyne.Accuracy(*, ignore_index: int = -100, reduction: str = 'mean')
This metric computes the accuracy using a similar interface to CrossEntropyLoss.
- Parameters
ignore_index (int) – Specifies a target value that is ignored and does not contribute to the accuracy. (Default value = -100)
reduction (string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed.
- Possible string names in the batch_metrics argument:
'acc'
'accuracy'
- Keys in the logs dictionary of callbacks:
Train: 'acc'
Validation: 'val_acc'
- Shape:
Input: \((N, C)\) where C = number of classes, or \((N, C, d_1, d_2, ..., d_K)\) with \(K \geq 1\) in the case of K-dimensional accuracy.
Target: \((N)\) where each value is \(0 \leq \text{targets}[i] \leq C-1\), or \((N, d_1, d_2, ..., d_K)\) with \(K \geq 1\) in the case of K-dimensional accuracy.
Output: The accuracy.
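The effect of ignore_index can be illustrated with a minimal pure-Python sketch for the \((N, C)\) case with the 'mean' reduction. This is not Poutyne's implementation: it works on lists instead of tensors, and returning a fraction between 0 and 1 is an assumption, since the scale of the output is not stated here.

```python
def accuracy(y_pred, y_true, ignore_index=-100):
    """Fraction of non-ignored targets whose highest-scoring class is correct.

    y_pred: list of N score lists, each of length C.
    y_true: list of N integer class labels.
    """
    correct = 0
    counted = 0
    for scores, target in zip(y_pred, y_true):
        if target == ignore_index:
            continue  # ignored targets do not contribute to the accuracy
        predicted = max(range(len(scores)), key=scores.__getitem__)
        correct += int(predicted == target)
        counted += 1
    return correct / counted if counted else 0.0


scores = [[2.0, 0.1, 0.3], [0.2, 1.5, 0.1], [0.9, 0.8, 0.1], [0.1, 0.2, 3.0]]
targets = [0, 1, 1, -100]
print(accuracy(scores, targets))  # two of the three counted samples are correct
```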
- class poutyne.BinaryAccuracy(*, threshold: float = 0.0, reduction: str = 'mean')
This metric computes the accuracy using a similar interface to BCEWithLogitsLoss.
- Parameters
threshold (float) – The threshold for class \(1\). The default value is 0., which corresponds to a probability of sigmoid(0.) = 0.5.
reduction (string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed.
- Possible string names in the batch_metrics argument:
'bin_acc'
'binary_acc'
'binary_accuracy'
- Keys in the logs dictionary of callbacks:
Train: 'bin_acc'
Validation: 'val_bin_acc'
- Shape:
Input: \((N, *)\) where \(*\) means, any number of additional dimensions
Target: \((N, *)\), same shape as the input
Output: The binary accuracy.
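Because the threshold is applied to raw scores (logits) rather than probabilities, a threshold of 0 corresponds to a probability of 0.5. The sketch below shows this on plain Python lists. It is an illustration under assumptions, not the library's code: in particular, the strict comparison logit > threshold and the fractional output scale are assumed.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def binary_accuracy(y_pred, y_true, threshold=0.0):
    """Fraction of logits whose thresholded prediction matches the 0/1 target."""
    matches = [
        int(logit > threshold) == int(target)
        for logit, target in zip(y_pred, y_true)
    ]
    return sum(matches) / len(matches)


logits = [2.3, -0.7, 0.4, -1.2]
targets = [1, 0, 0, 0]
print(sigmoid(0.0))                      # 0.5
print(binary_accuracy(logits, targets))  # 0.75
```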
- class poutyne.TopKAccuracy(k: int, *, ignore_index: int = -100, reduction: str = 'mean')
This metric computes the top-k accuracy using a similar interface to CrossEntropyLoss.
- Parameters
k (int) – Specifies the value of k in the top-k accuracy.
ignore_index (int) – Specifies a target value that is ignored and does not contribute to the top-k accuracy. (Default value = -100)
reduction (string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed.
- Possible string names in the batch_metrics argument:
'top{k}'
'top{k}_acc'
'top{k}_accuracy'
for {k} from 1 to 10, 20, 30, …, 100.
- Keys in the logs dictionary of callbacks:
Train: 'top{k}'
Validation: 'val_top{k}'
where {k} is replaced by the value of parameter k.
- Shape:
Input: \((N, C)\) where C = number of classes, or \((N, C, d_1, d_2, ..., d_K)\) with \(K \geq 1\) in the case of K-dimensional top-k accuracy.
Target: \((N)\) where each value is \(0 \leq \text{targets}[i] \leq C-1\), or \((N, d_1, d_2, ..., d_K)\) with \(K \geq 1\) in the case of K-dimensional top-k accuracy.
Output: The top-k accuracy.
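A sample counts as correct for top-k accuracy when its target is among the k highest-scoring classes. The following pure-Python sketch for the \((N, C)\) case shows the idea; it is not Poutyne's implementation, and the fractional output scale is an assumption.

```python
def topk_accuracy(y_pred, y_true, k, ignore_index=-100):
    """Fraction of non-ignored targets found among the k highest scores."""
    hits = 0
    counted = 0
    for scores, target in zip(y_pred, y_true):
        if target == ignore_index:
            continue
        # Indices of the k largest scores for this sample.
        top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
        hits += int(target in top)
        counted += 1
    return hits / counted if counted else 0.0


scores = [[0.1, 0.6, 0.3], [0.5, 0.2, 0.3], [0.2, 0.3, 0.5]]
targets = [2, 1, 2]
print(topk_accuracy(scores, targets, k=1))  # one sample out of three
print(topk_accuracy(scores, targets, k=2))  # two samples out of three
```

With k=1 this reduces to the ordinary accuracy.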
Functional
Below are the functional versions of the classes in the Object-Oriented API section.
- poutyne.acc(y_pred, y_true, *, ignore_index=-100, reduction='mean')
Computes the accuracy.
This is a functional version of Accuracy. See Accuracy for details.
- poutyne.bin_acc(y_pred, y_true, *, threshold=0.0, reduction='mean')
Computes the binary accuracy.
This is a functional version of BinaryAccuracy. See BinaryAccuracy for details.
- poutyne.topk(y_pred, y_true, k, *, ignore_index=-100, reduction='mean')
Computes the top-k accuracy.
This is a functional version of TopKAccuracy. See TopKAccuracy for details.
Epoch Metrics
Epoch metrics are metrics calculated only at the end of every epoch. They need to be implemented following the interface class, but we provide a few predefined metrics.
Interface
- class poutyne.EpochMetric
The abstract class representing an epoch metric which can be accumulated at each batch and calculated at the end of the epoch.
- abstract forward(y_pred, y_true) → None
To define the behavior of the metric when called.
- Parameters
y_pred – The prediction of the model.
y_true – Target to evaluate the model.
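The accumulate-then-compute pattern behind this interface can be sketched with a stand-alone class. The example below is hypothetical: it does not subclass poutyne.EpochMetric, it works on plain lists, and the method names get_metric() and reset() are assumptions, since only forward() is documented above. It uses the root mean squared error, which is not an average of per-batch values.

```python
import math


class RootMeanSquaredError:
    """Hypothetical epoch-style metric: accumulate per batch, compute at the end."""

    def __init__(self):
        self.reset()

    def forward(self, y_pred, y_true):
        # Accumulate sufficient statistics for the batch; nothing is returned.
        self.squared_error_sum += sum((p - t) ** 2 for p, t in zip(y_pred, y_true))
        self.count += len(y_true)

    def get_metric(self):
        # Assumed name: compute the metric from everything seen in the epoch.
        return math.sqrt(self.squared_error_sum / self.count) if self.count else 0.0

    def reset(self):
        # Assumed name: clear the accumulated state before the next epoch.
        self.squared_error_sum = 0.0
        self.count = 0


metric = RootMeanSquaredError()
metric.forward([1.0, 2.0], [1.0, 4.0])  # first batch
metric.forward([3.0], [0.0])            # second batch
print(metric.get_metric())              # sqrt(13 / 3)
```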
Predefined Epoch Metrics
- class poutyne.FBeta(*, metric: Optional[str] = None, average: Union[str, int] = 'micro', beta: float = 1.0, pos_label: int = 1, ignore_index: int = -100, names: Optional[Union[str, List[str]]] = None)
The source code of this class is under the Apache v2 License and was copied from the AllenNLP project and has been modified.
Compute precision, recall, F-measure and support for each class.
The precision is the ratio tp / (tp + fp) where tp is the number of true positives and fp the number of false positives. The precision is intuitively the ability of the classifier not to label as positive a sample that is negative.
The recall is the ratio tp / (tp + fn) where tp is the number of true positives and fn the number of false negatives. The recall is intuitively the ability of the classifier to find all the positive samples.
The F-beta score can be interpreted as a weighted harmonic mean of the precision and recall, where an F-beta score reaches its best value at 1 and worst score at 0.
If we have precision and recall, the F-beta score is simply:
F-beta = (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall)
The F-beta score weights recall more than precision by a factor of beta. beta == 1.0 means recall and precision are equally important.
The support is the number of occurrences of each class in y_true.
- Keys in the logs dictionary of callbacks:
Train: '{metric}_{average}'
Validation: 'val_{metric}_{average}'
where {metric} and {average} are replaced by the value of their respective parameters.
- Parameters
metric (Optional[str]) – One of {‘fscore’, ‘precision’, ‘recall’}. Whether to return the F-score, the precision or the recall. When not provided, all three metrics are returned. (Default value = None)
average (Union[str, int]) – One of {‘binary’, ‘micro’ (default), ‘macro’, label_number}. If the argument is of type integer, the score for this class (the label number) is calculated. Otherwise, this determines the type of averaging performed on the data:
'binary': Calculate metrics with regard to a single class identified by the pos_label argument. This is equivalent to average=pos_label except that the binary mode is enforced, i.e. an exception will be raised if there are more than two prediction scores.
'micro': Calculate metrics globally by counting the total true positives, false negatives and false positives.
'macro': Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
(Default value = ‘micro’)
beta (float) – The strength of recall versus precision in the F-score. (Default value = 1.0)
pos_label (int) – The class with respect to which the metric is computed when average == ‘binary’. Otherwise, this argument has no effect. (Default value = 1)
ignore_index (int) – Specifies a target value that is ignored. This also works in combination with a mask if provided. (Default value = -100)
names (Optional[Union[str, List[str]]]) – The names associated to the metrics. It is a string when a single metric is requested. It is a list of 3 strings if all metrics are requested. (Default value = None)
- forward(y_pred: torch.Tensor, y_true: Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]) → None
Update the confusion matrix for calculating the F-score.
- Parameters
y_pred (torch.Tensor) – A tensor of predictions of shape (batch_size, num_classes, …).
y_true (Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]) – Ground truths. A tensor of the integer class label of shape (batch_size, …). It must be the same shape as the y_pred tensor without the num_classes dimension. It can also be a tuple with two tensors of the same shape, the first being the ground truths and the second being a mask.
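The F-beta formula and the 'micro' and 'macro' averaging modes described for this class can be checked by hand on a small example. The sketch below is plain Python with made-up per-class counts; the function fbeta and the dictionary per_class are hypothetical names, and returning 0 when a denominator is zero is an assumption rather than documented behavior.

```python
def fbeta(tp, fp, fn, beta=1.0):
    """F-beta score from counts, using the precision/recall formula above."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Hypothetical per-class (tp, fp, fn) counts accumulated over an epoch.
per_class = {0: (8, 2, 0), 1: (1, 0, 2)}

# 'micro': pool the counts over all classes, then compute the score once.
tp, fp, fn = (sum(counts) for counts in zip(*per_class.values()))
micro_f1 = fbeta(tp, fp, fn)

# 'macro': compute the score for each class, then take the unweighted mean.
macro_f1 = sum(fbeta(*counts) for counts in per_class.values()) / len(per_class)

print(f"micro F1: {micro_f1:.4f}")
print(f"macro F1: {macro_f1:.4f}")
```

The macro score is lower here because the rare class 1, which has poor recall, weighs as much as the frequent class 0.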
- class poutyne.F1(average='micro')
Alias class for FBeta where metric == 'fscore' and beta == 1.
- Possible string name in the epoch_metrics argument:
'f1'
- Keys in the logs dictionary of callbacks:
Train: 'fscore_{average}'
Validation: 'val_fscore_{average}'
where {average} is replaced by the value of the respective parameter.
- class poutyne.Precision(average='micro')
Alias class for FBeta where metric == 'precision' and beta == 1.
- Possible string name in the epoch_metrics argument:
'precision'
- Keys in the logs dictionary of callbacks:
Train: 'precision_{average}'
Validation: 'val_precision_{average}'
where {average} is replaced by the value of the respective parameter.
- class poutyne.Recall(average='micro')
Alias class for FBeta where metric == 'recall' and beta == 1.
- Possible string name in the epoch_metrics argument:
'recall'
- Keys in the logs dictionary of callbacks:
Train: 'recall_{average}'
Validation: 'val_recall_{average}'
where {average} is replaced by the value of the respective parameter.
- class poutyne.SKLearnMetrics(funcs: Union[Callable, List[Callable]], kwargs: Optional[Union[dict, List[dict]]] = None, names: Optional[Union[str, List[str]]] = None)
Wrap metrics with a Scikit-learn-like interface (metric(y_true, y_pred, sample_weight=sample_weight, **kwargs)). The SKLearnMetrics object has to keep in memory the ground truths and predictions so that it can compute the metric at the end.
Example
from sklearn.metrics import roc_auc_score, average_precision_score
from poutyne import SKLearnMetrics
my_epoch_metric = SKLearnMetrics([roc_auc_score, average_precision_score])
- Parameters
funcs (Union[Callable, List[Callable]]) – A metric or a list of metrics with a scikit-learn-like interface.
kwargs (Optional[Union[dict, List[dict]]]) – Optional dictionary or list of dictionaries corresponding to keyword arguments to pass to each corresponding metric. (Default value = None)
names (Optional[Union[str, List[str]]]) – Optional string or list of strings corresponding to the names given to the metrics. By default, the names are the names of the functions.
- forward(y_pred: torch.Tensor, y_true: Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]) → None
Accumulate the predictions, ground truths and sample weights if any.
- Parameters
y_pred (torch.Tensor) – A tensor of predictions of the shape expected by the metric functions passed to the class.
y_true (Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]) – Ground truths. A tensor of ground truths of the shape expected by the metric functions passed to the class. It can also be a tuple with two tensors, the first being the ground truths and the second corresponding to the sample_weight argument passed to the metric functions in Scikit-Learn.
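The wrapping described above can be sketched without Poutyne or scikit-learn. The class AccumulatingMetric and the function weighted_accuracy below are hypothetical stand-ins written for illustration; the method name get_metric() is an assumption, and the sketch expects either every batch or no batch to carry sample weights.

```python
class AccumulatingMetric:
    """Hypothetical wrapper around metric(y_true, y_pred, sample_weight=...).

    Illustrates the accumulate-then-compute pattern only; it is not
    Poutyne's SKLearnMetrics implementation.
    """

    def __init__(self, func, **kwargs):
        self.func = func
        self.kwargs = kwargs
        self.y_true, self.y_pred, self.weights = [], [], []

    def forward(self, y_pred, y_true):
        # y_true may be a (targets, sample_weight) tuple.
        if isinstance(y_true, tuple):
            y_true, sample_weight = y_true
            self.weights.extend(sample_weight)
        self.y_pred.extend(y_pred)
        self.y_true.extend(y_true)

    def get_metric(self):
        # Ground truths come first, as in the scikit-learn convention.
        sample_weight = self.weights or None
        return self.func(self.y_true, self.y_pred,
                         sample_weight=sample_weight, **self.kwargs)


def weighted_accuracy(y_true, y_pred, sample_weight=None):
    """Stand-in metric with a scikit-learn-like signature."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    hits = sum(w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p)
    return hits / sum(sample_weight)


metric = AccumulatingMetric(weighted_accuracy)
metric.forward([1, 0], ([1, 1], [1.0, 3.0]))  # predictions, (targets, weights)
metric.forward([0], ([0], [1.0]))
print(metric.get_metric())  # 0.4
```

Keeping every prediction in memory is what allows metrics such as the ROC AUC, which need the whole epoch at once, to be computed.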