.. role:: hidden
    :class: hidden-section

Transfer learning example
*************************

.. note::

    - See the notebook `here `_
    - Run in `Google Colab `_

In this example, we fine-tune a ``ResNet-18`` network pretrained on ImageNet to classify the bird images of the CUB-200 dataset. But first, let's import all the needed packages.

.. code-block:: python

    import os
    import tarfile
    import urllib.request

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    from sklearn.model_selection import train_test_split
    import torch
    import torch.nn as nn
    import torch.optim as optim
    import torchvision.models as models
    from torch.utils import model_zoo
    from torch.utils.data import Subset, DataLoader
    from torchvision import transforms
    from torchvision.datasets import ImageFolder

    from poutyne import set_seeds, Model, ModelCheckpoint, CSVLogger, Experiment

Also, we need to set Python's, NumPy's and PyTorch's seeds with Poutyne's ``set_seeds`` function so that our training is (almost) reproducible.

.. code-block:: python

    set_seeds(42)

We download the dataset.

.. code-block:: python

    def download_and_extract_dataset(path):
        os.makedirs(path, exist_ok=True)
        tgz_filename = os.path.join(path, "images.tgz")

        print("Downloading dataset...")
        urllib.request.urlretrieve("https://graal.ift.ulaval.ca/public/CUB200.tgz", tgz_filename)

        print("Extracting archive...")
        archive = tarfile.open(tgz_filename)
        archive.extractall(path)

.. code-block:: python

    base_path = './datasets/CUB200'
    extract_dest = os.path.join(base_path, 'images')
    download_and_extract_dataset(base_path)

We create our dataset object.

.. code-block:: python

    norm_coefs = {}
    norm_coefs['cub200'] = [(0.47421962, 0.4914721, 0.42382449), (0.22846779, 0.22387765, 0.26495799)]
    norm_coefs['imagenet'] = [(0.485, 0.456, 0.406), (0.229, 0.224, 0.225)]
    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(*norm_coefs['cub200'])
    ])

    # is_valid_file removes hidden files from the dataset.
    dataset = ImageFolder(extract_dest, transform=transform,
                          is_valid_file=lambda path: not os.path.split(path)[1].startswith('.'))

We split the dataset into train, validation and test sets with a 60/20/20 ratio. We do a *stratified* split with scikit-learn in order to get examples of every class in every split (a quick sanity check of this property is sketched after the training constants below).

.. code-block:: python

    # We take 60% of the dataset for the training dataset.
    train_indices, valid_test_indices = train_test_split(np.arange(len(dataset)),
                                                         train_size=0.6,
                                                         stratify=dataset.targets,
                                                         random_state=42)

    # We take 20% for the validation dataset and 20% for the test dataset
    # (i.e. 50% of the remaining 40%).
    valid_indices, test_indices = train_test_split(valid_test_indices,
                                                   train_size=0.5,
                                                   stratify=np.asarray(dataset.targets)[valid_test_indices],
                                                   random_state=42)

    train_dataset = Subset(dataset, train_indices)
    valid_dataset = Subset(dataset, valid_indices)
    test_dataset = Subset(dataset, test_indices)

Now, let's set our training constants. We first select the CUDA device used for training if one is present. Second, we set the number of classes (i.e. one for each bird species). Finally, we set the batch size (i.e. the number of examples to see before updating the model), the learning rate for the optimizer, and the number of epochs (i.e. the number of times we see the full dataset).

.. code-block:: python

    cuda_device = 0
    device = torch.device("cuda:%d" % cuda_device if torch.cuda.is_available() else "cpu")

    num_classes = 200
    batch_size = 32
    learning_rate = 0.1
    n_epoch = 30
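As a quick sanity check of the stratified split above, a short sketch like the following could confirm that every one of the 200 classes appears in each subset. The ``check_split_covers_all_classes`` helper is ours, not part of the original example.

.. code-block:: python

    def check_split_covers_all_classes(indices, targets, num_classes):
        # Gather the distinct class labels present in this subset of indices.
        classes_in_split = set(np.asarray(targets)[indices].tolist())
        assert len(classes_in_split) == num_classes, "some classes are missing from this split"

    for split_indices in (train_indices, valid_indices, test_indices):
        check_split_covers_all_classes(split_indices, dataset.targets, num_classes)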
We create PyTorch ``DataLoader`` objects to split our data into batches.

.. code-block:: python

    train_loader = DataLoader(train_dataset, batch_size=batch_size, num_workers=8, shuffle=True)
    valid_loader = DataLoader(valid_dataset, batch_size=batch_size, num_workers=8)
    test_loader = DataLoader(test_dataset, batch_size=batch_size, num_workers=8)

We load a pretrained ``ResNet-18`` network and replace its head with a linear layer whose number of output neurons equals our number of classes.

.. code-block:: python

    resnet18 = models.resnet18(pretrained=True)
    resnet18.fc = nn.Linear(resnet18.fc.in_features, num_classes)

We freeze the network except for its head.

.. code-block:: python

    def freeze_weights(resnet18):
        for name, param in resnet18.named_parameters():
            if not name.startswith('fc.'):
                param.requires_grad = False

    freeze_weights(resnet18)

We define callbacks for saving the weights of the last epoch and of the best epoch, and for logging the results.

.. code-block:: python

    # We are saving everything into ./saves/cub200.
    save_base_dir = 'saves'
    save_path = os.path.join(save_base_dir, 'cub200')
    os.makedirs(save_path, exist_ok=True)

    callbacks = [
        # Save the latest weights to be able to resume the optimization for more epochs afterwards.
        ModelCheckpoint(os.path.join(save_path, 'last_epoch.ckpt')),

        # Save the weights in a new file when the current model is better than all previous models.
        ModelCheckpoint(os.path.join(save_path, 'best_epoch_{epoch}.ckpt'), monitor='val_acc', mode='max',
                        save_best_only=True, restore_best=True, verbose=True),

        # Save the losses and accuracies for each epoch in a TSV.
        CSVLogger(os.path.join(save_path, 'log.tsv'), separator='\t'),
    ]

Finally, we start the training and output its final test loss, accuracy, and micro F1-score.

.. note::

    The F1-score is quite similar to the accuracy since the dataset is very balanced.

.. code-block:: python

    optimizer = optim.SGD(resnet18.fc.parameters(), lr=learning_rate, weight_decay=0.001)
    loss_function = nn.CrossEntropyLoss()

    model = Model(resnet18, optimizer, loss_function,
                  batch_metrics=['accuracy'], epoch_metrics=['f1'],
                  device=device)

    model.fit_generator(train_loader, valid_loader, epochs=n_epoch, callbacks=callbacks)

    test_loss, (test_acc, test_f1) = model.evaluate_generator(test_loader)
    print('Test:\n\tLoss: {}\n\tAccuracy: {}\n\tF1-score: {}'.format(test_loss, test_acc, test_f1))

We can load the logs saved by the ``CSVLogger`` and find the best epoch according to the validation accuracy.

.. code-block:: python

    logs = pd.read_csv(os.path.join(save_path, 'log.tsv'), sep='\t')
    print(logs)

    best_epoch_idx = logs['val_acc'].idxmax()
    best_epoch = int(logs.loc[best_epoch_idx]['epoch'])
    print("Best epoch: %d" % best_epoch)

.. image:: /_static/img/transfer_learning/logs.png

.. image:: /_static/img/transfer_learning/best.png

We can also plot the training and validation losses and accuracies for each epoch.

.. code-block:: python

    metrics = ['loss', 'val_loss']
    plt.plot(logs['epoch'], logs[metrics])
    plt.legend(metrics)
    plt.show()

.. image:: /_static/img/transfer_learning/loss.png

.. code-block:: python

    metrics = ['acc', 'val_acc']
    plt.plot(logs['epoch'], logs[metrics])
    plt.legend(metrics)
    plt.show()

.. image:: /_static/img/transfer_learning/acc.png
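The epoch F1-score can be plotted the same way. This is a minimal sketch under the assumption that the F1 epoch metric appears in ``log.tsv`` under columns named ``fscore_micro`` and ``val_fscore_micro``; adapt the names to the header printed above if they differ.

.. code-block:: python

    # Assumed column names for the micro F1 epoch metric; check the log's header.
    metrics = ['fscore_micro', 'val_fscore_micro']
    plt.plot(logs['epoch'], logs[metrics])
    plt.legend(metrics)
    plt.show()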
Since we have created checkpoints using callbacks, we can restore the best model from those checkpoints and test it.

.. code-block:: python

    resnet18 = models.resnet18(pretrained=False, num_classes=num_classes)

    model = Model(resnet18, 'sgd', 'cross_entropy',
                  batch_metrics=['accuracy'], epoch_metrics=['f1'],
                  device=device)

    model.load_weights(os.path.join(save_path, 'best_epoch_{epoch}.ckpt').format(epoch=best_epoch))

    test_loss, (test_acc, test_f1) = model.evaluate_generator(test_loader)
    print('Test:\n\tLoss: {}\n\tAccuracy: {}\n\tF1-score: {}'.format(test_loss, test_acc, test_f1))

We can also use the :class:`~poutyne.Experiment` class to train our network. This class saves checkpoints and logs as above in a directory and allows stopping and resuming the optimization at will. See the :class:`~poutyne.Experiment` documentation for details.

.. code-block:: python

    def experiment_train(epochs):
        # Reload the pretrained network and freeze it except for its head.
        resnet18 = models.resnet18(pretrained=True)
        resnet18.fc = nn.Linear(resnet18.fc.in_features, num_classes)
        freeze_weights(resnet18)

        # Saves everything into ./saves/cub200_resnet18_experiment.
        save_path = os.path.join(save_base_dir, 'cub200_resnet18_experiment')

        optimizer = optim.SGD(resnet18.fc.parameters(), lr=learning_rate, weight_decay=0.001)

        # Poutyne Experiment
        exp = Experiment(save_path, resnet18, device=device, optimizer=optimizer, task='classif')

        # Train
        exp.train(train_loader, valid_loader, epochs=epochs)

        # Test
        exp.test(test_loader)

Let's train for 5 epochs.

.. code-block:: python

    experiment_train(epochs=5)

Let's train for 5 more epochs (10 epochs total). Since the experiment resumes from its last checkpoint, asking for 10 epochs continues the previous training up to epoch 10.

.. code-block:: python

    experiment_train(epochs=10)
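Finally, the fine-tuned network can be used for inference on individual images, for instance with the network into which we restored the best weights above. Here is a minimal sketch reusing the ``transform`` and ``dataset`` objects defined earlier; the ``predict_image`` helper and the example image path are ours and purely illustrative.

.. code-block:: python

    from PIL import Image

    def predict_image(network, image_path):
        # Apply the same resizing and normalization as during training.
        image = transform(Image.open(image_path).convert('RGB'))

        network.eval()
        with torch.no_grad():
            logits = network(image.unsqueeze(0).to(device))

        # Map the most probable logit back to the class (folder) name.
        return dataset.classes[logits.argmax(dim=1).item()]

    # Hypothetical path; any image from the extracted dataset would work.
    print(predict_image(resnet18, os.path.join(extract_dest, 'some_class/some_image.jpg')))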