.. role:: hidden
    :class: hidden-section

Gender Classification and Eyes Location Detection: A Two-Task Problem
**********************************************************************

.. note::

    - See the notebook `here `_
    - Run in `Google Colab `_

In this example, we are going to implement a multi-task problem. We try to identify the gender of the people, as well as locate their eyes in the image. Hence, we have two different tasks: classification (to identify the gender) and regression (to find the location of the eyes). We are going to use a single network (a CNN) to perform both tasks; however, we will need to apply a different loss function for each task.

For this example, we will need to install a few libraries (such as OpenCV and wget). If you don't have them, they can be installed as below:

.. code-block:: python

    %pip install poutyne          # to install the Poutyne library
    %pip install wget             # to install the wget library in order to download data
    %pip install opencv-python    # to install the cv2 (OpenCV) library

Let's import all the needed packages.

.. code-block:: python

    import math
    import os

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    import wget
    import zipfile
    import cv2

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    import torch.optim as optim
    import torchvision.datasets as datasets
    import torchvision.models as models
    import torchvision.transforms as tfms
    from poutyne import set_seeds, Model, ModelCheckpoint, CSVLogger, Experiment, StepLR
    from torch.utils.data import DataLoader, Subset, Dataset
    from torchvision.utils import make_grid

Training Constants
==================

.. code-block:: python

    num_epochs = 5
    learning_rate = 0.01
    batch_size = 16
    image_size = 224
    w, h = 218, 178  # the height (218) and width (178) of the original aligned images before resizing

    set_seeds(48)

    gender_index = 20  # in the CelebA dataset, gender is the 21st item of the attributes vector
    W = 0.4  # the weight of the regression loss

    imagenet_mean = [0.485, 0.456, 0.406]  # mean of the ImageNet dataset for normalizing
    imagenet_std = [0.229, 0.224, 0.225]  # std of the ImageNet dataset for normalizing

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print("The running processor is...", device)

CelebA Dataset
==============

We are going to use the CelebA dataset for this experiment. The CelebA dataset is a large-scale face attributes dataset which can be employed as the training and test sets for the following computer vision tasks: face attribute recognition, face detection, landmark (or facial part) localization, and face editing & synthesis.

Fetching data
=============

The section below consists of a few lines of code that download the CelebA dataset from a public web source and unzip it. Downloading the CelebA dataset can also be done directly with `torchvision.datasets.CelebA(data_root, download=True)`. However, due to the high traffic on the dataset's Google Drive (the main source of the dataset), the download usually fails. Hence, we decided to download the files from another public source but still use them through `torchvision.datasets.CelebA()`.

.. code-block:: python

    data_root = "datasets"
    base_url = "https://graal.ift.ulaval.ca/public/celeba/"
    file_list = [
        "img_align_celeba.zip",
        "list_attr_celeba.txt",
        "identity_CelebA.txt",
        "list_bbox_celeba.txt",
        "list_landmarks_align_celeba.txt",
        "list_eval_partition.txt",
    ]

    # Path to the folder containing the dataset
    dataset_folder = f"{data_root}/celeba"
    os.makedirs(dataset_folder, exist_ok=True)

    for file in file_list:
        url = f"{base_url}/{file}"
        if not os.path.exists(f"{dataset_folder}/{file}"):
            wget.download(url, f"{dataset_folder}/{file}")

    with zipfile.ZipFile(f"{dataset_folder}/img_align_celeba.zip", "r") as ziphandler:
        ziphandler.extractall(dataset_folder)
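Note that the code above re-extracts the archive on every run, even when the images are already on disk. A minimal guard, assuming the archive extracts into an `img_align_celeba` subfolder (as the official zip does), could skip that step:

.. code-block:: python

    # A minimal sketch: skip the slow extraction when the image folder already exists.
    # Assumes the zip extracts into an "img_align_celeba" subfolder.
    extracted_folder = f"{dataset_folder}/img_align_celeba"
    if not os.path.isdir(extracted_folder):
        with zipfile.ZipFile(f"{dataset_folder}/img_align_celeba.zip", "r") as ziphandler:
            ziphandler.extractall(dataset_folder)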
Now that the dataset is downloaded, we can define our datasets and dataloaders in the usual way.

.. code-block:: python

    transforms = tfms.Compose(
        [
            tfms.Resize((image_size, image_size)),
            tfms.ToTensor(),
            tfms.Normalize(imagenet_mean, imagenet_std),
        ]
    )

    train_dataset = datasets.CelebA(data_root, split="train", target_type=["attr", "landmarks"], transform=transforms)
    valid_dataset = datasets.CelebA(data_root, split="valid", target_type=["attr", "landmarks"], transform=transforms)
    test_dataset = datasets.CelebA(data_root, split="test", target_type=["attr", "landmarks"], transform=transforms)

Here is what a dataset sample looks like:

.. code-block:: python

    print(train_dataset[0])

.. image:: /_static/img/classification_and_regression/out.png

Considering the complexity of the problem, the dataset offers far more training and validation images than we need. Since there is little variation between images (e.g., the eye coordinates do not vary considerably from one image to another), using all the images in the dataset is not necessary and would only increase the training time. Hence, we can separate out and use a portion of the data as below:

.. code-block:: python

    train_subset = Subset(train_dataset, np.arange(1, 50000))
    valid_subset = Subset(valid_dataset, np.arange(1, 2000))
    test_subset = Subset(test_dataset, np.arange(1, 1000))

    train_dataloader = DataLoader(train_subset, batch_size=batch_size, shuffle=True)
    valid_dataloader = DataLoader(valid_subset, batch_size=batch_size, shuffle=False)
    test_dataloader = DataLoader(test_subset, batch_size=batch_size, shuffle=False)
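Before going further, it can be helpful to confirm what a batch contains. Since we requested both `attr` and `landmarks` targets, each batch yields an image tensor along with a pair of target tensors. The check below is a quick sketch using the dataloaders defined above; the shapes in the comments are what we expect with a batch size of 16:

.. code-block:: python

    # A quick sanity check of one batch, assuming the dataloaders defined above.
    images, (attrs, landmarks) = next(iter(train_dataloader))
    print(images.shape)     # expected: torch.Size([16, 3, 224, 224])
    print(attrs.shape)      # expected: torch.Size([16, 40]) -- 40 binary attributes
    print(landmarks.shape)  # expected: torch.Size([16, 10]) -- five (x, y) landmark pairs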
Here, we can see an example from the training dataset. It shows the image of a person, prints the gender, and marks the location of the eyes. It is worth mentioning that, since we resize the image, the coordinates of the eyes must also be scaled by the same ratio.

.. code-block:: python

    sample_number = 3

    image = train_dataset[sample_number][0]
    image = image.permute(1, 2, 0).detach().numpy()
    image_rgb = cv2.cvtColor(np.float32(image), cv2.COLOR_BGR2RGB)
    image_rgb = image_rgb * imagenet_std + imagenet_mean  # undo the ImageNet normalization

    gender = "male" if int(train_dataset[sample_number][1][0][gender_index]) == 1 else "female"
    print("Gender is:", gender)

    w, h = 218, 178

    # The landmarks vector starts with x_L, y_L, x_R, y_R (the left- and right-eye coordinates)
    (x_L, y_L) = train_dataset[sample_number][1][1][0:2]
    (x_R, y_R) = train_dataset[sample_number][1][1][2:4]

    w_scale = image_size / w
    h_scale = image_size / h
    x_L, x_R = (x_L * h_scale), (x_R * h_scale)  # rescaling the coordinates to the resized (224, 224) image
    y_L, y_R = (y_L * w_scale), (y_R * w_scale)
    x_L, x_R = int(x_L), int(x_R)
    y_L, y_R = int(y_L), int(y_R)

    image_rgb = cv2.drawMarker(image_rgb, (x_L, y_L), (0, 255, 0))
    image_rgb = cv2.drawMarker(image_rgb, (x_R, y_R), (0, 255, 0))
    image_rgb = cv2.cvtColor(np.float32(image_rgb), cv2.COLOR_BGR2RGB)
    image_rgb = np.clip(image_rgb, 0, 1)

    plt.imshow(image_rgb)
    plt.axis("off")
    plt.show()

.. image:: /_static/img/classification_and_regression/dataset_sample.png

Network
=======

Below, we define a new class, named `ClassifierLocalizer`, which accepts a pre-trained CNN and replaces its last fully connected layer with one suited to the two-task problem. The new fully connected layer contains 6 neurons: 2 for the classification task (male or female) and 4 for the localization task (x and y for the left and right eyes). Moreover, to constrain the predicted coordinates to the range [0, 1], matching the normalized ground-truth coordinates, we apply the sigmoid function to the neurons assigned to the localization task.

.. code-block:: python

    class ClassifierLocalizer(nn.Module):
        def __init__(self, model_name, num_classes=2):
            super(ClassifierLocalizer, self).__init__()
            self.num_classes = num_classes

            # create the CNN model
            model = getattr(models, model_name)(pretrained=True)

            # replace the last fc layer with a new one: classifier (2 neurons) + localizer (4 neurons)
            num_features = model.fc.in_features
            model.fc = nn.Linear(num_features, 6)
            self.model = model

        def forward(self, x):
            x = self.model(x)                     # extract features from the CNN
            scores = x[:, : self.num_classes]     # class scores
            coords = x[:, self.num_classes :]     # eye coordinates
            return scores, torch.sigmoid(coords)  # sigmoid outputs are in the range [0, 1]

.. code-block:: python

    network = ClassifierLocalizer(model_name='resnet18')
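As a quick sanity check of the architecture, we can pass a random tensor (not real data) through the network. The sketch below shows that it returns class scores of shape `(N, 2)` and eye coordinates of shape `(N, 4)` squashed into [0, 1]:

.. code-block:: python

    # Sanity check with a random batch: the network returns class scores of
    # shape (N, 2) and coordinates of shape (N, 4) in the range [0, 1].
    dummy_input = torch.randn(2, 3, image_size, image_size)
    scores, coords = network(dummy_input)
    print(scores.shape)  # torch.Size([2, 2])
    print(coords.shape)  # torch.Size([2, 4])
    print(float(coords.min()) >= 0 and float(coords.max()) <= 1)  # True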
Loss function
=============

As we discussed before, we have two different tasks in this example, and these tasks need different loss functions: cross-entropy loss for the classification and mean squared error (MSE) loss for the regression. Below, we define a new loss function class that sums both losses so that they are considered simultaneously. However, as the regression is a relatively simpler task here (due to the similarity of the eye coordinates across images), we apply a lower weight `W` to the MSE losses. Note that the ground-truth coordinates are divided by the original image dimensions so that they lie in [0, 1], the same range as the sigmoid outputs of the network.

.. code-block:: python

    class ClassificationRegressionLoss(nn.Module):
        def __init__(self, W):
            super(ClassificationRegressionLoss, self).__init__()
            self.ce_loss = nn.CrossEntropyLoss()
            self.mse_loss = nn.MSELoss()
            self.W = W

        def forward(self, y_pred, y_true):
            loss_cls = self.ce_loss(y_pred[0], y_true[0][:, gender_index])  # cross-entropy loss (for classification)

            loss_reg1 = self.mse_loss(y_pred[1][:, 0], y_true[1][:, 0] / h)  # mean squared error for x_L
            loss_reg2 = self.mse_loss(y_pred[1][:, 1], y_true[1][:, 1] / w)  # mean squared error for y_L
            loss_reg3 = self.mse_loss(y_pred[1][:, 2], y_true[1][:, 2] / h)  # mean squared error for x_R
            loss_reg4 = self.mse_loss(y_pred[1][:, 3], y_true[1][:, 3] / w)  # mean squared error for y_R

            total_loss = loss_cls + self.W * (loss_reg1 + loss_reg2 + loss_reg3 + loss_reg4)
            return total_loss

Training
========

.. code-block:: python

    optimizer = optim.SGD(network.parameters(), lr=learning_rate, weight_decay=0.001)
    loss_function = ClassificationRegressionLoss(W)

    exp = Experiment(
        "./saves/two_task_example",
        network,
        optimizer=optimizer,
        loss_function=loss_function,
        device=device,
    )

    exp.train(train_dataloader, valid_dataloader, epochs=num_epochs)

Evaluation
==========

As you may have noticed from the training logs, the best performance (considering the validation loss) is not necessarily achieved at the last epoch. The weights of the network for the best epoch have been automatically saved by the `Experiment` class, and we use these parameters to evaluate our algorithm. For this purpose, we call the `load_checkpoint` method with the argument `best` to load the best weights of the model automatically. Finally, we take advantage of Poutyne's `evaluate_generator` method and apply it to the test dataset. It returns the loss, as well as the predictions and the ground truth for comparison, in case of need.

.. code-block:: python

    exp.load_checkpoint("best")
    model = exp.model
    loss, predictions, ground_truth = model.evaluate_generator(test_dataloader, return_pred=True, return_ground_truth=True)

The ``callbacks`` feature of Poutyne, also used by the `Experiment` class, records the training logs. We can use this information to monitor and analyze the training process.

.. code-block:: python

    logs = pd.read_csv("./saves/two_task_example/log.tsv", sep="\t")
    print(logs)

.. image:: /_static/img/classification_and_regression/logs.png

.. code-block:: python

    train_loss = logs.loss
    valid_loss = logs.val_loss

    plt.plot(train_loss)
    plt.plot(valid_loss)
    plt.legend(["train_loss", "valid_loss"])
    plt.title("training and validation losses")
    plt.show()

.. image:: /_static/img/classification_and_regression/loss_diagram.png

We can also evaluate the performance of the trained network (the network with the best weights) on any dataset, as below:

.. code-block:: python

    exp.test(test_dataloader)
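Since `evaluate_generator` already gave us the predictions and the ground truth, we can also compute task-specific metrics ourselves. The sketch below, assuming the `predictions` and `ground_truth` arrays returned above, estimates the gender accuracy and the mean eye-localization error in pixels of the resized image:

.. code-block:: python

    # A sketch of task-specific metrics, assuming the predictions and
    # ground_truth arrays returned by evaluate_generator above.

    # Classification: compare the argmax of the class scores to the gender attribute.
    pred_genders = np.argmax(predictions[0], axis=1)
    true_genders = ground_truth[0][:, gender_index]
    accuracy = np.mean(pred_genders == true_genders)
    print(f"Gender accuracy: {accuracy * 100:.2f}%")

    # Regression: bring both predictions and targets to the resized (224 x 224)
    # pixel scale before measuring the absolute error.
    pred_coords = predictions[1] * image_size                                # sigmoid outputs in [0, 1]
    true_coords = ground_truth[1][:, 0:4] / np.array([h, w, h, w]) * image_size
    mean_error = np.mean(np.abs(pred_coords - true_coords))
    print(f"Mean eye-coordinate error: {mean_error:.2f} pixels")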
Now let's evaluate the performance of the network visually.

.. code-block:: python

    sample_number = 123

    image = test_subset[sample_number][0]
    image = image.permute(1, 2, 0).detach().numpy()
    image_rgb = cv2.cvtColor(np.float32(image), cv2.COLOR_BGR2RGB)
    image_rgb = image_rgb * imagenet_std + imagenet_mean  # undo the ImageNet normalization

    gender = "male" if np.argmax(predictions[0][sample_number]) == 1 else "female"
    print("Gender is:", gender)

    # The predicted coordinates are in [0, 1]; scale them back to the resized image.
    (x_L, y_L) = predictions[1][sample_number][0:2] * image_size
    (x_R, y_R) = predictions[1][sample_number][2:4] * image_size
    x_L, x_R = int(x_L), int(x_R)
    y_L, y_R = int(y_L), int(y_R)

    image_rgb = cv2.drawMarker(image_rgb, (x_L, y_L), (0, 255, 0))
    image_rgb = cv2.drawMarker(image_rgb, (x_R, y_R), (0, 255, 0))
    image_rgb = cv2.cvtColor(np.float32(image_rgb), cv2.COLOR_BGR2RGB)
    image_rgb = np.clip(image_rgb, 0, 1)

    plt.imshow(image_rgb)
    plt.axis("off")
    plt.show()

.. image:: /_static/img/classification_and_regression/output_sample.png
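To use the trained network outside of a Poutyne dataloader, a single image only needs the same preprocessing as the training data. Below is a minimal inference sketch; the path `face.jpg` is a hypothetical example, and `transforms`, `network`, and `device` are the objects defined earlier in this example:

.. code-block:: python

    from PIL import Image

    # A minimal single-image inference sketch. "face.jpg" is a hypothetical path;
    # `transforms`, `network`, and `device` are the objects defined above.
    network.eval()
    input_image = Image.open("face.jpg").convert("RGB")
    input_tensor = transforms(input_image).unsqueeze(0).to(device)  # add the batch dimension

    with torch.no_grad():
        scores, coords = network(input_tensor)

    gender = "male" if scores.argmax(dim=1).item() == 1 else "female"
    eye_pixels = (coords[0] * image_size).tolist()  # [x_L, y_L, x_R, y_R] in the 224 x 224 image
    print(gender, eye_pixels)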