Lately, I have been working with X-ray images that carry both positional and clinical labels. They look like this:

                x    y    diagnosis
/data/img1.jpg  2.1  5.2  0
/data/img2.jpg  2.6  4.0  1

It’s trivial to load these data using pandas (torch.FloatTensor(pd.read_csv(fp, index_col=0).iloc[0, :].values)). However – and this is typical for biomedical datasets – my classes are very imbalanced, which means that I need to augment the heck out of them. PyTorch’s built-in transformations are no good for this, because they operate only on the image; the landmark coordinates are left untouched. We’ll have to implement the augmentation ourselves, directly, using linear transformations.
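For concreteness, here is a minimal loading sketch (the labels path is hypothetical):

import pandas as pd
import torch

# Hypothetical path to the labels file shown above;
# index_col=0 makes the image path the row index.
labels = pd.read_csv("/data/labels.csv", index_col=0)

# One row, (x, y, diagnosis), as a float tensor.
row = torch.FloatTensor(labels.iloc[0, :].values)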

By the way, I almost used FastAI for this, because it has all of these transforms implemented through the ImagePoints module. But I decided to use PyTorch because my use case was a little unusual and I wanted the flexibility. If your use case is more standard, I recommend FastAI, since there will be less code to write (and a smaller bug surface).
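For reference, here is a rough sketch of what that looks like with the fastai v1 data block API. I am writing this from memory, so check the fastai docs before relying on it; get_landmarks is a hypothetical function mapping an image path to its landmark tensor.

from fastai.vision import PointsItemList, get_transforms

# Sketch only: fastai v1 data block API, recalled from memory.
# get_landmarks is a hypothetical labeling function.
data = (PointsItemList.from_folder("/data")
        .split_by_rand_pct()
        .label_from_func(get_landmarks)
        .transform(get_transforms(max_rotate=360), tfm_y=True)
        .databunch())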

I’ll start with an obvious one: rotating landmarks along with the image. Normally, I would do something like this:

import torch
import pandas as pd
from pathlib import Path
from torchvision import transforms
from PIL import Image

class ImageWithLandmarks(torch.utils.data.Dataset):
    def __init__(self, img_dir, labels_fp, img_type="jpg"):
        self.imgs = list(Path(img_dir).glob(f"*.{img_type}"))
        self.landmarks = pd.read_csv(labels_fp, index_col=0)
        self.training = True  # toggled off at evaluation time
        self.preprocess = transforms.Compose([
            transforms.RandomRotation(360),  # <-
            transforms.ToTensor(),
        ])

    def __len__(self):
        return len(self.imgs)

    def __getitem__(self, i):
        img_fp = self.imgs[i]
        img = Image.open(img_fp)
        label = self.landmarks.loc[str(img_fp), ["x", "y"]].values
        # >>> self.landmarks.loc["/data/img1.jpg", ["x", "y"]].values
        # array([2.1, 5.2])
        return self.preprocess(img), torch.FloatTensor(label)

However, this won’t work: the image rotates, but our landmarks stay exactly where they were.

The solution is to construct a rotation matrix for a specific angle, $\theta$, and apply it to the landmarks ourselves. Since this is just linear algebra, we can easily vectorize it in numpy.
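Explicitly, for a counterclockwise rotation by $\theta$ about a pivot $p$ (matching the function below):

$$R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}, \qquad x' = R(\theta)\,(x - p) + p$$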

def rotate(x, pivot, theta):
    """Rotate a vector x by theta radians (counterclockwise) around a pivot vector.

    Example (up to floating-point error):
    >>> rotate(np.array([0, 1]), pivot=np.zeros(2), theta=np.pi / 2)
    array([-1., 0.])

    Arguments:
        x {np.array} : vector to be rotated
        pivot {np.array} : pivot of the rotation
        theta {float} : angle of rotation, in radians

    Returns: the rotated vector {np.array}
    """
    rotation_mat = np.array([
        [np.cos(theta), -np.sin(theta)],
        [np.sin(theta),  np.cos(theta)]
    ])
    # note: @ is the matrix multiplication operator in modern numpy
    return (rotation_mat @ (x - pivot)) + pivot

You will notice there is a pivot argument. Because PIL (and, by extension, PyTorch) rotates about the center of the image, we need to translate our coordinate system to that center before applying the rotation, then translate it back afterwards.
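As a quick sanity check (the numbers are mine): rotating the point $(2, 1)$ by half a turn about the pivot $(1, 1)$ should land on $(0, 1)$:

import numpy as np

# rotate() as defined above
print(rotate(np.array([2.0, 1.0]), pivot=np.array([1.0, 1.0]), theta=np.pi))
# -> [0., 1.] (up to floating-point error)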

In all, the changes look like this:

# at the top of the file...
import numpy as np
import random
# ... in the constructor ...
self.preprocess = transforms.ToTensor()
# ... in the __getitem__ method
if self.training:
    r = random.random()
    rot_degs = r * 360.0      # PIL wants degrees
    rot_rads = r * 2 * np.pi  # our rotation matrix wants radians
    img = img.rotate(rot_degs)
    pivot = np.array(img.size) / 2  # pivot around the image center
    label = rotate(label, pivot, rot_rads)

Now the landmarks rotate along with the image. Much better.

Final notes

We don’t have to compute the rotation matrix on every call. In fact, recomputing it each time is a little more error-prone, not to mention computationally expensive (although premature optimization is the root of all evil). We can precompute a set of rotation matrices like so:

# precompute 100 rotation matrices as (radians, degrees, matrix) triples
rotations = [
    (theta, theta * 180 / np.pi, np.array([
        [np.cos(theta), -np.sin(theta)],
        [np.sin(theta),  np.cos(theta)]
    ]))
    # endpoint=False avoids sampling both 0 and 2*pi (the same rotation)
    for theta in np.linspace(0, 2 * np.pi, 100, endpoint=False)
]

In all, the Dataset’s __getitem__ would be implemented like this:

def __getitem__(self, i):
    img_fp = self.imgs[i]
    img = Image.open(img_fp)
    label = self.landmarks.loc[str(img_fp), ["x", "y"]].values
    if self.training:
        rotation_rads, rotation_degs, rotation_mat = random.choice(rotations)
        img = img.rotate(rotation_degs)
        pivot = np.array(img.size) / 2
        label = (rotation_mat @ (label - pivot)) + pivot
    return self.preprocess(img), torch.FloatTensor(label)
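Using it might look like this (the paths are hypothetical, matching the table above):

ds = ImageWithLandmarks("/data", "/data/labels.csv")
ds.training = True   # enable the random rotations
img, label = ds[0]   # img is a tensor; label is the rotated (x, y)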

If you do it this way, you can easily compose rotations with other affine transformations. Here are a few (sketched as helper functions after the list):

Scaling: scale(a,b) $\to \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}$

Translation: translate(c,d) $\to \begin{bmatrix} c & d \end{bmatrix}^T$

Shear: shear(e) $\to \begin{bmatrix} 1 & e \\ 0 & 1 \end{bmatrix}$
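As a sketch, these can be wrapped in small helper functions (the names are mine) that compose with the rotation matrices above via @:

import numpy as np

# Hypothetical helpers for composing with the rotation matrices.
def scale(a, b):
    return np.array([[a, 0.0],
                     [0.0, b]])

def shear(e):
    return np.array([[1.0, e],
                     [0.0, 1.0]])

def translate(c, d):
    # a vector: it is added to the result, not matrix-multiplied
    return np.array([c, d])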

For images, PyTorch provides all of these through its affine transform (torchvision.transforms.functional.affine).

Thus, img = affine(img, angle=theta_degs, translate=(c, d), scale=s, shear=e_degs) roughly corresponds to the following code for a positional label. Two caveats: torchvision takes both the angle and the shear in degrees, and its scale argument is a single isotropic factor, so the scale matrix below is s times the identity.

theta = np.deg2rad(theta_degs)        # torchvision's angle is in degrees
rot_mat = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
scale_mat = np.array([[s, 0],         # torchvision's scale is isotropic
                      [0, s]])
shear_mat = np.array([[1, np.tan(np.deg2rad(e_degs))],  # shear angle -> shear factor
                      [0, 1]])
translation_vec = np.array([c, d])    # added, not matrix-multiplied
pivot = np.array(img.size) / 2        # affine() also pivots on the image center
label = ((rot_mat
          @ scale_mat
          @ shear_mat
          @ (label - pivot))
         + pivot
         + translation_vec)

Source Code

The code for generating the visualizations above is available here.
