When Cats meet GANs

A Comprehensive Study on DCGANs and CycleGANs with Advanced Augmentation Techniques

INSERT IMAGE: DCGAN results.

Introduction

In this assignment, we get hands-on experience coding and training GANs. The assignment has two parts:

1. Implementing a Deep Convolutional GAN (DCGAN) to generate grumpy cats from samples of random noise.
2. Implementing a more complex GAN architecture, CycleGAN, for image-to-image translation. We train the CycleGAN to convert between two breeds of cats (Grumpy and Russian Blue) and between apples and oranges.

Part 1: Deep Convolutional GAN

For the first part of this assignment, we implement a slightly modified version of the Deep Convolutional GAN (DCGAN).

Implement Data Augmentation

Implemented the deluxe version of data augmentation in ‘data_loader.py’.

elif opts.data_preprocess == 'deluxe':
    # Resize to 1.1x the target size, then take a random crop and flip, so
    # the network sees slightly shifted and mirrored versions of each image.
    load_size = int(1.1 * opts.image_size)
    osize = [load_size, load_size]
    deluxe_transform = transforms.Compose([
        transforms.Resize(osize, Image.BICUBIC),
        transforms.RandomCrop(opts.image_size),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ])
    train_transform = deluxe_transform
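To double-check the pipeline in isolation, a small script like the following can be run (the image path is a placeholder; any RGB image works):

from PIL import Image
import torchvision.transforms as transforms

image_size = 64
load_size = int(1.1 * image_size)  # 70

deluxe_transform = transforms.Compose([
    transforms.Resize([load_size, load_size], Image.BICUBIC),
    transforms.RandomCrop(image_size),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

img = Image.open('grumpy_cat.png').convert('RGB')  # placeholder path
x = deluxe_transform(img)
print(x.shape)  # torch.Size([3, 64, 64]), values roughly in [-1, 1]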

Implement the Discriminator of the DCGAN

Padding calculation: with kernel size K = 4 and stride S = 2, the output width is (W - K + 2P)/S + 1. To halve the spatial size exactly at each layer (e.g., 64 -> 32), we need 32 = (64 - 4 + 2P)/2 + 1, which gives P = 1. The final layer maps 4x4 -> 1x1 with K = 4, S = 2, and P = 0, since (4 - 4 + 0)/2 + 1 = 1.

Implemented the architecture by filling in the ‘__init__’ and ‘forward’ methods of the ‘DCDiscriminator’ class in ‘models.py’.

def __init__(self, conv_dim=64, norm='instance'):
    super().__init__()
    self.conv1 = conv(3, 32, 4, 2, 1, norm, False, 'relu')     # 64x64 -> 32x32
    self.conv2 = conv(32, 64, 4, 2, 1, norm, False, 'relu')    # 32x32 -> 16x16
    self.conv3 = conv(64, 128, 4, 2, 1, norm, False, 'relu')   # 16x16 -> 8x8
    self.conv4 = conv(128, 256, 4, 2, 1, norm, False, 'relu')  # 8x8 -> 4x4
    self.conv5 = conv(256, 1, 4, 2, 0, None, False)            # 4x4 -> 1x1, no norm or activation

def forward(self, x):
    """Forward pass, x is (B, C, H, W)."""
    x = self.conv1(x)
    x = self.conv2(x)
    x = self.conv3(x)
    x = self.conv4(x)
    x = self.conv5(x)
    return x.squeeze()
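As a sanity check on the padding arithmetic above, the following standalone sketch traces the shapes through plain Conv2d layers with the same hyperparameters (the course's conv() helper additionally applies normalization and the activation, which do not change spatial dimensions):

import torch
import torch.nn as nn

# Shape trace of the discriminator path using bare Conv2d layers.
layers = nn.Sequential(
    nn.Conv2d(3, 32, 4, 2, 1),     # 64 -> 32
    nn.Conv2d(32, 64, 4, 2, 1),    # 32 -> 16
    nn.Conv2d(64, 128, 4, 2, 1),   # 16 -> 8
    nn.Conv2d(128, 256, 4, 2, 1),  # 8 -> 4
    nn.Conv2d(256, 1, 4, 2, 0),    # 4 -> 1
)
print(layers(torch.randn(16, 3, 64, 64)).shape)  # torch.Size([16, 1, 1, 1])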

Generator

Implemented the generator of the DCGAN by filling in the ‘__init__’ and ‘forward’ methods of the ‘DCGenerator’ class in ‘models.py’.

def __init__(self, noise_size, conv_dim=64):
    super().__init__()

    # The first layer is a plain conv that maps the 1x1 noise vector to a
    # 4x4 feature map; each subsequent up_conv doubles the spatial size.
    self.up_conv1 = conv(noise_size, 256, 2, 1, 2, 'instance', False, 'relu')  # 1x1 -> 4x4
    self.up_conv2 = up_conv(256, 128, 3, stride=1, padding=1, scale_factor=2, norm='instance', activ='relu')  # 4x4 -> 8x8
    self.up_conv3 = up_conv(128, 64, 3, stride=1, padding=1, scale_factor=2, norm='instance', activ='relu')   # 8x8 -> 16x16
    self.up_conv4 = up_conv(64, 32, 3, stride=1, padding=1, scale_factor=2, norm='instance', activ='relu')    # 16x16 -> 32x32
    self.up_conv5 = up_conv(32, 3, 3, stride=1, padding=1, scale_factor=2, norm=None, activ='tanh')           # 32x32 -> 64x64

def forward(self, z):
    """
    Generate an image given a sample of random noise.

    Input
    -----
        z: BS x noise_size x 1 x 1   -->  16x100x1x1

    Output
    ------
        out: BS x channels x image_width x image_height  -->  16x3x64x64
    """

    z = self.up_conv1(z)
    z = self.up_conv2(z)
    z = self.up_conv3(z)
    z = self.up_conv4(z)
    z = self.up_conv5(z)
    return z
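The same kind of shape trace works for the generator. The sketch below assumes up_conv is an nn.Upsample followed by a Conv2d (the course helper also adds normalization and the stated activation, which leave shapes unchanged):

import torch
import torch.nn as nn

# A stand-in for up_conv: nearest-neighbor upsample, then a 3x3 conv.
def up(cin, cout):
    return nn.Sequential(nn.Upsample(scale_factor=2),
                         nn.Conv2d(cin, cout, 3, stride=1, padding=1))

G = nn.Sequential(
    nn.Conv2d(100, 256, 2, 1, 2),  # 1x1 -> 4x4: (1 - 2 + 4)/1 + 1 = 4
    up(256, 128),                  # 4 -> 8
    up(128, 64),                   # 8 -> 16
    up(64, 32),                    # 16 -> 32
    up(32, 3),                     # 32 -> 64
)
print(G(torch.randn(16, 100, 1, 1)).shape)  # torch.Size([16, 3, 64, 64])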

Training Loop

Implemented the training loop for the DCGAN by filling in the indicated parts of the training_loop function in vanilla_gan.py.

            # TRAIN THE DISCRIMINATOR
            # 1. Compute the discriminator loss on real images
            if opts.use_diffaug:
                D_real_loss = torch.mean((D(DiffAugment(real_images, policy='color,translation,cutout', channels_first=False)) - 1) ** 2)
            else:
                D_real_loss = torch.mean((D(real_images) - 1) ** 2)


            # 2. Sample noise
            noise = sample_noise(opts.batch_size, opts.noise_size)

            # 3. Generate fake images from the noise
            fake_images = G(noise)

            # 4. Compute the discriminator loss on the fake images
            if opts.use_diffaug:
                D_fake_loss = torch.mean((D(DiffAugment(fake_images.detach(), policy='color,translation,cutout', channels_first=False))) ** 2)
            else:
                D_fake_loss = torch.mean((D(fake_images.detach())) ** 2)
            D_total_loss = (D_real_loss + D_fake_loss) / 2

            # update the discriminator D
            d_optimizer.zero_grad()
            D_total_loss.backward()
            d_optimizer.step()

            # TRAIN THE GENERATOR
            # 1. Sample noise
            noise = sample_noise(opts.batch_size, opts.noise_size)

            # 2. Generate fake images from the noise
            fake_images = G(noise)

            # 3. Compute the generator loss
            if opts.use_diffaug:
                G_loss = torch.mean((D(DiffAugment(fake_images, policy='color,translation,cutout', channels_first=False)) - 1) ** 2)
            else:
                G_loss = torch.mean((D(fake_images) - 1) ** 2)

            # update the generator G
            g_optimizer.zero_grad()
            G_loss.backward()
            g_optimizer.step()
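These squared-error terms are the least-squares GAN (LSGAN) objective: D is pushed toward outputting 1 on real images and 0 on fakes, while G tries to make D output 1 on its samples. The loop also depends on sample_noise; below is a minimal sketch of one plausible implementation (an assumption, since the starter code's version may differ, e.g., in using Gaussian rather than uniform noise):

import torch

def sample_noise(batch_size, dim):
    # Sketch (assumption): uniform noise in [-1, 1], shaped (B, dim, 1, 1)
    # so it can be fed directly into the generator's first conv layer.
    return torch.rand(batch_size, dim, 1, 1) * 2 - 1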

Differentiable Augmentation

The two augmentation schemes differ in where they act. The deluxe transforms augment only the real training images inside the data loader, before training. DiffAugment instead applies differentiable color, translation, and cutout transforms to both real and fake images at loss-computation time, so gradients flow through the augmentation back to the generator; the discriminator therefore sees augmented fakes as well, which helps keep it from memorizing the small training set. (Side-by-side samples with and without differentiable augmentation go here.)

Experiment with DCGANs

INSERT IMAGE: Screenshots of discriminator and generator training loss with --data_preprocess=basic and --data_preprocess=deluxe.

If the GAN manages to train, neither loss should collapse to zero: the discriminator loss drops early and then hovers around a moderate value as the generator's samples become harder to distinguish, while the generator loss fluctuates without diverging, indicating the two networks stay in rough equilibrium.

INSERT IMAGE: With --data_preprocess=deluxe and differentiable augmentation enabled, show one sample from early in training (e.g., iteration 200) and one from later in training, with the iteration number for each.

(Brief comment on the quality of the samples, and in what way they improve through training)

Part 2: CycleGAN

Implemented the CycleGAN architecture.

Data Augmentation

Set the --data_preprocess flag to deluxe.

Generator

Implemented the generator architecture by completing the __init__ method of the CycleGenerator class in models.py.

def __init__(self, conv_dim=64, init_zero_weights=False, norm='instance'):
    super().__init__()

    # 1. Define the encoder part of the generator (downsamples by 4x)
    self.conv1 = conv(3, 32, 4, 2, 1, norm, False, 'relu')
    self.conv2 = conv(32, 64, 4, 2, 1, norm, False, 'relu')

    # 2. Define the transformation part of the generator
    self.resnet_block = nn.Sequential(
        ResnetBlock(conv_dim=64, norm=norm, activ='relu'),
        ResnetBlock(conv_dim=64, norm=norm, activ='relu'),
        ResnetBlock(conv_dim=64, norm=norm, activ='relu'),
    )

    # 3. Define the decoder part of the generator (upsamples back by 4x)
    self.up_conv1 = up_conv(64, 32, 3, stride=1, padding=1, scale_factor=2, norm=norm, activ='relu')
    self.up_conv2 = up_conv(32, 3, 3, stride=1, padding=1, scale_factor=2, norm=None, activ='tanh')

def forward(self, x):
    """
    Generate an image conditioned on an input image.

    Input
    -----
        x: BS x 3 x 32 x 32

    Output
    ------
        out: BS x 3 x 32 x 32
    """
    x = self.conv1(x)
    x = self.conv2(x)
    x = self.resnet_block(x)
    x = self.up_conv1(x)
    x = self.up_conv2(x)

    return x
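Beyond the generator architecture, CycleGAN's training objective adds a cycle-consistency term that penalizes a translated image for not mapping back to its original. Below is a minimal sketch of that loss; the identifiers G_XtoY, G_YtoX, and lambda_cycle are assumptions, and the starter code may name or weight things differently:

import torch

def cycle_consistency_loss(real_X, real_Y, G_XtoY, G_YtoX, lambda_cycle=10.0):
    # L1 penalty on round-trip reconstructions in both directions:
    # X -> Y -> X should recover real_X, and Y -> X -> Y should recover real_Y.
    loss_X = torch.mean(torch.abs(G_YtoX(G_XtoY(real_X)) - real_X))
    loss_Y = torch.mean(torch.abs(G_XtoY(G_YtoX(real_Y)) - real_Y))
    return lambda_cycle * (loss_X + loss_Y)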

Conclusion

This report presents our implementation of DCGAN and CycleGAN for image generation and image-to-image translation. Through these experiments, we observed the impact of standard and differentiable augmentation on training stability and final sample quality. We also saw CycleGAN's ability to produce realistic domain-to-domain translations, such as converting between Grumpy and Russian Blue cats and between apples and oranges.

Linji (Joey) Wang
PhD Student in AI & Robotics