When Cats meet GANs

A Comprehensive Study on DCGANs and CycleGANs with Advanced Augmentation Techniques


Introduction

In this assignment, we get hands-on experience coding and training GANs. This assignment includes two parts:

1. Implementing a Deep Convolutional GAN (DCGAN) to generate grumpy cats from samples of random noise.
2. Implementing a more complex architecture, CycleGAN, for image-to-image translation. We train the CycleGAN to convert between two kinds of cats (Grumpy and Russian Blue) and between apples and oranges.

Part 1: Deep Convolutional GAN

For the first part of this assignment, we implement a slightly modified version of the Deep Convolutional GAN (DCGAN).

Experiment with DCGANs

We experimented with different data preprocessing techniques and found that the choice of preprocessing has a significant impact on the performance of the GAN. To demonstrate this, we include screenshots of the discriminator and generator training losses for three preprocessing options: basic, deluxe, and deluxe with differentiable augmentation (diff_aug).
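For reference, here is a minimal sketch of what the basic and deluxe pipelines might look like. The exact transforms, oversizing factor, and image size are assumptions for illustration, not the assignment's reference implementation:

```python
from torchvision import transforms

# Basic (assumed): just resize and normalize to [-1, 1].
basic_transform = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

# Deluxe (assumed): resize slightly larger, then random-crop and flip
# so the network sees a different view of each image every epoch.
deluxe_transform = transforms.Compose([
    transforms.Resize(int(1.1 * 64)),       # assumed 10% oversizing
    transforms.RandomCrop(64),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])
```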

grumpifyBprocessed_basic

Figures: generated samples plus the D_fake_loss, D_real_loss, D_total_loss, and G_loss curves for data_preprocess=basic at iteration 6400.

grumpifyBprocessed_deluxe

Figures: generated samples plus the D_fake_loss, D_real_loss, D_total_loss, and G_loss curves for data_preprocess=deluxe at iteration 6400.

grumpifyBprocessed_deluxe_diffaug

Figures: generated samples plus the D_fake_loss, D_real_loss, D_total_loss, and G_loss curves for data_preprocess=deluxe with diff_aug enabled at iteration 6400.

Results analysis

| Data Preprocessing | Discriminator Loss | Generator Loss | Convergence Rate | Stability |
| --- | --- | --- | --- | --- |
| Basic | Slow decrease, potential instability | Fluctuates, struggles to generate realistic images | Slow | Less stable |
| Deluxe | Faster decrease, more effective at differentiation | Converges more quickly, learns from more varied examples | Faster | More stable |
| Deluxe + diff_aug | Even faster decrease, more effective at differentiation | Faster generation of diverse and realistic images | Fastest | Most stable |

The table above highlights the key differences in the loss curves for a DCGAN trained with different data preprocessing techniques. Basic preprocessing results in slower convergence and potentially less stable loss curves, while the deluxe pipeline converges faster and more stably. The most effective approach is differentiable augmentation (diff_aug), where the same differentiable augmentations are applied to both real and fake images before they reach the discriminator, so that gradients can flow through the augmentations back to the generator; this yields the fastest convergence and the most stable loss curves. Overall, the choice of data preprocessing has a significant impact on GAN performance and deserves careful consideration.
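To make this concrete, here is a minimal sketch of how differentiable augmentation can slot into the discriminator and generator updates. The diff_augment function below is a hypothetical stand-in for the actual augmentation policy (real implementations typically use color, translation, and cutout transforms), and the least-squares GAN loss is used purely for illustration:

```python
import torch

def diff_augment(x):
    # Hypothetical differentiable augmentation: a random brightness shift plus
    # a small random translation, both expressed with differentiable tensor ops.
    x = x + 0.2 * (torch.rand(x.size(0), 1, 1, 1, device=x.device) - 0.5)
    dx, dy = torch.randint(-2, 3, (2,)).tolist()
    return torch.roll(x, shifts=(dx, dy), dims=(2, 3))

def d_step(D, G, real, noise, d_optimizer):
    # The discriminator only ever sees augmented images, real and fake alike.
    fake = G(noise).detach()
    d_loss = ((D(diff_augment(real)) - 1) ** 2).mean() / 2 \
             + (D(diff_augment(fake)) ** 2).mean() / 2
    d_optimizer.zero_grad()
    d_loss.backward()
    d_optimizer.step()
    return d_loss

def g_step(D, G, noise, g_optimizer):
    # Because the augmentation is differentiable, the generator's gradients
    # flow through it as well.
    g_loss = ((D(diff_augment(G(noise))) - 1) ** 2).mean()
    g_optimizer.zero_grad()
    g_loss.backward()
    g_optimizer.step()
    return g_loss
```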

Part 2: CycleGAN

Implemented the CycleGAN architecture.

Data Augmentation

Set the --data_preprocess flag to deluxe.

Generator

Implemented the generator architecture by completing the `__init__` method of the CycleGenerator class in models.py.
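As a rough illustration, the generator can be laid out as an encoder, a residual transformation block, and a decoder. The sketch below is an assumed layout, not the assignment's reference implementation; layer counts, channel widths, and the ResnetBlock definition are illustrative:

```python
import torch
import torch.nn as nn

class ResnetBlock(nn.Module):
    """Simple residual block operating at a fixed resolution."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm2d(channels),
        )

    def forward(self, x):
        return x + self.conv(x)

class CycleGenerator(nn.Module):
    """Assumed layout: downsample -> residual block -> upsample."""
    def __init__(self, conv_dim=64):
        super().__init__()
        # Encoder: two stride-2 convolutions.
        self.conv1 = nn.Sequential(
            nn.Conv2d(3, conv_dim, 4, stride=2, padding=1),
            nn.InstanceNorm2d(conv_dim), nn.ReLU(inplace=True))
        self.conv2 = nn.Sequential(
            nn.Conv2d(conv_dim, conv_dim * 2, 4, stride=2, padding=1),
            nn.InstanceNorm2d(conv_dim * 2), nn.ReLU(inplace=True))
        # Transformation: residual block at the bottleneck resolution.
        self.resnet_block = ResnetBlock(conv_dim * 2)
        # Decoder: two upsampling convolutions back to RGB.
        self.upconv1 = nn.Sequential(
            nn.ConvTranspose2d(conv_dim * 2, conv_dim, 4, stride=2, padding=1),
            nn.InstanceNorm2d(conv_dim), nn.ReLU(inplace=True))
        self.upconv2 = nn.ConvTranspose2d(conv_dim, 3, 4, stride=2, padding=1)

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.resnet_block(x)
        x = self.upconv1(x)
        return torch.tanh(self.upconv2(x))
```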

Experiment with CycleGAN

cat_10deluxe_instance_dc_cycle_naive

Figures: sample X→Y and Y→X translations, plus the D_X_loss, D_Y_loss, D_fake_loss, D_real_loss, and G_loss curves.

cat_10deluxe_instance_patch_cycle_naive

Figures: sample X→Y and Y→X translations, plus the D_X_loss, D_Y_loss, D_fake_loss, D_real_loss, and G_loss curves.

cat_10deluxe_instance_patch_cycle_naive_cycle

Figures: sample X→Y and Y→X translations, plus the D_X_loss, D_Y_loss, D_fake_loss, D_real_loss, and G_loss curves.

cat_10deluxe_instance_patch_cycle_naive_cycle_diffaug

Figures: sample X→Y and Y→X translations, plus the D_X_loss, D_Y_loss, D_fake_loss, D_real_loss, and G_loss curves.

apple2orange_10deluxe_instance_patch_cycle_naive_cycle_diffaug

Figures: sample X→Y and Y→X translations, plus the D_X_loss, D_Y_loss, D_fake_loss, D_real_loss, and G_loss curves.

Observations:

We observed that the results with the cycle-consistency loss were better than those without it: the translations between the two domains were more accurate and realistic. This is because the cycle-consistency loss requires that an image translated to the other domain and then back again reconstructs the original, which constrains the generators to preserve content rather than produce arbitrary outputs in the target domain.
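A minimal sketch of how the cycle-consistency term can be computed: an L1 reconstruction penalty in both directions, weighted by a hypothetical lambda_cycle (the exact norm and weighting in our code may differ):

```python
import torch.nn.functional as F

def cycle_consistency_loss(G_XtoY, G_YtoX, real_X, real_Y, lambda_cycle=10.0):
    # X -> Y -> X should reconstruct the original X image.
    rec_X = G_YtoX(G_XtoY(real_X))
    # Y -> X -> Y should reconstruct the original Y image.
    rec_Y = G_XtoY(G_YtoX(real_Y))
    return lambda_cycle * (F.l1_loss(rec_X, real_X) + F.l1_loss(rec_Y, real_Y))
```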

We also observed that, in our runs, the DCDiscriminator produced better-quality translations than the PatchDiscriminator. A likely explanation is that the DCDiscriminator effectively has a full-image receptive field and outputs a single real/fake score, so it can penalize globally inconsistent outputs, whereas the PatchDiscriminator classifies local patches and focuses on texture-level realism.
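To illustrate the difference, the sketch below shows assumed output shapes: a DC-style discriminator collapses the image to a single score, while a patch discriminator keeps a spatial grid of scores, one per receptive-field patch. The layer configuration here is illustrative, not our exact architecture:

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 4, stride=2, padding=1),
        nn.InstanceNorm2d(c_out),
        nn.LeakyReLU(0.2, inplace=True),
    )

# DC-style discriminator: downsample until a single real/fake score remains.
dc_discriminator = nn.Sequential(
    conv_block(3, 64), conv_block(64, 128), conv_block(128, 256),
    conv_block(256, 512),
    nn.Conv2d(512, 1, kernel_size=4, stride=1, padding=0),  # -> (B, 1, 1, 1)
)

# Patch discriminator: stop early so the output is a grid of patch scores.
patch_discriminator = nn.Sequential(
    conv_block(3, 64), conv_block(64, 128),
    nn.Conv2d(128, 1, kernel_size=4, stride=1, padding=1),  # -> (B, 1, 15, 15)
)

x = torch.randn(2, 3, 64, 64)
print(dc_discriminator(x).shape)     # torch.Size([2, 1, 1, 1])
print(patch_discriminator(x).shape)  # torch.Size([2, 1, 15, 15])
```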

Conclusion:

In conclusion, we have trained CycleGAN from scratch with and without the cycle-consistency loss, and have compared the results using the DCDiscriminator and the PatchDiscriminator. We have observed that the cycle-consistency loss and the DCDiscriminator resulted in better quality translations between the two domains. These observations can help in improving the translation quality between different domains in image processing applications.

Bells & Whistles

Implement and train a diffusion model

Training Diffusion Models with Hugging Face’s Diffusers

Introduction

In this project, we train a simple diffusion model using the Hugging Face Diffusers library. Diffusion models have become state-of-the-art generative models in recent years.

Key Parts of the Code

Configuration:

We define a TrainingConfig class that holds all the training hyperparameters. These include image_size, train_batch_size, eval_batch_size, num_epochs, gradient_accumulation_steps, learning_rate, and lr_warmup_steps, among others.
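A minimal sketch of such a config; the specific values below are assumptions for illustration, not the exact settings we trained with:

```python
from dataclasses import dataclass

@dataclass
class TrainingConfig:
    image_size: int = 64                  # resolution of generated samples (assumed)
    train_batch_size: int = 16
    eval_batch_size: int = 16
    num_epochs: int = 50
    gradient_accumulation_steps: int = 1
    learning_rate: float = 1e-4
    lr_warmup_steps: int = 500
    save_image_epochs: int = 10
    mixed_precision: str = "fp16"         # handled by accelerate
    output_dir: str = "ddpm-cats-64"      # hypothetical output directory

config = TrainingConfig()
```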

Data Preprocessing:

We use the datasets library to load our dataset and apply data transformations. The dataset is preprocessed using the transforms.Compose function from torchvision. The dataset is then transformed on-the-fly during training.
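For example, a sketch of the on-the-fly transform, assuming the images live under an "image" key in the dataset (the key name and data directory are assumptions, and config comes from the sketch above):

```python
from datasets import load_dataset
from torchvision import transforms

# Hypothetical local image folder; our actual dataset path differs.
dataset = load_dataset("imagefolder", data_dir="data/grumpy_cats", split="train")

preprocess = transforms.Compose([
    transforms.Resize((config.image_size, config.image_size)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),
])

def transform(examples):
    # Applied lazily each time a batch is drawn, so augmentation varies per epoch.
    images = [preprocess(image.convert("RGB")) for image in examples["image"]]
    return {"images": images}

dataset.set_transform(transform)
```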

Model Definition:

We define our model using the UNet2DModel class from the diffusers library. The model has various hyperparameters such as sample_size, in_channels, out_channels, layers_per_block, block_out_channels, down_block_types, and up_block_types.
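One plausible configuration, following the standard Diffusers training example; the channel widths and block types here are illustrative choices, not necessarily the ones we trained:

```python
from diffusers import UNet2DModel

model = UNet2DModel(
    sample_size=config.image_size,   # target image resolution
    in_channels=3,                   # RGB input
    out_channels=3,                  # predicted noise, same shape as the input
    layers_per_block=2,
    block_out_channels=(128, 128, 256, 256, 512, 512),
    down_block_types=(
        "DownBlock2D", "DownBlock2D", "DownBlock2D",
        "DownBlock2D", "AttnDownBlock2D", "DownBlock2D",
    ),
    up_block_types=(
        "UpBlock2D", "AttnUpBlock2D", "UpBlock2D",
        "UpBlock2D", "UpBlock2D", "UpBlock2D",
    ),
)
```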

Training Setup:

We use an AdamW optimizer and a cosine learning rate schedule for training, and the DDPMPipeline class from the diffusers library for end-to-end inference during evaluation. The main training function, train_loop, handles gradient accumulation, mixed-precision training, and multi-GPU or TPU training via the Accelerator class from the accelerate library.
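A sketch of that setup, building on the model and dataset sketches above; the warmup steps and number of diffusion timesteps are assumptions:

```python
import torch
from torch.utils.data import DataLoader
from diffusers import DDPMScheduler
from diffusers.optimization import get_cosine_schedule_with_warmup

train_dataloader = DataLoader(dataset, batch_size=config.train_batch_size, shuffle=True)

# Noise scheduler defining the forward diffusion process (assumed 1000 steps).
noise_scheduler = DDPMScheduler(num_train_timesteps=1000)

optimizer = torch.optim.AdamW(model.parameters(), lr=config.learning_rate)
lr_scheduler = get_cosine_schedule_with_warmup(
    optimizer=optimizer,
    num_warmup_steps=config.lr_warmup_steps,
    num_training_steps=len(train_dataloader) * config.num_epochs,
)
```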

We use the notebook_launcher function from the accelerate library to launch the training from the notebook.
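Launching then looks roughly like this (the argument order follows our train_loop signature described below, and num_processes=1 is an assumption for single-GPU training):

```python
from accelerate import notebook_launcher

args = (config, model, noise_scheduler, optimizer, train_dataloader, lr_scheduler)
notebook_launcher(train_loop, args, num_processes=1)
```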

Key Functions

- transform(examples): Applies the image transformations on the fly during training.
- evaluate(config, epoch, pipeline): Generates a batch of sample images during evaluation and saves them as a grid to disk.
- train_loop(config, model, noise_scheduler, optimizer, train_dataloader, lr_scheduler): The main training loop, which includes the forward diffusion process, loss calculation, and backpropagation (see the sketch after this list).
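The core of the training step can be sketched as follows: a simplified view of the forward diffusion and noise-prediction loss, with gradient-accumulation bookkeeping and the evaluation/saving hooks omitted:

```python
import torch
import torch.nn.functional as F
from accelerate import Accelerator

def train_loop(config, model, noise_scheduler, optimizer, train_dataloader, lr_scheduler):
    accelerator = Accelerator(
        mixed_precision=config.mixed_precision,
        gradient_accumulation_steps=config.gradient_accumulation_steps,
    )
    model, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
        model, optimizer, train_dataloader, lr_scheduler)

    for epoch in range(config.num_epochs):
        for batch in train_dataloader:
            clean_images = batch["images"]
            noise = torch.randn_like(clean_images)
            timesteps = torch.randint(
                0, noise_scheduler.config.num_train_timesteps,
                (clean_images.shape[0],), device=clean_images.device)

            # Forward diffusion: add noise to the clean images at random timesteps.
            noisy_images = noise_scheduler.add_noise(clean_images, noise, timesteps)

            # The UNet predicts the added noise; train with a simple MSE loss.
            noise_pred = model(noisy_images, timesteps, return_dict=False)[0]
            loss = F.mse_loss(noise_pred, noise)

            accelerator.backward(loss)
            optimizer.step()
            lr_scheduler.step()
            optimizer.zero_grad()
```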

Diffusion Results

Figures: diffusion-model samples for the apple and cat datasets.

The quality of the generated images depends on factors such as the quality of the training data, the hyperparameters used during training, and the complexity of the image domain. Where the diffusion samples look less realistic than the DCGAN samples, the gap could stem from dataset quality, model capacity, hyperparameter tuning, or simply insufficient training time; further analysis and experimentation would be necessary to pinpoint the specific cause.

Conclusion

This report presents our implementation of DCGAN and CycleGAN for various image generation tasks. Through these experiments, we have observed the impact of data augmentation and differentiable augmentation on the training process and final results. We have also seen the capabilities of CycleGAN in generating realistic images for domain-to-domain translation tasks, such as converting Grumpy cats to Russian Blue cats and vice versa, and converting apples to oranges and vice versa.

Linji (Joey) Wang
PhD Student in AI & Robotics