When Cats meet GANs

A Comprehensive Study on DCGANs and CycleGANs with Advanced Augmentation Techniques


Introduction

In this assignment, we get hands-on experience coding and training GANs. This assignment includes two parts:

1. Implementing a Deep Convolutional GAN (DCGAN) to generate grumpy cats from samples of random noise.
2. Implementing a more complex architecture, CycleGAN, for image-to-image translation. We train the CycleGAN to convert between two kinds of cats (Grumpy and Russian Blue) and between apples and oranges.

Part 1: Deep Convolutional GAN

For the first part of this assignment, we implement a slightly modified version of the Deep Convolutional GAN (DCGAN).

Experiment with DCGANs

We experimented with different data preprocessing techniques and found that the choice of preprocessing has a significant impact on the performance of the GAN. To demonstrate this, we include screenshots of the discriminator and generator training losses for three preprocessing options: basic, deluxe, and deluxe with differentiable augmentation (diff_aug).
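For reference, here is a minimal sketch of what the basic and deluxe pipelines might look like. The exact transforms, oversizing factor, and image size are assumptions for illustration, not the assignment's reference implementation:

```python
from torchvision import transforms

# Basic (assumed): just resize and normalize to [-1, 1].
basic_transform = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

# Deluxe (assumed): resize slightly larger, then random-crop and flip
# so the network sees a different view of each image every epoch.
deluxe_transform = transforms.Compose([
    transforms.Resize(int(1.1 * 64)),       # assumed 10% oversizing
    transforms.RandomCrop(64),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])
```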

grumpifyBprocessed_basic

Figures: generated samples plus the D_fake_loss, D_real_loss, D_total_loss, and G_loss curves for data_preprocess=basic at iteration 6400.

grumpifyBprocessed_deluxe

Figures: generated samples plus the D_fake_loss, D_real_loss, D_total_loss, and G_loss curves for data_preprocess=deluxe at iteration 6400.

grumpifyBprocessed_deluxe_diffaug

Figures: generated samples plus the D_fake_loss, D_real_loss, D_total_loss, and G_loss curves for data_preprocess=deluxe with diff_aug enabled at iteration 6400.

Results analysis

| Data Preprocessing | Discriminator Loss | Generator Loss | Convergence Rate | Stability |
| --- | --- | --- | --- | --- |
| Basic | Slow decrease, potential instability | Fluctuates, struggles to generate realistic images | Slow | Less stable |
| Deluxe | Faster decrease, more effective at differentiation | Converges more quickly, learns from more varied examples | Faster | More stable |
| Deluxe + diff_aug | Even faster decrease, more effective at differentiation | Faster generation of diverse and realistic images | Fastest | Most stable |

The table above highlights the key differences in the loss curves for a DCGAN trained with different data preprocessing techniques. Basic preprocessing results in slower convergence and potentially less stable loss curves, while the deluxe pipeline converges faster and more stably. The most effective approach is differentiable augmentation (diff_aug), where the same differentiable augmentations are applied to both real and fake images before they reach the discriminator, so that gradients can flow through the augmentations back to the generator; this yields the fastest convergence and the most stable loss curves. Overall, the choice of data preprocessing has a significant impact on GAN performance and deserves careful consideration.
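To make this concrete, here is a minimal sketch of how differentiable augmentation can slot into the discriminator and generator updates. The diff_augment function below is a hypothetical stand-in for the actual augmentation policy (real implementations typically use color, translation, and cutout transforms), and the least-squares GAN loss is used purely for illustration:

```python
import torch

def diff_augment(x):
    # Hypothetical differentiable augmentation: a random brightness shift plus
    # a small random translation, both expressed with differentiable tensor ops.
    x = x + 0.2 * (torch.rand(x.size(0), 1, 1, 1, device=x.device) - 0.5)
    dx, dy = torch.randint(-2, 3, (2,)).tolist()
    return torch.roll(x, shifts=(dx, dy), dims=(2, 3))

def d_step(D, G, real, noise, d_optimizer):
    # The discriminator only ever sees augmented images, real and fake alike.
    fake = G(noise).detach()
    d_loss = ((D(diff_augment(real)) - 1) ** 2).mean() / 2 \
             + (D(diff_augment(fake)) ** 2).mean() / 2
    d_optimizer.zero_grad()
    d_loss.backward()
    d_optimizer.step()
    return d_loss

def g_step(D, G, noise, g_optimizer):
    # Because the augmentation is differentiable, the generator's gradients
    # flow through it as well.
    g_loss = ((D(diff_augment(G(noise))) - 1) ** 2).mean()
    g_optimizer.zero_grad()
    g_loss.backward()
    g_optimizer.step()
    return g_loss
```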

Part 2: CycleGAN

Implemented the CycleGAN architecture.

Data Augmentation

Set the --data_preprocess flag to deluxe.

Generator

Implemented the generator architecture by completing the `__init__` method of the CycleGenerator class in models.py.
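As a rough illustration, the generator can be laid out as an encoder, a residual transformation block, and a decoder. The sketch below is an assumed layout, not the assignment's reference implementation; layer counts, channel widths, and the ResnetBlock definition are illustrative:

```python
import torch
import torch.nn as nn

class ResnetBlock(nn.Module):
    """Simple residual block operating at a fixed resolution."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm2d(channels),
        )

    def forward(self, x):
        return x + self.conv(x)

class CycleGenerator(nn.Module):
    """Assumed layout: downsample -> residual block -> upsample."""
    def __init__(self, conv_dim=64):
        super().__init__()
        # Encoder: two stride-2 convolutions.
        self.conv1 = nn.Sequential(
            nn.Conv2d(3, conv_dim, 4, stride=2, padding=1),
            nn.InstanceNorm2d(conv_dim), nn.ReLU(inplace=True))
        self.conv2 = nn.Sequential(
            nn.Conv2d(conv_dim, conv_dim * 2, 4, stride=2, padding=1),
            nn.InstanceNorm2d(conv_dim * 2), nn.ReLU(inplace=True))
        # Transformation: residual block at the bottleneck resolution.
        self.resnet_block = ResnetBlock(conv_dim * 2)
        # Decoder: two upsampling convolutions back to RGB.
        self.upconv1 = nn.Sequential(
            nn.ConvTranspose2d(conv_dim * 2, conv_dim, 4, stride=2, padding=1),
            nn.InstanceNorm2d(conv_dim), nn.ReLU(inplace=True))
        self.upconv2 = nn.ConvTranspose2d(conv_dim, 3, 4, stride=2, padding=1)

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.resnet_block(x)
        x = self.upconv1(x)
        return torch.tanh(self.upconv2(x))
```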

Experiment with CycleGAN

cat_10deluxe_instance_dc_cycle_naive

Figures: sample X→Y and Y→X translations, plus the D_X_loss, D_Y_loss, D_fake_loss, D_real_loss, and G_loss curves.

cat_10deluxe_instance_patch_cycle_naive

Figures: sample X→Y and Y→X translations, plus the D_X_loss, D_Y_loss, D_fake_loss, D_real_loss, and G_loss curves.

cat_10deluxe_instance_patch_cycle_naive_cycle

Figures: sample X→Y and Y→X translations, plus the D_X_loss, D_Y_loss, D_fake_loss, D_real_loss, and G_loss curves.

cat_10deluxe_instance_patch_cycle_naive_cycle_diffaug

Figures: sample X→Y and Y→X translations, plus the D_X_loss, D_Y_loss, D_fake_loss, D_real_loss, and G_loss curves.

apple2orange_10deluxe_instance_patch_cycle_naive_cycle_diffaug

Figures: sample X→Y and Y→X translations, plus the D_X_loss, D_Y_loss, D_fake_loss, D_real_loss, and G_loss curves.

Observations:

We observed that the results with the cycle-consistency loss were better than those without it: the translations between the two domains were more accurate and realistic. This is because the cycle-consistency loss requires that an image translated to the other domain and then back again reconstructs the original, which constrains the generators to preserve content rather than produce arbitrary outputs in the target domain.
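A minimal sketch of how the cycle-consistency term can be computed: an L1 reconstruction penalty in both directions, weighted by a hypothetical lambda_cycle (the exact norm and weighting in our code may differ):

```python
import torch.nn.functional as F

def cycle_consistency_loss(G_XtoY, G_YtoX, real_X, real_Y, lambda_cycle=10.0):
    # X -> Y -> X should reconstruct the original X image.
    rec_X = G_YtoX(G_XtoY(real_X))
    # Y -> X -> Y should reconstruct the original Y image.
    rec_Y = G_XtoY(G_YtoX(real_Y))
    return lambda_cycle * (F.l1_loss(rec_X, real_X) + F.l1_loss(rec_Y, real_Y))
```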

We also observed that, in our runs, the DCDiscriminator produced better-quality translations than the PatchDiscriminator. A likely explanation is that the DCDiscriminator effectively has a full-image receptive field and outputs a single real/fake score, so it can penalize globally inconsistent outputs, whereas the PatchDiscriminator classifies local patches and focuses on texture-level realism.
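To illustrate the difference, the sketch below shows assumed output shapes: a DC-style discriminator collapses the image to a single score, while a patch discriminator keeps a spatial grid of scores, one per receptive-field patch. The layer configuration here is illustrative, not our exact architecture:

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 4, stride=2, padding=1),
        nn.InstanceNorm2d(c_out),
        nn.LeakyReLU(0.2, inplace=True),
    )

# DC-style discriminator: downsample until a single real/fake score remains.
dc_discriminator = nn.Sequential(
    conv_block(3, 64), conv_block(64, 128), conv_block(128, 256),
    conv_block(256, 512),
    nn.Conv2d(512, 1, kernel_size=4, stride=1, padding=0),  # -> (B, 1, 1, 1)
)

# Patch discriminator: stop early so the output is a grid of patch scores.
patch_discriminator = nn.Sequential(
    conv_block(3, 64), conv_block(64, 128),
    nn.Conv2d(128, 1, kernel_size=4, stride=1, padding=1),  # -> (B, 1, 15, 15)
)

x = torch.randn(2, 3, 64, 64)
print(dc_discriminator(x).shape)     # torch.Size([2, 1, 1, 1])
print(patch_discriminator(x).shape)  # torch.Size([2, 1, 15, 15])
```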

Conclusion:

In conclusion, we have trained CycleGAN from scratch with and without the cycle-consistency loss, and have compared the results using the DCDiscriminator and the PatchDiscriminator. We have observed that the cycle-consistency loss and the DCDiscriminator resulted in better quality translations between the two domains. These observations can help in improving the translation quality between different domains in image processing applications.

Bells & Whistles

Implement and train a diffusion model

Training Diffusion Models with Hugging Face’s Diffusers

Introduction

In this project, we train a simple diffusion model using the Hugging Face Diffusers library. Diffusion models have become state-of-the-art generative models in recent years.

Key Parts of the Code

Configuration:

We define a TrainingConfig class that holds all the training hyperparameters. These include image_size, train_batch_size, eval_batch_size, num_epochs, gradient_accumulation_steps, learning_rate, and lr_warmup_steps, among others.
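A minimal sketch of such a config; the specific values below are assumptions for illustration, not the exact settings we trained with:

```python
from dataclasses import dataclass

@dataclass
class TrainingConfig:
    image_size: int = 64                  # resolution of generated samples (assumed)
    train_batch_size: int = 16
    eval_batch_size: int = 16
    num_epochs: int = 50
    gradient_accumulation_steps: int = 1
    learning_rate: float = 1e-4
    lr_warmup_steps: int = 500
    save_image_epochs: int = 10
    mixed_precision: str = "fp16"         # handled by accelerate
    output_dir: str = "ddpm-cats-64"      # hypothetical output directory

config = TrainingConfig()
```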

Data Preprocessing:

We use the datasets library to load our dataset and apply data transformations. The dataset is preprocessed using the transforms.Compose function from torchvision. The dataset is then transformed on-the-fly during training.
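For example, a sketch of the on-the-fly transform, assuming the images live under an "image" key in the dataset (the key name and data directory are assumptions, and config comes from the sketch above):

```python
from datasets import load_dataset
from torchvision import transforms

# Hypothetical local image folder; our actual dataset path differs.
dataset = load_dataset("imagefolder", data_dir="data/grumpy_cats", split="train")

preprocess = transforms.Compose([
    transforms.Resize((config.image_size, config.image_size)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),
])

def transform(examples):
    # Applied lazily each time a batch is drawn, so augmentation varies per epoch.
    images = [preprocess(image.convert("RGB")) for image in examples["image"]]
    return {"images": images}

dataset.set_transform(transform)
```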

Model Definition:

We define our model using the UNet2DModel class from the diffusers library. The model has various hyperparameters such as sample_size, in_channels, out_channels, layers_per_block, block_out_channels, down_block_types, and up_block_types.
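One plausible configuration, following the standard Diffusers training example; the channel widths and block types here are illustrative choices, not necessarily the ones we trained:

```python
from diffusers import UNet2DModel

model = UNet2DModel(
    sample_size=config.image_size,   # target image resolution
    in_channels=3,                   # RGB input
    out_channels=3,                  # predicted noise, same shape as the input
    layers_per_block=2,
    block_out_channels=(128, 128, 256, 256, 512, 512),
    down_block_types=(
        "DownBlock2D", "DownBlock2D", "DownBlock2D",
        "DownBlock2D", "AttnDownBlock2D", "DownBlock2D",
    ),
    up_block_types=(
        "UpBlock2D", "AttnUpBlock2D", "UpBlock2D",
        "UpBlock2D", "UpBlock2D", "UpBlock2D",
    ),
)
```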

Training Setup:

We use an AdamW optimizer and a cosine learning rate schedule for training, and the DDPMPipeline class from the diffusers library for end-to-end inference during evaluation. The main training function, train_loop, handles gradient accumulation, mixed-precision training, and multi-GPU or TPU training via the Accelerator class from the accelerate library.
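A sketch of that setup, building on the model and dataset sketches above; the warmup steps and number of diffusion timesteps are assumptions:

```python
import torch
from torch.utils.data import DataLoader
from diffusers import DDPMScheduler
from diffusers.optimization import get_cosine_schedule_with_warmup

train_dataloader = DataLoader(dataset, batch_size=config.train_batch_size, shuffle=True)

# Noise scheduler defining the forward diffusion process (assumed 1000 steps).
noise_scheduler = DDPMScheduler(num_train_timesteps=1000)

optimizer = torch.optim.AdamW(model.parameters(), lr=config.learning_rate)
lr_scheduler = get_cosine_schedule_with_warmup(
    optimizer=optimizer,
    num_warmup_steps=config.lr_warmup_steps,
    num_training_steps=len(train_dataloader) * config.num_epochs,
)
```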

We use the notebook_launcher function from the accelerate library to launch the training from the notebook.
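Launching then looks roughly like this (the argument order follows our train_loop signature described below, and num_processes=1 is an assumption for single-GPU training):

```python
from accelerate import notebook_launcher

args = (config, model, noise_scheduler, optimizer, train_dataloader, lr_scheduler)
notebook_launcher(train_loop, args, num_processes=1)
```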

Key Functions

- transform(examples): Applies the image transformations on the fly during training.
- evaluate(config, epoch, pipeline): Generates a batch of sample images during evaluation and saves them as a grid to disk.
- train_loop(config, model, noise_scheduler, optimizer, train_dataloader, lr_scheduler): The main training loop, which includes the forward diffusion process, loss calculation, and backpropagation (see the sketch after this list).
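The core of the training step can be sketched as follows: a simplified view of the forward diffusion and noise-prediction loss, with gradient-accumulation bookkeeping and the evaluation/saving hooks omitted:

```python
import torch
import torch.nn.functional as F
from accelerate import Accelerator

def train_loop(config, model, noise_scheduler, optimizer, train_dataloader, lr_scheduler):
    accelerator = Accelerator(
        mixed_precision=config.mixed_precision,
        gradient_accumulation_steps=config.gradient_accumulation_steps,
    )
    model, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
        model, optimizer, train_dataloader, lr_scheduler)

    for epoch in range(config.num_epochs):
        for batch in train_dataloader:
            clean_images = batch["images"]
            noise = torch.randn_like(clean_images)
            timesteps = torch.randint(
                0, noise_scheduler.config.num_train_timesteps,
                (clean_images.shape[0],), device=clean_images.device)

            # Forward diffusion: add noise to the clean images at random timesteps.
            noisy_images = noise_scheduler.add_noise(clean_images, noise, timesteps)

            # The UNet predicts the added noise; train with a simple MSE loss.
            noise_pred = model(noisy_images, timesteps, return_dict=False)[0]
            loss = F.mse_loss(noise_pred, noise)

            accelerator.backward(loss)
            optimizer.step()
            lr_scheduler.step()
            optimizer.zero_grad()
```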

Diffusion Results

Figures: diffusion-model samples for the apple and cat datasets.

The quality of the generated images depends on factors such as the quality of the training data, the hyperparameters used during training, and the complexity of the image domain. Where the diffusion samples look less realistic than the DCGAN samples, the gap could stem from dataset quality, model capacity, hyperparameter tuning, or simply insufficient training time; further analysis and experimentation would be necessary to pinpoint the specific cause.

Conclusion

This report presents our implementation of DCGAN and CycleGAN for various image generation tasks. Through these experiments, we have observed the impact of data augmentation and differentiable augmentation on the training process and final results. We have also seen the capabilities of CycleGAN in generating realistic images for domain-to-domain translation tasks, such as converting Grumpy cats to Russian Blue cats and vice versa, and converting apples to oranges and vice versa.

Linji (Joey) Wang
PhD Student in AI & Robotics