When Cats meet GANs
A Comprehensive Study on DCGANs and CycleGANs with Advanced Augmentation Techniques
Introduction
In this assignment, we get hands-on experience coding and training GANs. This assignment includes two parts:
1. Implementing a Deep Convolutional GAN (DCGAN) to generate grumpy cats from samples of random noise.
2. Implementing a more complex GAN architecture called CycleGAN for the task of image-to-image translation. We train the CycleGAN to convert between two breeds of cats (Grumpy and Russian Blue) and between apples and oranges.
Part 1: Deep Convolutional GAN
For the first part of this assignment, we implement a slightly modified version of Deep Convolutional GAN (DCGAN).
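For context, a DCGAN-style generator maps a noise vector to an image through a stack of transposed convolutions. The sketch below illustrates the idea; the layer widths and the 64×64 output resolution are assumptions, not the exact architecture from the starter code.

```python
import torch.nn as nn

class DCGenerator(nn.Module):
    """Maps a noise vector to a 64x64 RGB image (sizes are illustrative assumptions)."""
    def __init__(self, noise_size=100, conv_dim=64):
        super().__init__()
        self.main = nn.Sequential(
            # noise_size x 1 x 1 -> (conv_dim*8) x 4 x 4
            nn.ConvTranspose2d(noise_size, conv_dim * 8, 4, 1, 0), nn.BatchNorm2d(conv_dim * 8), nn.ReLU(),
            # -> (conv_dim*4) x 8 x 8
            nn.ConvTranspose2d(conv_dim * 8, conv_dim * 4, 4, 2, 1), nn.BatchNorm2d(conv_dim * 4), nn.ReLU(),
            # -> (conv_dim*2) x 16 x 16
            nn.ConvTranspose2d(conv_dim * 4, conv_dim * 2, 4, 2, 1), nn.BatchNorm2d(conv_dim * 2), nn.ReLU(),
            # -> conv_dim x 32 x 32
            nn.ConvTranspose2d(conv_dim * 2, conv_dim, 4, 2, 1), nn.BatchNorm2d(conv_dim), nn.ReLU(),
            # -> 3 x 64 x 64, squashed to [-1, 1]
            nn.ConvTranspose2d(conv_dim, 3, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, z):
        # Reshape the noise vector to a 1x1 spatial map before upsampling.
        return self.main(z.view(z.size(0), -1, 1, 1))
```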
Experiment with DCGANs
We experimented with different data preprocessing techniques and found that the choice of preprocessing can have a significant impact on the performance of the GAN. To demonstrate this, we include screenshots of the training losses for both the discriminator and the generator under three different preprocessing options: basic, deluxe, and diff_aug.
grumpifyBprocessed_basic
grumpifyBprocessed_deluxe
grumpifyBprocessed_deluxe_diffaug
Results analysis
Data Preprocessing | Discriminator Loss | Generator Loss | Convergence Rate | Stability |
---|---|---|---|---|
Basic | Slow decrease, potential instability | Fluctuates, struggles to generate realistic images | Slow | Less stable |
Deluxe | Faster decrease, more effective at differentiation | Converges more quickly, learns from more varied examples | Faster | More stable |
Differentiable Augmentation (diff_aug) | Even faster decrease, more effective at differentiation | Faster generation of diverse and realistic images | Fastest | Most stable |
The table above highlights the key differences in the loss curves for a DCGAN trained with different data preprocessing techniques. Basic preprocessing results in slower convergence and potentially less stable loss curves, while the deluxe pipeline yields faster convergence and more stable curves. The most effective approach is differentiable augmentation (diff_aug), in which the same set of differentiable augmentations is applied to both the real and the generated images before they reach the discriminator, so gradients can still flow back to the generator; this gives the fastest convergence and the most stable loss curves. This analysis suggests that the choice of data preprocessing can have a significant impact on the performance of a GAN, and careful consideration should be given to selecting the most effective approach.
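To illustrate how differentiable augmentation fits into training, here is a minimal sketch: every operation is differentiable, and the same augmentation function is applied to both the real and the generated batches before they are passed to the discriminator. The `diff_augment` helper and the specific augmentations (brightness and translation) are illustrative assumptions, not the exact policy used in the assignment.

```python
import torch
import torch.nn.functional as F

def diff_augment(x):
    """Simple differentiable augmentations: random brightness + random translation.
    Every op is differentiable, so gradients can flow back to the generator."""
    # Random per-sample brightness shift.
    x = x + (torch.rand(x.size(0), 1, 1, 1, device=x.device) - 0.5)
    # Random translation by up to 1/8 of the image size, via pad + roll + crop.
    b, c, h, w = x.shape
    shift = h // 8
    tx = int(torch.randint(-shift, shift + 1, (1,)))
    ty = int(torch.randint(-shift, shift + 1, (1,)))
    x = F.pad(x, [shift, shift, shift, shift])
    x = torch.roll(x, shifts=(ty, tx), dims=(2, 3))
    return x[:, :, shift:shift + h, shift:shift + w]

# In the discriminator update, the same augmentation policy is applied to the real
# and the fake batch (illustrative names; real_images, fake_images, D are assumed):
# d_real = D(diff_augment(real_images))
# d_fake = D(diff_augment(fake_images.detach()))
```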
Part 2: CycleGAN
Implemented the CycleGAN architecture.
Data Augmentation
Set the --data_preprocess flag to deluxe.
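For reference, a deluxe-style pipeline along these lines can be built with torchvision; the specific resize and crop sizes below are assumptions and may differ from the starter code.

```python
from torchvision import transforms

# A "deluxe"-style pipeline: upsample, take a random crop, and randomly flip,
# so the networks see more varied views of each training image.
load_size, crop_size = 286, 256   # assumed sizes; the starter code may use others
deluxe_transform = transforms.Compose([
    transforms.Resize(load_size, transforms.InterpolationMode.BICUBIC),
    transforms.RandomCrop(crop_size),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # map pixels to [-1, 1]
])
```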
Generator
Implemented the generator architecture by completing the __init__ method of the CycleGenerator class in models.py.
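A minimal sketch of what the CycleGenerator might look like is shown below; the layer widths, the single residual block, and the ResnetBlock helper are assumptions rather than the exact starter-code architecture.

```python
import torch.nn as nn

class ResnetBlock(nn.Module):
    """Residual block that keeps the spatial resolution unchanged."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm2d(channels),
        )

    def forward(self, x):
        return x + self.conv(x)  # residual connection

class CycleGenerator(nn.Module):
    def __init__(self, conv_dim=64):
        super().__init__()
        # Encoder: two stride-2 convolutions downsample the input image.
        self.conv1 = nn.Sequential(
            nn.Conv2d(3, conv_dim, 4, stride=2, padding=1),
            nn.InstanceNorm2d(conv_dim), nn.ReLU())
        self.conv2 = nn.Sequential(
            nn.Conv2d(conv_dim, conv_dim * 2, 4, stride=2, padding=1),
            nn.InstanceNorm2d(conv_dim * 2), nn.ReLU())
        # Transformation: a residual block at the bottleneck resolution.
        self.resnet_block = ResnetBlock(conv_dim * 2)
        # Decoder: two transposed convolutions upsample back to the input size.
        self.up1 = nn.Sequential(
            nn.ConvTranspose2d(conv_dim * 2, conv_dim, 4, stride=2, padding=1),
            nn.InstanceNorm2d(conv_dim), nn.ReLU())
        self.up2 = nn.Sequential(
            nn.ConvTranspose2d(conv_dim, 3, 4, stride=2, padding=1),
            nn.Tanh())  # outputs in [-1, 1]

    def forward(self, x):
        x = self.conv2(self.conv1(x))
        x = self.resnet_block(x)
        return self.up2(self.up1(x))
```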
Experiment with CycleGAN
cat_10deluxe_instance_dc_cycle_naive
Title | Image |
---|---|
sample X to Y | |
sample Y to X | |
D_fake_loss | |
D_real_loss | |
D_X_loss | |
D_Y_loss | |
G_loss |
cat_10deluxe_instance_patch_cycle_naive
Title | Image |
---|---|
sample X to Y | |
sample Y to X | |
D_fake_loss | |
D_real_loss | |
D_X_loss | |
D_Y_loss | |
G_loss |
cat_10deluxe_instance_patch_cycle_naive_cycle
Title | Image |
---|---|
sample X to Y | |
sample Y to X | |
D_fake_loss | |
D_real_loss | |
D_X_loss | |
D_Y_loss | |
G_loss |
cat_10deluxe_instance_patch_cycle_naive_cycle_diffaug
Title | Image |
---|---|
sample X to Y | |
sample Y to X | |
D_fake_loss | |
D_real_loss | |
D_X_loss | |
D_Y_loss | |
G_loss |
apple2orange_10deluxe_instance_patch_cycle_naive_cycle_diffaug
Title | Image |
---|---|
sample X to Y | |
sample Y to X | |
D_fake_loss | |
D_real_loss | |
D_X_loss | |
D_Y_loss | |
G_loss |
Observations:
We observed that the results with the cycle-consistency loss were better than those without it: translations between the two domains were more accurate and realistic. This is because the cycle-consistency loss requires that an image translated to the other domain and then back again reproduces the original, which constrains the two mappings and helps the model preserve image content.
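Concretely, the cycle-consistency term penalizes the L1 difference between an image and its reconstruction after a round trip through both generators. A minimal sketch, with G_XtoY, G_YtoX, and lambda_cycle as assumed names:

```python
import torch

def cycle_consistency_loss(G_XtoY, G_YtoX, real_X, real_Y, lambda_cycle=10.0):
    """L1 penalty for the X -> Y -> X and Y -> X -> Y reconstructions."""
    rec_X = G_YtoX(G_XtoY(real_X))  # translate to Y, then back to X
    rec_Y = G_XtoY(G_YtoX(real_Y))  # translate to X, then back to Y
    loss = torch.mean(torch.abs(rec_X - real_X)) + torch.mean(torch.abs(rec_Y - real_Y))
    return lambda_cycle * loss
```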
We also observed that the DCDiscriminator resulted in better quality translations than the PatchDiscriminator. This is because the DCDiscriminator has a larger receptive field, which enables it to capture more global features of the image.
Conclusion:
In conclusion, we have trained CycleGAN from scratch with and without the cycle-consistency loss, and have compared the results using the DCDiscriminator and the PatchDiscriminator. We have observed that the cycle-consistency loss and the DCDiscriminator resulted in better quality translations between the two domains. These observations can help in improving the translation quality between different domains in image processing applications.
Bells & Whistles
Implement and train a diffusion model
Training Diffusion Models with Hugging Face’s Diffusers
Introduction
In this project, we train a simple diffusion model using Hugging Face's Diffusers library. Diffusion models have become state-of-the-art generative models in recent years.
Key Parts of the Code
Configuration:
We define a ‘TrainingConfig’ class that holds all the training hyperparameters. Hyperparameters include ‘image_size’, ‘train_batch_size’, ‘eval_batch_size’, ‘num_epochs’, ‘gradient_accumulation_steps’, ‘learning_rate’, and ‘lr_warmup_steps’, among others.
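A minimal sketch of such a configuration class is shown below; the specific values are assumptions in the spirit of the Diffusers training tutorial, not the exact settings we used.

```python
from dataclasses import dataclass

@dataclass
class TrainingConfig:
    image_size: int = 64                 # training image resolution (assumed)
    train_batch_size: int = 16
    eval_batch_size: int = 16
    num_epochs: int = 50
    gradient_accumulation_steps: int = 1
    learning_rate: float = 1e-4
    lr_warmup_steps: int = 500
    mixed_precision: str = "fp16"        # use automatic mixed precision
    output_dir: str = "ddpm-grumpy-cat"  # assumed output folder name

config = TrainingConfig()
```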
Data Preprocessing:
We use the datasets library to load our dataset and apply data transformations. The dataset is preprocessed using the transforms.Compose function from torchvision. The dataset is then transformed on-the-fly during training.
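A sketch of this preprocessing step, assuming the dataset is loaded from a local image folder (the path, transform details, and batch key are assumptions):

```python
import torch
from datasets import load_dataset
from torchvision import transforms

# config is the TrainingConfig instance defined above.
preprocess = transforms.Compose([
    transforms.Resize((config.image_size, config.image_size)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),  # scale pixels to [-1, 1]
])

def transform(examples):
    # Applied lazily to each batch as it is drawn from the dataset.
    images = [preprocess(image.convert("RGB")) for image in examples["image"]]
    return {"images": images}

dataset = load_dataset("imagefolder", data_dir="./data/grumpifyBprocessed", split="train")  # assumed path
dataset.set_transform(transform)
train_dataloader = torch.utils.data.DataLoader(dataset, batch_size=config.train_batch_size, shuffle=True)
```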
Model Definition:
We define our model using the ‘UNet2DModel’ class from the diffusers library. The model has various hyperparameters such as ‘sample_size’, ‘in_channels’, ‘out_channels’, ‘layers_per_block’, ‘block_out_channels’, ‘down_block_types’, and ‘up_block_types’.
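A sketch of the model definition is shown below; the block widths and block types follow the common Diffusers tutorial layout and are assumptions rather than our exact configuration.

```python
from diffusers import UNet2DModel

model = UNet2DModel(
    sample_size=64,              # target image resolution (assumed)
    in_channels=3,               # RGB input
    out_channels=3,              # the UNet predicts noise with the same shape
    layers_per_block=2,
    block_out_channels=(128, 128, 256, 256, 512, 512),
    down_block_types=(
        "DownBlock2D", "DownBlock2D", "DownBlock2D",
        "DownBlock2D", "AttnDownBlock2D", "DownBlock2D",
    ),
    up_block_types=(
        "UpBlock2D", "AttnUpBlock2D", "UpBlock2D",
        "UpBlock2D", "UpBlock2D", "UpBlock2D",
    ),
)
```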
Training Setup:
We use an AdamW optimizer and a cosine learning rate schedule for training, and the DDPMPipeline class from the diffusers library for end-to-end inference during evaluation. The training function train_loop includes gradient accumulation, mixed-precision training, and multi-GPU or TPU training via the Accelerator class from the accelerate library.
We use the ‘notebook_launcher’ function from the accelerate library to launch training from the notebook.
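This setup might look roughly like the following; train_loop is sketched in the next section, and the scheduler and optimizer settings here are assumptions.

```python
import torch
from diffusers import DDPMScheduler
from diffusers.optimization import get_cosine_schedule_with_warmup
from accelerate import notebook_launcher

# model, config, and train_dataloader are defined above; values are assumptions.
noise_scheduler = DDPMScheduler(num_train_timesteps=1000)
optimizer = torch.optim.AdamW(model.parameters(), lr=config.learning_rate)
lr_scheduler = get_cosine_schedule_with_warmup(
    optimizer=optimizer,
    num_warmup_steps=config.lr_warmup_steps,
    num_training_steps=len(train_dataloader) * config.num_epochs,
)

# Launch training from the notebook (single process assumed here).
args = (config, model, noise_scheduler, optimizer, train_dataloader, lr_scheduler)
notebook_launcher(train_loop, args, num_processes=1)
```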
Key Functions
- transform(examples): applies the image transformations on the fly during training.
- evaluate(config, epoch, pipeline): generates a batch of sample images during evaluation and saves them as a grid to disk.
- train_loop(config, model, noise_scheduler, optimizer, train_dataloader, lr_scheduler): the main training loop, which includes the forward diffusion process, loss calculation, and backpropagation.
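The core of train_loop, in simplified sketch form (following the general Diffusers/Accelerate pattern; variable names and the omission of evaluation and checkpointing are simplifications):

```python
import torch
import torch.nn.functional as F
from accelerate import Accelerator

def train_loop(config, model, noise_scheduler, optimizer, train_dataloader, lr_scheduler):
    accelerator = Accelerator(
        mixed_precision=config.mixed_precision,
        gradient_accumulation_steps=config.gradient_accumulation_steps,
    )
    model, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
        model, optimizer, train_dataloader, lr_scheduler
    )
    for epoch in range(config.num_epochs):
        for batch in train_dataloader:
            clean_images = batch["images"]
            noise = torch.randn_like(clean_images)
            timesteps = torch.randint(
                0, noise_scheduler.config.num_train_timesteps,
                (clean_images.shape[0],), device=clean_images.device,
            )
            # Forward diffusion: add noise to the clean images at the sampled timesteps.
            noisy_images = noise_scheduler.add_noise(clean_images, noise, timesteps)
            with accelerator.accumulate(model):
                # The UNet predicts the noise that was added.
                noise_pred = model(noisy_images, timesteps, return_dict=False)[0]
                loss = F.mse_loss(noise_pred, noise)
                accelerator.backward(loss)
                optimizer.step()
                lr_scheduler.step()
                optimizer.zero_grad()
```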
Diffusion Results
Title | Image |
---|---|
Apple | |
Cat |
The quality of the generated images, and how well the diffusion model has captured the main characteristics of the two domains, depends on factors such as the quality of the training data, the hyperparameters used during training, and the complexity of the image domains. If the diffusion results look less realistic than the DCGAN results, the cause could be dataset quality, model complexity, hyperparameter tuning, or training time; further analysis and experimentation would be necessary to pinpoint the specific reason for the difference in image quality.
Conclusion
This report presents our implementation of DCGAN and CycleGAN for various image generation tasks. Through these experiments, we have observed the impact of data augmentation and differentiable augmentation on the training process and final results. We have also seen the capabilities of CycleGAN in generating realistic images for domain-to-domain translation tasks, such as converting Grumpy cats to Russian Blue cats and vice versa, and converting apples to oranges and vice versa.