GAN Photo Editing

A Journey Through Generative AI Techniques


Introduction

In this assignment, we implement a few different techniques that require manipulating images on the manifold of natural images.

  • First, we invert a pre-trained generator to find a latent variable that closely reconstructs a given real image.
  • Next, we interpolate between the recovered latent codes to morph one image into another.
  • Finally, we take a hand-drawn sketch and generate an image that fits the sketch.

Setup

To set up the environment for this project, create a new virtual environment and install the required dependencies:

conda create -n 16726_hw5
conda activate 16726_hw5
pip3 install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
pip3 install click requests tqdm pyspng ninja matplotlib imageio imageio-ffmpeg==0.4.3
pip install wandb # Weights & Biases is used in this blog for logging experiments.

Part 1: Inverting the Generator [30 pts]

In the first part of the assignment, we solve an optimization problem to reconstruct the image from a particular latent code. We use different combinations of loss functions, generative models, and latent spaces to find the best result.
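To make the idea of inverting a generator concrete, here is a minimal NumPy sketch, not the assignment's PyTorch code. It assumes a hypothetical linear "generator" with orthonormal columns, a plain squared-error loss, and ordinary gradient descent swapped in for LBFGS; the names `generator` and `project` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a pre-trained generator: a fixed linear map from
# an 8-dim latent code to a flattened 4x4 "image". QR gives orthonormal
# columns, which keeps this toy problem well conditioned.
W, _ = np.linalg.qr(rng.normal(size=(16, 8)))


def generator(z):
    """Map a latent code to a flattened image (toy linear model)."""
    return W @ z


def project(target, steps=200, lr=0.1):
    """Find a latent code whose generated image matches the target.

    Uses plain gradient descent on the squared pixel error; the assignment
    itself suggests LBFGS or another optimizer.
    """
    z = np.zeros(W.shape[1])
    for _ in range(steps):
        residual = generator(z) - target
        grad = 2.0 * W.T @ residual  # gradient of ||W z - target||^2 w.r.t. z
        z -= lr * grad
    return z


z_true = rng.normal(size=8)
target = generator(z_true)          # an image the generator can reproduce
z_hat = project(target)
recon_error = np.abs(generator(z_hat) - target).max()
print(f"max reconstruction error: {recon_error:.2e}")
```

A real generator is nonlinear, so the optimization is nonconvex and the recovered code depends on the starting point and the loss, which is what the experiments below compare.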

Implementation Details

  1. Implement the forward function in the Criterion class.
  2. Implement sample_noise for StyleGAN2, including w and w+.
  3. Implement the optimization step using LBFGS or other optimizers.
  4. Implement the whole functionality in project().
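The loss in step 1 can be sketched as a weighted sum of three terms. This is an illustrative NumPy version under stated assumptions: `toy_features` (2x2 average pooling) is a hypothetical stand-in for a pre-trained feature network, and the weight names and default values are made up for the example rather than taken from the assignment.

```python
import numpy as np


def toy_features(img):
    """Stand-in for a pre-trained feature network: 2x2 average pooling.

    A real perceptual loss would compare activations of a trained network.
    """
    h, w = img.shape[-2] // 2 * 2, img.shape[-1] // 2 * 2
    img = img[..., :h, :w]
    return 0.25 * (img[..., 0::2, 0::2] + img[..., 1::2, 0::2]
                   + img[..., 0::2, 1::2] + img[..., 1::2, 1::2])


def criterion(pred, target, delta, l1_wgt=10.0, perc_wgt=1.0, reg_wgt=0.01,
              features=toy_features):
    """Weighted sum of L1, perceptual-style, and latent-offset penalties.

    Setting a weight to 0 switches the corresponding term off.
    """
    l1 = np.abs(pred - target).mean()
    perc = ((features(pred) - features(target)) ** 2).mean()
    reg = (delta ** 2).sum()  # squared L2 norm of the latent offset
    return l1_wgt * l1 + perc_wgt * perc + reg_wgt * reg


rng = np.random.default_rng(0)
target = rng.random((3, 8, 8))
delta = np.zeros(16)

loss_same = criterion(target, target, delta)           # identical images
loss_shifted = criterion(target + 0.1, target, delta)  # uniformly brighter
print(loss_same, loss_shifted)
```

Turning individual weights to zero reproduces the ON/OFF combinations explored in the deliverables.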

Deliverables

Show example outputs of image reconstruction efforts and provide comments on why the various outputs look how they do.

  • Various combinations of the losses, including Lp loss, perceptual loss, and/or a regularization loss that penalizes the L2 norm of delta.
  • Different generative models, including vanilla GAN and StyleGAN.
  • Different latent spaces (latent code in z space, w space, and w+ space).
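| L1 Loss | Perceptual Loss | Regularization Loss | Model | Latent Space | Results |
|---|---|---|---|---|---|
| ON | ON | ON | Vanilla GAN | z | (image) |
| ON | ON | OFF | Vanilla GAN | z | (image) |
| ON | OFF | ON | Vanilla GAN | z | (image) |
| ON | OFF | OFF | Vanilla GAN | z | (image) |
| OFF | ON | ON | Vanilla GAN | z | (image) |
| OFF | ON | OFF | Vanilla GAN | z | (image) |
| ON | ON | ON | StyleGAN | z | (image) |
| ON | ON | OFF | StyleGAN | z | (image) |
| ON | OFF | ON | StyleGAN | z | (image) |
| ON | OFF | OFF | StyleGAN | z | (image) |
| OFF | ON | ON | StyleGAN | z | (image) |
| OFF | ON | OFF | StyleGAN | z | (image) |
| ON | ON | ON | StyleGAN | w | (image) |
| ON | ON | OFF | StyleGAN | w | (image) |
| ON | OFF | ON | StyleGAN | w | (image) |
| ON | OFF | OFF | StyleGAN | w | (image) |
| OFF | ON | ON | StyleGAN | w | (image) |
| OFF | ON | OFF | StyleGAN | w | (image) |
| ON | ON | ON | StyleGAN | w+ | (image) |
| ON | ON | OFF | StyleGAN | w+ | (image) |
| ON | OFF | ON | StyleGAN | w+ | (image) |
| ON | OFF | OFF | StyleGAN | w+ | (image) |
| OFF | ON | ON | StyleGAN | w+ | (image) |
| OFF | ON | OFF | StyleGAN | w+ | (image) |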

Our experiments compared generative models, latent spaces, and loss combinations to assess their impact on the reconstructed images. StyleGAN with L1 loss, perceptual loss, and regularization loss all enabled consistently delivered the best results, generating high-quality images that closely resemble the target.

Optimizing without the perceptual loss was more difficult and the process was less stable. Vanilla GAN, in contrast, generated plausible images but lacked the fine-grained detail present in the StyleGAN outputs.

In conclusion, StyleGAN combined with L1, Perceptual, and Regularization losses outperformed other configurations, demonstrating its effectiveness in generating high-quality, detailed images.
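The z, w, and w+ latent spaces compared above differ mainly in what is sampled and how it is shaped. The sketch below illustrates that in NumPy under loud assumptions: `toy_mapping` is a hypothetical stand-in for StyleGAN's mapping network, and `LATENT_DIM` and `NUM_LAYERS` are small illustrative values, not the real model's sizes.

```python
import numpy as np

LATENT_DIM = 8   # illustrative; real StyleGAN2 latents are much larger
NUM_LAYERS = 4   # illustrative number of style inputs to the synthesis network

rng = np.random.default_rng(0)
M = rng.normal(size=(LATENT_DIM, LATENT_DIM)) / np.sqrt(LATENT_DIM)


def toy_mapping(z):
    """Hypothetical stand-in for the mapping network (z -> w)."""
    return np.tanh(z @ M)


def sample_noise(latent):
    """Sample an initial latent code in z, w, or w+ space."""
    z = rng.normal(size=(1, LATENT_DIM))
    if latent == "z":
        return z
    w = toy_mapping(z)
    if latent == "w":
        return w
    if latent == "w+":
        # One copy of w per layer; each copy can then be optimized separately.
        return np.repeat(w[:, None, :], NUM_LAYERS, axis=1)
    raise ValueError(f"unknown latent space: {latent}")


for name in ("z", "w", "w+"):
    print(name, sample_noise(name).shape)
```

Because w+ holds a separate code per layer, it has more free parameters to fit a target image than z or w.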

Part 2: Interpolate your Cats [10 pts]

In this part, we perform interpolation between latent vectors found in Part 1 using different generative models and latent spaces.

Implementation

  1. Implement the interpolation step in interpolate().
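A minimal sketch of the interpolation step, assuming simple linear blending between two latent codes. The signature here is illustrative and may differ from the assignment's `interpolate()`; each returned latent would then be passed through the generator to render one frame.

```python
import numpy as np


def interpolate(latent_a, latent_b, num_frames=8):
    """Return num_frames latents blended linearly from latent_a to latent_b."""
    alphas = np.linspace(0.0, 1.0, num_frames)
    return [(1.0 - a) * latent_a + a * latent_b for a in alphas]


# Toy latents standing in for the codes recovered in Part 1.
frames = interpolate(np.zeros(4), np.ones(4), num_frames=5)
for frame in frames:
    print(frame)
```

The same function works for z, w, or w+ codes, since the blend is applied elementwise regardless of shape.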

Deliverables

Show a few interpolations between grumpy cats and comment on the quality of the images and the interpolation process.

We first generate the interpolation at 64 by 64 resolution.

However, we found that 64 by 64 is too small for a good viewing experience on a website, so we edited part of the code to enable a higher resolution (512 by 512).

I also tested the interpolation on some cute cat images.

Part 3: Scribble to Image [40 pts]

In this part, we generate an image subject to constraints, like color scribble constraints, using a penalized nonconvex optimization problem.
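One plausible way to write this penalized problem is shown below. The symbols are assumptions introduced for illustration: $G$ is the generator, $z$ the latent code, $S$ the sketch image, $M$ a binary mask that is 1 on scribbled pixels, and $\odot$ the elementwise product. The hard constraint that the image match the sketch is relaxed into a penalty on the masked difference.

```latex
z^{*} = \arg\min_{z} \; \lVert M \odot G(z) - M \odot S \rVert_{1}
```

The problem is nonconvex because $G$ is a deep network, so the result depends on the initial latent code.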

Implementation Details

  1. Implement the code for synthesizing images from drawings to realistic ones using the optimization procedure in draw().
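The core of such a procedure is a loss that only looks at scribbled pixels. Here is a small NumPy sketch of that idea; the function name `masked_l1` and the normalization by the mask size are assumptions for illustration, not the assignment's exact code.

```python
import numpy as np


def masked_l1(generated, sketch, mask):
    """Mean absolute difference, counted only where mask == 1."""
    diff = np.abs(mask * generated - mask * sketch)
    return diff.sum() / max(mask.sum(), 1.0)


generated = np.full((3, 4, 4), 0.5)   # a flat gray "generated" image
sketch = np.zeros((3, 4, 4))
sketch[:, 0, :] = 1.0                 # a white scribble along the top row
mask = np.zeros((3, 4, 4))
mask[:, 0, :] = 1.0                   # only the scribbled pixels count

loss = masked_l1(generated, sketch, mask)

# Changing pixels outside the mask leaves the loss unchanged.
altered = generated.copy()
altered[:, 1:, :] = 0.0
loss_altered = masked_l1(altered, sketch, mask)
print(loss, loss_altered)
```

Since only masked pixels are compared, the scribble's colors pull the generated image directly toward those values, while unmasked regions are left for the generator to fill in.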

Deliverables

Draw some cats and experiment with sparser and denser sketches and the use of color. Show example outputs along with commentary on what seems to have happened and why.

Target | Results

I also added some DIY masks for fun.

Target | Results

In our experiments, the implementation generated interesting images that closely conform to the target drawing masks, and the results were visually appealing.

However, some generated images came out dark because the corresponding masks were dark. The darkness of the mask carries over to the brightness and contrast of the final image.

In conclusion, our experiments demonstrated that our implementation could effectively generate interesting images that align with the target drawing masks, though the mask’s darkness may influence the resulting image’s appearance.

Conclusions

Throughout this project, we have investigated various aspects of image generation using GAN architectures. 🖼️

Part 1 focused on comparing different GAN architectures and loss functions, where StyleGAN with L1 loss, Perceptual loss, and Regularization loss proved to be the most effective in generating high-quality, detailed images. We also observed challenges in training without Perceptual loss 😅, and found that Vanilla GAN could not match the fine-grained detail present in StyleGAN outputs.

In Part 2, we showcased several interpolations between grumpy cat images 🐱, initially at a resolution of 64x64, which was later increased to 512x512 for a better web viewing experience. The interpolations demonstrated the smooth transitions between images and highlighted the potential of high-resolution image generation 🌟.

Part 3 explored generating intriguing images that closely conform to target drawing masks ✍️. While our implementation successfully produced visually appealing images, we observed that the mask’s darkness could impact the final image’s brightness and contrast.

Overall, this project has demonstrated the power of GANs, particularly StyleGAN, in generating high-quality images and interpolations 🔥. We have also shown the potential of using target drawing masks for image generation, opening up possibilities for further exploration and improvement in image manipulation techniques 💡.

Linji (Joey) Wang
PhD Student in AI & Robotics