Small-GAN: Speeding up GAN training using core-sets

An overview of the paper “Small-GAN: Speeding up GAN training using core-sets”. The authors propose a methodology for extracting a smaller batch that mimics the coverage of a larger batch. All images and tables in this post are from their paper.

Introduction

It has previously been shown that training with larger batches yields better results than training with smaller batches. However, training on larger batches requires very high computational power. Since such an approach is often impractical, the authors try to achieve the same effect using smaller batches. The experiments were performed on GANs, hence the name. The idea is to extract a smaller batch that mimics the coverage of a larger batch.

Generative Adversarial Networks

A Generative Adversarial Network (or GAN) is a system of two networks trained ‘adversarially’. The generator, $G$, takes input samples $z$ from a prior $p(z)$ and outputs samples, $G(z)$, from its learned distribution. The discriminator, $D$, receives as input both the training examples, $x$, and the synthesized samples, $G(z)$, and outputs a distribution over the possible sample sources. The discriminator is trained to maximize:

$$\mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p(z)}\left[\log\left(1 - D(G(z))\right)\right]$$

while the generator is trained to trick the discriminator by minimizing:

$$\mathbb{E}_{z \sim p(z)}\left[\log\left(1 - D(G(z))\right)\right]$$
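
To make the two objectives concrete, here is a minimal PyTorch-style sketch of the two losses. It is only a sketch: the `generator` and `discriminator` modules (with the discriminator outputting the probability that its input is real) and the noise batch `z` are assumed to exist, and the names are mine rather than the paper's.

```python
import torch

EPS = 1e-8  # numerical stability for the logarithms

def discriminator_loss(discriminator, generator, real_batch, z):
    """Negative of the objective the discriminator maximizes."""
    d_real = discriminator(real_batch)             # D(x)
    d_fake = discriminator(generator(z).detach())  # D(G(z)), generator held fixed
    return -(torch.log(d_real + EPS) + torch.log(1.0 - d_fake + EPS)).mean()

def generator_loss(discriminator, generator, z):
    """The objective the generator minimizes: E[log(1 - D(G(z)))]."""
    d_fake = discriminator(generator(z))
    return torch.log(1.0 - d_fake + EPS).mean()
```

In practice many implementations use the non-saturating variant (the generator maximizes $\log D(G(z))$ instead), but the sketch mirrors the minimax formulation written above.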

Inception Score and Fréchet Inception Distance

The FID is used to measure the effectiveness of an image synthesis model. The score is computed from the activations of a pre-trained Inception classifier (trained on ImageNet), hence the name. One further assumption is that the activations of the penultimate layer of the classifier come from a multivariate Gaussian. If the activation statistics on real data are $(\mu_r, \Sigma_r)$ and the activation statistics on the fake data are $(\mu_g, \Sigma_g)$, then the FID is defined as:

$$\text{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\left(\Sigma_r + \Sigma_g - 2\left(\Sigma_r \Sigma_g\right)^{1/2}\right)$$
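
For concreteness, here is a small sketch of this formula applied to two arrays of activations (one row per sample). The function name and the use of `scipy.linalg.sqrtm` for the matrix square root are my choices, not something specified in the paper.

```python
import numpy as np
from scipy import linalg

def frechet_inception_distance(real_acts: np.ndarray, fake_acts: np.ndarray) -> float:
    """real_acts, fake_acts: (num_samples, feature_dim) penultimate-layer activations."""
    mu_r, mu_g = real_acts.mean(axis=0), fake_acts.mean(axis=0)
    sigma_r = np.cov(real_acts, rowvar=False)
    sigma_g = np.cov(fake_acts, rowvar=False)
    # Matrix square root of the product of the two covariance matrices.
    covmean, _ = linalg.sqrtm(sigma_r @ sigma_g, disp=False)
    covmean = covmean.real  # discard tiny imaginary parts from numerical error
    return float(np.sum((mu_r - mu_g) ** 2) + np.trace(sigma_r + sigma_g - 2.0 * covmean))
```

The activations are typically taken from the 2048-dimensional pooling layer of Inception-v3.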

Core-set Selection

A core-set, $C$, of a set $P$ is a subset $C \subseteq P$ that approximates the ‘shape’ of $P$.
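
Computing an optimal core-set is hard in general, so greedy approximations are typically used. Below is a hedged sketch of a standard greedy, k-center-style selection (start from a random point and repeatedly add the point farthest from the current selection); the function name and details are illustrative rather than the authors' exact implementation.

```python
import numpy as np

def greedy_coreset_indices(points: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Return indices of a k-point core-set of `points` (shape: (n, d))."""
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    selected = [int(rng.integers(n))]                 # random starting point
    # Distance from every point to its nearest selected point so far.
    min_dist = np.linalg.norm(points - points[selected[0]], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))                # farthest remaining point
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(selected)
```

Adding the farthest point at each step keeps the subset spread out over $P$, which is what approximating the ‘shape’ of $P$ amounts to in practice.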

Sampling Distributions

Sampling from the prior is relatively simple. Since we are free to choose the prior, we can take it to be a uniform distribution, so no issues arise here.
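
A minimal sketch of this step, reusing the hypothetical `greedy_coreset_indices` helper from the previous section; the batch sizes and latent dimension are illustrative.

```python
import numpy as np

latent_dim, large_batch, small_batch = 128, 1024, 64   # illustrative sizes
z_large = np.random.uniform(-1.0, 1.0, size=(large_batch, latent_dim))
# Euclidean distances are already meaningful in latent space, so core-set
# selection (the greedy_coreset_indices sketch above) applies directly.
z_small = z_large[greedy_coreset_indices(z_large, small_batch)]
```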

Sampling from the target distribution is trickier. Taking pairwise distances directly between images does not work well, since natural images are highly concentrated in pixel space and simple metrics such as Euclidean distance lack semantic significance there. To avoid these issues, the authors create Inception embeddings of their data, project these embeddings to a lower-dimensional space, and apply pairwise Euclidean distances to this set. The projection step also reduces the computation time. Core-set sampling can then be applied to these representations to select images. Once the core-set is obtained, the inverse of the embedding function is applied to recover the original images.
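
Here is a rough, hedged sketch of that pipeline. It is not the authors' code: the Inception feature extraction via torchvision, the random Gaussian projection, and the reuse of the hypothetical `greedy_coreset_indices` helper sketched earlier are all illustrative choices.

```python
import numpy as np
import torch
from torchvision import models, transforms

def inception_embeddings(images: torch.Tensor) -> np.ndarray:
    """Pooled Inception-v3 features, shape (N, 2048), for a batch of images.

    `images` is assumed to be a float tensor of shape (N, 3, H, W), already
    normalized the way the pre-trained weights expect.
    """
    model = models.inception_v3(weights=models.Inception_V3_Weights.IMAGENET1K_V1)
    model.fc = torch.nn.Identity()           # expose the penultimate activations
    model.eval()
    resize = transforms.Resize((299, 299))   # Inception-v3 expects 299x299 inputs
    with torch.no_grad():
        feats = model(resize(images))
    return feats.cpu().numpy()

def select_coreset_images(images: torch.Tensor, k: int, proj_dim: int = 32) -> torch.Tensor:
    """Pick k images whose embeddings form a core-set of the large batch."""
    emb = inception_embeddings(images)                        # (N, 2048)
    rng = np.random.default_rng(0)
    # A simple random Gaussian projection to a lower dimension (one possible choice).
    proj = emb @ rng.standard_normal((emb.shape[1], proj_dim)) / np.sqrt(proj_dim)
    idx = greedy_coreset_indices(proj, k)                     # helper sketched earlier
    # Each selected embedding corresponds to an image in the large batch, so
    # "inverting" the embedding is simply indexing back into the original batch.
    return images[idx]
```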

The process is task-agnostic and could be applied to any GAN variant (or, in my opinion, to any similar neural network).

Written on June 21, 2020