A Neural Algorithm of Artistic Style

An overview of the paper “A Neural Algorithm of Artistic Style”. In this paper, the authors propose a novel artificial system based on a Deep Neural Network that creates artistic images of high perceptual quality. All images and tables in this post are from their paper.

Introduction

The key finding of this paper is that the representations of content and style in a Convolutional Neural Network are separable. That is, we can manipulate both representations independently to produce new, perceptually meaningful images. The authors offer a possible explanation: when learning object recognition, the network has to become invariant to all image variation that preserves object identity. Representations that factorise the variation in the content of an image from the variation in its appearance would be extremely practical for this task. By analogy, they suggest that our ability to abstract content from style, and therefore our ability to create and enjoy art, might primarily be a signature of the powerful inference capabilities of our visual system.

Content Loss

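The results presented in this paper were generated using the feature space of a pre-trained VGG network; none of its fully connected layers are used. For image synthesis, replacing the max-pooling operation with average pooling improves the gradient flow and yields slightly more appealing results.

Let $\vec{p}$ and $\vec{x}$ be the original image and the generated image, and $P^l$ and $F^l$ their respective feature representations in layer $l$. We define the content loss as:

$$\mathcal{L}_{content}(\vec{p}, \vec{x}, l) = \frac{1}{2} \sum_{i,j} \left(F^l_{ij} - P^l_{ij}\right)^2$$

Here $F^l_{ij}$ is the activation of the $i$-th filter at position $j$ in layer $l$. Minimising this loss allows the system to maintain semantic similarity with the original image.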
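As a minimal sketch of the content loss, assuming the feature representations have already been extracted and flattened into NumPy arrays of shape (number of filters, feature map size): the function name `content_loss` and the toy arrays are illustrative assumptions, not code from the paper.

```python
import numpy as np


def content_loss(generated_features: np.ndarray, content_features: np.ndarray) -> float:
    """Half the summed squared difference between two feature representations.

    Both arrays are assumed to have shape (N_l, M_l): one row per filter,
    one column per spatial position of the flattened feature map.
    """
    if generated_features.shape != content_features.shape:
        raise ValueError("feature representations must have the same shape")
    diff = generated_features - content_features
    return 0.5 * float(np.sum(diff ** 2))


# Toy stand-ins for F^l and P^l (hypothetical values, not real VGG activations).
F = np.ones((2, 3))
P = np.zeros((2, 3))
print(content_loss(F, P))  # 0.5 * 6 = 3.0
```

The loss is zero only when the two feature representations agree at every filter and position, which is what ties the generated image to the content of the original.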

Style Loss

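To generate a texture that matches the style of a given image, we use gradient descent from a white noise image to find another image that matches the style representation of the original image. This is done by minimising the mean-squared distance between the entries of the Gram matrix from the original image and the Gram matrix of the image to be generated. The Gram matrix of layer $l$ has entries $G^l_{ij} = \sum_k F^l_{ik} F^l_{jk}$, the inner product between the vectorised feature maps $i$ and $j$, where $F^l$ is the feature representation of the image in that layer.

Let $\vec{a}$ and $\vec{x}$ be the original image and the generated image, and $A^l$ and $G^l$ their respective style representations in layer $l$. The contribution of that layer to the total loss is then:

$$E_l = \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left(G^l_{ij} - A^l_{ij}\right)^2$$

where $N_l$ is the number of distinct filters and $M_l$ is the size of each feature map (its height times its width). The total style loss is defined as:

$$\mathcal{L}_{style}(\vec{a}, \vec{x}) = \sum_{l} w_l E_l$$

The weighting factors $w_l$ are hyperparameters in this case.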
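The Gram matrix and the per-layer style loss can be sketched as follows. This is an illustration under stated assumptions: each layer's features are taken to be a NumPy array of shape (number of filters, feature map size), and the helper names `gram_matrix`, `layer_style_loss` and `style_loss` are hypothetical rather than taken from the paper.

```python
import numpy as np


def gram_matrix(features: np.ndarray) -> np.ndarray:
    """Inner products between every pair of flattened feature maps.

    `features` is assumed to have shape (N_l, M_l); the result is (N_l, N_l).
    """
    return features @ features.T


def layer_style_loss(generated_features: np.ndarray, style_features: np.ndarray) -> float:
    """Normalised squared distance between the Gram matrices of one layer."""
    n_filters, map_size = generated_features.shape
    G = gram_matrix(generated_features)
    A = gram_matrix(style_features)
    return float(np.sum((G - A) ** 2)) / (4.0 * n_filters ** 2 * map_size ** 2)


def style_loss(generated_layers, style_layers, layer_weights) -> float:
    """Weighted sum of the per-layer contributions."""
    return sum(
        w * layer_style_loss(F, S)
        for F, S, w in zip(generated_layers, style_layers, layer_weights)
    )


# Toy features (hypothetical values, not real VGG activations).
F = np.eye(2)         # Gram matrix is the 2x2 identity
S = np.zeros((2, 2))  # Gram matrix is all zeros
print(layer_style_loss(F, S))                 # 2 / (4 * 4 * 4) = 0.03125
print(style_loss([F, F], [S, S], [0.5, 0.5]))  # 0.03125
```

Because the Gram matrix sums over spatial positions, it records which filters respond together but not where, which is why it captures texture rather than layout.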

Combining Losses

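The total loss is defined as a weighted sum of the content loss and the style loss:

$$\mathcal{L}_{total}(\vec{p}, \vec{a}, \vec{x}) = \alpha \mathcal{L}_{content}(\vec{p}, \vec{x}) + \beta \mathcal{L}_{style}(\vec{a}, \vec{x})$$

where $\vec{p}$ is the content image, $\vec{a}$ is the style image and $\vec{x}$ is the generated image. The weights $\alpha$ and $\beta$ are also hyperparameters.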
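A small sketch of how the two weights trade content against style. The function name `total_loss`, the default weights and the input values are illustrative assumptions chosen for the example, not settings reported in this post.

```python
def total_loss(content_loss_value: float, style_loss_value: float,
               alpha: float = 1.0, beta: float = 1000.0) -> float:
    """Weighted sum of an already-computed content loss and style loss.

    alpha and beta are hypothetical defaults; only their ratio matters for
    which image minimises the loss.
    """
    return alpha * content_loss_value + beta * style_loss_value


# Hypothetical loss values for a single generated image.
content_value = 3.0
style_value = 0.03125

print(total_loss(content_value, style_value))             # 3.0 + 31.25 = 34.25
print(total_loss(content_value, style_value, beta=10.0))  # 3.0 + 0.3125 = 3.3125
```

With a large beta the style term dominates, so the optimisation favours matching the artwork's texture; with a small beta the content term dominates and the result stays closer to the original photograph.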

Examples of images that combine the content of a photograph with the style of various well-known artworks.

Figure 1

Written on July 17, 2020