Teaching Computers to Paint like Van Gogh

The Stanford Campus by AI Van Gogh

One of the most fascinating areas of Deep Learning is, without a doubt, art generation.

Wait…what?

I remember when I first saw this, my mind was blown. Upon close inspection, it could have been a human masterpiece by my standards (not that I know much about art). But it wasn’t.

A Starry Night-Day in Toronto by Alex’s Laptop

It’s actually not as hard as you think.

In a normal Convolutional Neural Network (the kind we use for object detection, self-driving cars, facial recognition, and so on), we train it to minimize the error that the parameters of each layer cause when trying to detect an object or classify an image. If the parameters make the CNN guess wrong, we go back and change them so they're a little closer to being right the next time.

For Neural Style Transfer (NST), we first initialize an image that has randomly generated RGB values for each pixel:

Then, we train the CNN to minimize the error that pixel values will cause based on an arbitrary goal that we decide. So instead of changing the parameters after each backpropagation step to minimize the error, we’re updating the actual pixels of the image.
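Here's a minimal PyTorch sketch of that idea (my own illustration; the image size and learning rate are arbitrary). The key point is that the optimizer is handed the image tensor itself, not any network parameters:

```python
import torch

# Start from pure noise: one 3-channel RGB image, with gradients enabled
# on the pixels themselves.
generated = torch.rand(1, 3, 400, 400, requires_grad=True)

# The optimizer updates the pixel values, not the network's weights.
optimizer = torch.optim.Adam([generated], lr=0.02)
```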

So, our task is theoretically simple:

We need to write a cost function that can be minimized, where the end result after a certain number of training iterations is a generated image that combines the style of a Van Gogh painting with the content of the input image, as seen above.

OK, but to do this, we still need to represent style and content mathematically. Let's first split writing the cost function into three parts:

1. A content cost, which keeps the generated image recognizably similar to the content image.
2. A style cost, which pushes the generated image toward the style of the painting.
3. A total cost that combines the two and strikes a balance between them.

Let’s begin.

How do we mathematically represent content? That is to say, how do we make sure that if our content image contains an object, our generated image will also contain a recognizable form of that object where we can easily correlate their features and say: ‘Hey, it’s the same thing, just painted differently’?

This has to do with how a CNN actually learns. Let’s use the powerful VGG-19 Neural Net as an example. We’ll also be using it to apply NST in the end.

VGG-19 won at the ILSVRC-2014 Challenge

A CNN like this works by first detecting very small details from the matrices of R, G and B values corresponding to each pixel, like the tiny edges of objects, and then going on to detect larger, more sophisticated forms, until it's able to guess what the image is depicting based on its minuscule components. Here's a great visualization:

The images above show different features that maximize the activations of each layer. This means that for each layer, each 3x3 box in the grid shows what a neuron is looking for, in a sense. (We apply what's called an activation function after each layer to separate data we recognize from the data we don't recognize, hence 'activations'.)

This is where VGG-19 specifically comes in. The 19 layer CNN has been pretrained on the ImageNet dataset, which contains a thousand classes of images and many training images within each class. This makes it more likely for the model to preserve the content of the image, since whatever the content may be, it probably fits into one of the thousand classes, and if not, it’s likely that one of the neurons is looking for something similar to the content object’s components.
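In code, grabbing that pretrained network is a one-liner with a library like torchvision (the exact `weights` argument depends on your torchvision version; older releases use `pretrained=True` instead):

```python
import torchvision.models as models

# ImageNet-pretrained VGG-19; `.features` keeps only the convolutional stack.
vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features.eval()

# Freeze it: in NST the network is never trained, only the image is.
for p in vgg.parameters():
    p.requires_grad_(False)
```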

Now we just need to run the cost function over all the layers of the CNN, making sure that all representations of the content are present. We can do this because we already developed the intuition that if all activations of a certain layer are similar, the images that are passing through the layer will be similar as well, at their given granularity.

So, there you have it. If you have similar activations in a layer, you have similar content. Now for our cost function, we just need to make sure we check for similar content in each layer as we start changing the style.

It works out to be something like this, if you're curious:

We try to reduce the normalized sum of squared differences between all the hidden activations of the content image and the generated image, which represent their features.
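Written out, a standard form is the following, where $a^{[l](C)}$ and $a^{[l](G)}$ are the layer-$l$ activations of the content and generated images, and $n_H, n_W, n_C$ are that layer's height, width and channel count:

```latex
J_{\text{content}}(C, G) = \frac{1}{4\, n_H n_W n_C} \sum_{\text{all entries}} \left( a^{[l](C)} - a^{[l](G)} \right)^2
```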

Now that we have the content cost, we don't want our model to keep optimizing for the content alone. If that were the case, it would just output the same image we gave it.

We want to balance it out with the style of the more 'artistic' image. But the problem here is, how do we extract 'style' mathematically?

If you think about it, the style is really the correlation between certain pieces of content with regards to their location in an image. For example, if vertical strokes exist in the same locations as orange patches, this would represent a characteristic style. If more features correlate, then you can see how we’re able to recreate something like Van Gogh’s style.

So, for our style image, we basically correlate each feature with every other feature at any one layer by multiplying them all together to create a style matrix (or Gram matrix, as it’s otherwise known) for the image. Something like this:
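Here's what that looks like as a small PyTorch helper (a sketch; the activations it takes come from one layer of VGG-19):

```python
import torch

def gram_matrix(activations):
    """Style ("Gram") matrix: correlate every feature map with every other one.

    activations: a (1, C, H, W) tensor from one layer of the CNN.
    Returns a (C, C) matrix whose (i, j) entry is the dot product between
    feature maps i and j, i.e. how strongly they co-occur across the image.
    """
    _, c, h, w = activations.shape
    features = activations.view(c, h * w)   # unroll each feature map into a row
    return features @ features.t()          # (C, H*W) x (H*W, C) -> (C, C)
```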

Using what we know about activations from the content cost, the style cost then reduces the difference between the style matrix of the style image and the style matrix of the generated image, trying to bring them closer together.

This is the cost function for the matrix at one layer:
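In symbols, with $G^{[l](S)}$ and $G^{[l](G)}$ as the style matrices of the style and generated images at layer $l$, it's typically written as:

```latex
J_{\text{style}}^{[l]}(S, G) = \frac{1}{\left(2\, n_H n_W n_C\right)^2} \sum_{i=1}^{n_C} \sum_{j=1}^{n_C} \left( G^{[l](S)}_{ij} - G^{[l](G)}_{ij} \right)^2
```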

But that's not all. The reason I keep emphasizing one layer is the following: suppose we computed the style cost function on layer 1 of the CNN. The features that layer detects would be extremely small; probably not even large enough to be called a 'style' by humans. That wouldn't have the desired outcome we hoped for. And what if an artist had larger stylistic patterns, relating to the symmetry or geometry of certain shapes throughout the image? We wouldn't be able to reflect those in our generated image either.

That’s why we need to compute a style matrix and a style cost for multiple specific layers in the CNN. Like this:

Summing over all chosen layers' style costs, each multiplied by a weight lambda
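Which, written out with $\lambda^{[l]}$ as the per-layer weight, is simply:

```latex
J_{\text{style}}(S, G) = \sum_{l} \lambda^{[l]} \, J_{\text{style}}^{[l]}(S, G)
```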

This formula allows us to much more comprehensively reflect the style of the artist.

That’s it for style.

The rest is pretty simple. We just have to take these two functions, the style cost and the content cost, and put them into one big function that strikes a reasonable balance between them:

A weighted sum of the content cost (weight alpha) and the style cost (weight beta). Alpha and beta are tuned as hyperparameters
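Or, in symbols:

```latex
J(G) = \alpha \, J_{\text{content}}(C, G) + \beta \, J_{\text{style}}(S, G)
```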

So using these algorithms, NST is executed in the following steps:

1. Load the content image, the style image, and a pretrained network like VGG-19.
2. Initialize the generated image with random pixel values.
3. Pass all three images through the network and compute the content cost and the style cost from the chosen layers' activations.
4. Combine them into the total cost J(G).
5. Backpropagate and update the pixels of the generated image, then repeat for a set number of iterations.
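To make those steps concrete, here's a compact PyTorch sketch of the whole pipeline. The layer indices, weights, iteration count and image sizes are illustrative choices (the ones commonly used with torchvision's VGG-19), not the only valid ones, and the two placeholder tensors stand in for real loaded (and ImageNet-normalized) images:

```python
import torch
import torchvision.models as models

# Step 1: pretrained, frozen VGG-19. We only ever optimize pixels.
vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

CONTENT_LAYER = '21'                                                  # conv4_2
STYLE_LAYERS = {'0': 0.2, '5': 0.2, '10': 0.2, '19': 0.2, '28': 0.2}  # conv1_1..conv5_1

def get_activations(image):
    """Run an image through VGG-19 and keep the activations we care about."""
    acts, x = {}, image
    for name, layer in vgg.named_children():
        x = layer(x)
        if name == CONTENT_LAYER or name in STYLE_LAYERS:
            acts[name] = x
    return acts

def gram(a):
    """Style (Gram) matrix of one layer's activations, as defined above."""
    _, c, h, w = a.shape
    f = a.view(c, h * w)
    return f @ f.t()

content_img = torch.rand(1, 3, 400, 400)   # placeholder: load your photo here
style_img = torch.rand(1, 3, 400, 400)     # placeholder: load the painting here
generated = torch.rand_like(content_img, requires_grad=True)   # step 2: random pixels
optimizer = torch.optim.Adam([generated], lr=0.02)
alpha, beta = 1.0, 1e4                     # content/style weights (hyperparameters)

content_acts = get_activations(content_img)
style_grams = {n: gram(a) for n, a in get_activations(style_img).items()}

for step in range(500):                    # steps 3-5, repeated
    optimizer.zero_grad()
    gen_acts = get_activations(generated)
    content_cost = torch.mean((gen_acts[CONTENT_LAYER] - content_acts[CONTENT_LAYER]) ** 2)
    style_cost = sum(w * torch.mean((gram(gen_acts[n]) - style_grams[n]) ** 2)
                     for n, w in STYLE_LAYERS.items())
    (alpha * content_cost + beta * style_cost).backward()
    optimizer.step()                       # update the pixels, not the network
```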

Now, looking at what we’ve done and what the authors of the NST paper have shown us is possible, we should really bring this back into perspective.

Even if NST doesn't have any world-changing applications, it's showing us that the limits of what we think AI can do are being expanded each day. And that makes all the difference.

This is the first in a series of Machine Learning topics that I’m going to be writing about.

Thanks for reading!
