You might not think that programmers are artists, but programming is an extremely creative profession. It’s logic-based creativity. - John Romero
Generative adversarial networks (GANs) are algorithmic architectures that use two neural networks, pitting one against the other (thus the “adversarial”) in order to generate new, synthetic instances of data that can pass for real data. They are used widely in image generation, video generation and voice generation.
GANs were introduced in a paper by Ian Goodfellow and other researchers at the University of Montreal, including Yoshua Bengio, in 2014. Referring to GANs, Facebook’s AI research director Yann LeCun called adversarial training “the most interesting idea in the last 10 years in ML.”
GANs’ potential for both good and evil is huge, because they can learn to mimic any distribution of data. That is, GANs can be taught to create worlds eerily similar to our own in any domain: images, music, speech, prose. They are robot artists in a sense, and their output is impressive – poignant even. But they can also be used to generate fake media content, and are the technology underpinning Deepfakes.
Automatically apply RL to simulation use cases (e.g. call centers, warehousing, etc.) using Pathmind.Get Started
In a surreal turn, Christie’s sold a portrait for $432,000 that had been generated by a GAN, based on open-source code written by Robbie Barrat of Stanford. Like most true artists, he didn’t see any of the money, which instead went to the French company, Obvious.0
In 2019, DeepMind showed that variational autoencoders (VAEs) could outperform GANs on face generation.
Photo via Art and Artificial Intelligence Laboratory, Rutgers University
To understand GANs, you should know how generative algorithms work, and for that, contrasting them with discriminative algorithms is instructive. Discriminative algorithms try to classify input data; that is, given the features of an instance of data, they predict a label or category to which that data belongs.
For example, given all the words in an email (the data instance), a discriminative algorithm could predict whether the message is
spam is one of the labels, and the bag of words gathered from the email are the features that constitute the input data. When this problem is expressed mathematically, the label is called
y and the features are called
x. The formulation
p(y|x) is used to mean “the probability of y given x”, which in this case would translate to “the probability that an email is spam given the words it contains.”
So discriminative algorithms map features to labels. They are concerned solely with that correlation. One way to think about generative algorithms is that they do the opposite. Instead of predicting a label given certain features, they attempt to predict features given a certain label.
The question a generative algorithm tries to answer is: Assuming this email is spam, how likely are these features? While discriminative models care about the relation between
x, generative models care about “how you get x.” They allow you to capture
p(x|y), the probability of
y, or the probability of features given a label or category. (That said, generative algorithms can also be used as classifiers. It just so happens that they can do more than categorize input data.)
Another way to think about it is to distinguish discriminative from generative like this:
One neural network, called the generator, generates new data instances, while the other, the discriminator, evaluates them for authenticity; i.e. the discriminator decides whether each instance of data that it reviews belongs to the actual training dataset or not.
Let’s say we’re trying to do something more banal than mimic the Mona Lisa. We’re going to generate hand-written numerals like those found in the MNIST dataset, which is taken from the real world. The goal of the discriminator, when shown an instance from the true MNIST dataset, is to recognize those that are authentic.
Meanwhile, the generator is creating new, synthetic images that it passes to the discriminator. It does so in the hopes that they, too, will be deemed authentic, even though they are fake. The goal of the generator is to generate passable hand-written digits: to lie without being caught. The goal of the discriminator is to identify images coming from the generator as fake.
Here are the steps a GAN takes:
So you have a double feedback loop:
You can think of a GAN as the opposition of a counterfeiter and a cop in a game of cat and mouse, where the counterfeiter is learning to pass false notes, and the cop is learning to detect them. Both are dynamic; i.e. the cop is in training, too (to extend the analogy, maybe the central bank is flagging bills that slipped through), and each side comes to learn the other’s methods in a constant escalation.
For MNIST, the discriminator network is a standard convolutional network that can categorize the images fed to it, a binomial classifier labeling images as real or fake. The generator is an inverse convolutional network, in a sense: While a standard convolutional classifier takes an image and downsamples it to produce a probability, the generator takes a vector of random noise and upsamples it to an image. The first throws away data through downsampling techniques like maxpooling, and the second generates new data.
Both nets are trying to optimize a different and opposing objective function, or loss function, in a zero-zum game. This is essentially an actor-critic model. As the discriminator changes its behavior, so does the generator, and vice versa. Their losses push against each other.
Image credit: Thalles Silva
If you want to learn more about generating images, Brandon Amos wrote a great post about interpreting images as samples from a probability distribution.
It may be useful to compare generative adversarial networks to other neural networks, such as autoencoders and variational autoencoders.
Automatically apply RL to simulation use cases (e.g. call centers, warehousing, etc.) using Pathmind.Get Started
Autoencoders encode input data as vectors. They create a hidden, or compressed, representation of the raw data. They are useful in dimensionality reduction; that is, the vector serving as a hidden representation compresses the raw data into a smaller number of salient dimensions. Autoencoders can be paired with a so-called decoder, which allows you to reconstruct input data based on its hidden representation, much as you would with a restricted Boltzmann machine.
Variational autoencoders are generative algorithm that add an additional constraint to encoding the input data, namely that the hidden representations are normalized. Variational autoencoders are capable of both compressing data like an autoencoder and synthesizing data like a GAN. However, while GANs generate data in fine, granular detail, images generated by VAEs tend to be more blurred.
You can bucket generative algorithms into one of three types:
When you train the discriminator, hold the generator values constant; and when you train the generator, hold the discriminator constant. Each should train against a static adversary. For example, this gives the generator a better read on the gradient it must learn by.
By the same token, pretraining the discriminator against MNIST before you start training the generator will establish a clearer gradient.
Each side of the GAN can overpower the other. If the discriminator is too good, it will return values so close to 0 or 1 that the generator will struggle to read the gradient. If the generator is too good, it will persistently exploit weaknesses in the discriminator that lead to false negatives. This may be mitigated by the nets’ respective learning rates. The two neural networks must have a similar “skill level.” 1
GANs take a long time to train. On a single GPU a GAN might take hours, and on a single CPU more than a day. While difficult to tune and therefore to use, GANs have stimulated a lot of interesting research and writing.
GANs are not the only generative models based on deep learning. The Microsoft-backed think tank OpenAI has released a series of powerful natural language generation models under the name GPT (Generative Pre-trained Transformer). In 2020, they released GPT-3 and made it accessible through an API. GPT-3 is a surprisingly powerful generative language model capable of emulating net new human speech in response to prompts.
Creating net new human speech is a core component of interactive, natural language dialog systems, but up until now, it has been very challenging to do well. Our new model, GPT-3, breaks new ground on this task, requiring only examples of the desired output to train a system capable of emulating. « This paragraph was writting by GPT-3 itself in response to the last sentence of the previous paragraph. It’s pretty good.
That said, it’s not perfect. Cracks appear in the human-like language it generates, as though it’s an alien occasionally stumbling through a phrase book.
Credit: Gary Larson
Here’s an example of a GAN coded in Keras:
class GAN(): def __init__(self): self.img_rows = 28 self.img_cols = 28 self.channels = 1 self.img_shape = (self.img_rows, self.img_cols, self.channels) optimizer = Adam(0.0002, 0.5) # Build and compile the discriminator self.discriminator = self.build_discriminator() self.discriminator.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy']) # Build and compile the generator self.generator = self.build_generator() self.generator.compile(loss='binary_crossentropy', optimizer=optimizer) # The generator takes noise as input and generated imgs z = Input(shape=(100,)) img = self.generator(z) # For the combined model we will only train the generator self.discriminator.trainable = False # The valid takes generated images as input and determines validity valid = self.discriminator(img) # The combined model (stacked generator and discriminator) takes # noise as input => generates images => determines validity self.combined = Model(z, valid) self.combined.compile(loss='binary_crossentropy', optimizer=optimizer) def build_generator(self): noise_shape = (100,) model = Sequential() model.add(Dense(256, input_shape=noise_shape)) model.add(LeakyReLU(alpha=0.2)) model.add(BatchNormalization(momentum=0.8)) model.add(Dense(512)) model.add(LeakyReLU(alpha=0.2)) model.add(BatchNormalization(momentum=0.8)) model.add(Dense(1024)) model.add(LeakyReLU(alpha=0.2)) model.add(BatchNormalization(momentum=0.8)) model.add(Dense(np.prod(self.img_shape), activation='tanh')) model.add(Reshape(self.img_shape)) model.summary() noise = Input(shape=noise_shape) img = model(noise) return Model(noise, img) def build_discriminator(self): img_shape = (self.img_rows, self.img_cols, self.channels) model = Sequential() model.add(Flatten(input_shape=img_shape)) model.add(Dense(512)) model.add(LeakyReLU(alpha=0.2)) model.add(Dense(256)) model.add(LeakyReLU(alpha=0.2)) model.add(Dense(1, activation='sigmoid')) model.summary() img = Input(shape=img_shape) validity = model(img) return Model(img, validity) def train(self, epochs, batch_size=128, save_interval=50): # Load the dataset (X_train, _), (_, _) = mnist.load_data() # Rescale -1 to 1 X_train = (X_train.astype(np.float32) - 127.5) / 127.5 X_train = np.expand_dims(X_train, axis=3) half_batch = int(batch_size / 2) for epoch in range(epochs): # --------------------- # Train Discriminator # --------------------- # Select a random half batch of images idx = np.random.randint(0, X_train.shape, half_batch) imgs = X_train[idx] noise = np.random.normal(0, 1, (half_batch, 100)) # Generate a half batch of new images gen_imgs = self.generator.predict(noise) # Train the discriminator d_loss_real = self.discriminator.train_on_batch(imgs, np.ones((half_batch, 1))) d_loss_fake = self.discriminator.train_on_batch(gen_imgs, np.zeros((half_batch, 1))) d_loss = 0.5 * np.add(d_loss_real, d_loss_fake) # --------------------- # Train Generator # --------------------- noise = np.random.normal(0, 1, (batch_size, 100)) # The generator wants the discriminator to label the generated samples # as valid (ones) valid_y = np.array( * batch_size) # Train the generator g_loss = self.combined.train_on_batch(noise, valid_y) # Plot the progress print ("%d [D loss: %f, acc.: %.2f%%] [G loss: %f]" % (epoch, d_loss, 100*d_loss, g_loss)) # If at save interval => save generated image samples if epoch % save_interval == 0: self.save_imgs(epoch) def save_imgs(self, epoch): r, c = 5, 5 noise = np.random.normal(0, 1, (r * c, 100)) gen_imgs = self.generator.predict(noise) # Rescale images 0 - 1 gen_imgs = 0.5 * gen_imgs + 0.5 fig, axs = plt.subplots(r, c) cnt = 0 for i in range(r): for j in range(c): axs[i,j].imshow(gen_imgs[cnt, :,:,0], cmap='gray') axs[i,j].axis('off') cnt += 1 fig.savefig("gan/images/mnist_%d.png" % epoch) plt.close() if __name__ == '__main__': gan = GAN() gan.train(epochs=30000, batch_size=32, save_interval=200)
Credit: The New Yorker
0) Students of the history of the French technology sector should ponder why this is one of the few instances when the French have shown themselves more gifted at marketing technology than at making it. France, a country with strong math pedagogy yet surprisingly Luddite tendencies in wider society, tends to build tech better than they market it. Why didn’t Minitel take over the world? Why did Jean-Louis Gassée and countless others feel it was necessary to quit France for America or London?
1) It’s interesting to consider evolution in this light, with genetic mutation on the one hand, and natural selection on the other, acting as two opposing algorithms within a larger process. What we are witnessing during the Anthropocene is the victory of one half of the evolutionary algorithm over the other; i.e. the genetic mutations in one species, homo sapiens, have enabled the creation of tools so powerful that natural selection plays very little part in shaping us. Some might speculate that that imbalance is leading to a catastrophic collapse of the system, much as we see with poorly tuned GANs. To take it a step further, perhaps this is the structural flaw in the development of intelligent life, akin to a Great Filter, which explains why humans have not found signs of other advanced species in the universe, despite the mathematical probability that such life should arise in a universe so large.
Homo sapiens is evolving faster than other species we compete with for resources. Along those lines, we might entertain a definition of intelligence that is primarily about speed. All other things being equal, the more intelligent organism (or species or algorithm) solves the same problem in less time. What we see now in the field of AI is an acceleration of algorithms’ ability to solve an increasing number of problems, boosted by faster chips, parallel computation, and hundreds of millions in research funding. Elon Musk has expressed his concern about AI, but he has not expressed that concern simply enough, based on a clear analogy. Algorithms are learning faster than we are, just as we learn faster than the species we are driving to extinction. It’s about speed. Massively parallelized hardware is a way of parallelizing time. And that is something that the human brain can not yet benefit from.
Credit: Alena Harley on fast.ai Forums
[Generating Images with Perceptual Similarity Metrics based on Deep Networks] [Paper]
[Adversarial Training for Sketch Retrieval] [Paper]
[Generative Adversarial Networks as Variational Training of Energy Based Models] [Paper](ICLR 2017)
[Adversarial Feature Learning] [Paper]
[Unsupervised and Semi-supervised Learning with Categorical Generative Adversarial Networks] [Paper](ICLR)
[Semi-Supervised QA with Generative Domain-Adaptive Nets] [Paper](ACL 2017)
[Semi-Supervised Learning with Context-Conditional Generative Adversarial Networks] [Paper]
[Globally and Locally Consistent Image Completion] [MainPAGE](SIGGRAPH 2017)
[Image super-resolution through deep learning ][Code](Just for face dataset)
[Semantic Segmentation using Adversarial Networks] [Paper]（Soumith’s paper）
[Perceptual generative adversarial networks for small object detection] [Paper](CVPR 2017)
MaskGAN: Better Text Generation via Filling in the __ Goodfellow et al
[MoCoGAN: Decomposing Motion and Content for Video Generation] [Paper]
[Unsupervised Image-to-Image Translation with Generative Adversarial Networks] [Paper]
[Unsupervised Image-to-Image Translation Networks] [Paper]
[Triangle Generative Adversarial Networks] [Paper]
[Mode Regularized Generative Adversarial Networks] [Paper](Yoshua Bengio , ICLR 2017)
[How to train Gans] [Docu]
[Towards Principled Methods for Training Generative Adversarial Networks] [Paper](ICLR 2017)
[Towards Principled Methods for Training Generative Adversarial Networks] [Paper]
[Generalization and Equilibrium in Generative Adversarial Nets] [Paper]（ICML 2017）
[Transformation-Grounded Image Generation Network for Novel 3D View Synthesis] [Web](CVPR 2017)
[Neural Face Editing with Intrinsic Image Disentangling] [Paper](CVPR 2017)
[Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis] [Paper](ICCV 2017)
[Maximum-Likelihood Augmented Discrete Generative Adversarial Networks] [Paper]
[Boundary-Seeking Generative Adversarial Networks] [Paper]
[GANS for Sequences of Discrete Elements with the Gumbel-softmax Distribution] [Paper]
[Generative OpenMax for Multi-Class Open Set Classification] [Paper](BMVC 2017)
[cleverhans] [Code](A library for benchmarking vulnerability to adversarial examples)
[reset-cppn-gan-tensorflow] [Code](Using Residual Generative Adversarial Networks and Variational Auto-encoder techniques to produce high-resolution images)
[HyperGAN] [Code](Open source GAN focused on scale and usability)