In this tutorial, we will look into one of the fascinating applications of GANs: generating unique architectures. You need no architectural skill or practice to do this, only knowledge of neural networks and how to train them. We will go through the code and workflow for generating unique architectures using GANs as described in this GitHub project.
Generative Adversarial Networks (GANs) were introduced in 2014 by Goodfellow et al. and were considered a breakthrough in the field of neural networks and generative models. The idea has since grown and adapted into various forms, with state-of-the-art applications in areas such as image and video generation.
GANs are a comparatively new class of neural networks compared to those commonly used in computer vision tasks. Yann LeCun, one of the most prominent researchers in deep learning, described GANs as “the most interesting idea in the last ten years in Machine Learning”. Following the first release of the idea, several independent researchers have developed their own versions of the network to perform extraordinary tasks, from editing photos to creating DeepFakes.
In this tutorial, we will walk through the process of using GANs to generate unique architectures. They can be used in designing new forms of urban architecture. However, if you wish to, you can use this tutorial to generate almost any image you want based on your preference.
Key terms
- Adversarial: Involving two people or two sides who oppose each other.
- Computer Vision: An interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images or videos.
Prerequisites
- Programming knowledge in Python.
- Basic knowledge of Deep Learning, TensorFlow, and CNNs (Convolutional Neural Networks).
How can GANs be used to generate unique architectural designs?
GANs consist of two types of networks: generative and adversarial.
1. Generative Networks are a class of networks responsible for the generation. Hence, this network is also known as a generator.
2. Adversarial Networks work in opposition to the generative network, i.e., they are responsible for classifying whether the generated image is real or fake. This network is also known as a discriminator.
As we just discussed, the GAN has two networks competing against each other: a generator and a discriminator. A generator aims to generate new instances of an object based on a random noise sent to it as input, while the discriminator aims to determine whether the generated instance is real or fake by comparing the real and generated images.
Understanding it better
To understand it better, let’s suppose a customer is trying to use forged cash notes in a grocery store. Now it is up to the cashier to recognize whether the cash is real or fake. If the cashier can recognize the forged cash, the customer is caught and might even be jailed. However, if the customer can replicate the cash note perfectly, there is less chance of being caught.
Here, consider the customer (generator) and the cashier (discriminator) to be competing against each other. In other words, the generator is trying to mimic the actual image in such a way that the discriminator cannot differentiate between the real and the fake ones. Over time, the discriminator gets better at detecting fake images, while the generator learns from its mistakes and gets better at generating more realistic images.
Both the generator and discriminator networks use convolutional neural networks (CNNs) to produce their outputs. Based on the architecture of the networks used, GANs are classified into various categories. In this tutorial, we will be using Deep Convolutional GANs (DCGANs), which use a deep convolutional architecture in both networks.
Creating a DCGAN
DCGANs are essentially an improved version of a regular GAN. In this section, we will focus on the main elements of our model for generating unique architectures:
- The generator (G) takes in a random noise vector (z) as input and generates an image.
- The generated image is fed into the discriminator (D), which compares the training set (real images) with our generated image.
- Based on its predictions, the discriminator outputs a number between 0 (fake) and 1 (real). Here, the generator has no idea of what the real image data looks like, and learns to adjust its output based on the feedback of the discriminator.
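The data flow above can be made concrete with a small, framework-agnostic sketch. The real project uses TensorFlow; here the generator and discriminator are stand-in functions (a random projection and a sigmoid over mean pixel intensity, both illustrative assumptions) so that the shapes and the 0-to-1 discriminator output are visible:

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(z):
    """Maps a latent vector z to a fake 128x128x3 image in [-1, 1]."""
    # Stand-in for the transposed-convolution stack: a fixed random projection.
    w = rng.standard_normal((z.shape[-1], 128 * 128 * 3)) * 0.01
    return np.tanh(z @ w).reshape(-1, 128, 128, 3)

def discriminator(images):
    """Maps a batch of images to a score in (0, 1): 1 = real, 0 = fake."""
    logits = images.reshape(images.shape[0], -1).mean(axis=1)
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid

z = rng.standard_normal((4, 100))   # batch of 4 noise vectors of size 100
fake_images = generator(z)
scores = discriminator(fake_images)
print(fake_images.shape)  # (4, 128, 128, 3)
print(scores.shape)       # (4,)
```

During real training, the discriminator's scores on these fakes (and on real images) drive the loss functions described in Step 5.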
All the steps that we are going to discuss hereafter are available as a notebook here.
Step 1: Getting the data
For the training data, we are using images from wikiart.org. Download the dataset here and save it into a folder named “data”. However, you can choose to use any data you prefer. We will then resize all images to 128×128 pixels for training.
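One preprocessing detail worth noting (a common DCGAN convention, assumed here rather than taken from the original repo): because the generator typically ends in a tanh layer, training images are scaled from [0, 255] to [-1, 1] after resizing:

```python
import numpy as np

def preprocess(image_uint8):
    """Scale a uint8 image array from [0, 255] to float32 in [-1, 1]."""
    return image_uint8.astype(np.float32) / 127.5 - 1.0

# A random stand-in for one resized 128x128 RGB training image.
image = np.random.randint(0, 256, size=(128, 128, 3), dtype=np.uint8)
x = preprocess(image)
print(x.min() >= -1.0 and x.max() <= 1.0)  # True
```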
Step 2: Inputs for the model
The first step is to create the input placeholders: inputs_real, the batch of real images for the discriminator, and inputs_z, the random noise vector for the generator.
Step 3: The model architecture – Generator
A generator takes in a random noise vector (z) as input and outputs a fake image. We are using transposed (often called de-convolutional) layers, whose effect is the opposite of a conventional convolutional layer: at every layer of the network, we halve the number of filters while the spatial size of the feature maps doubles, which finally results in a full-sized generated image.
As shown in the figure, we take in a random noise vector (z) of size 100 and pass it through a series of transposed convolutional layers that finally output a 128×128 image. For the Leaky ReLU activation functions, we have used 0.3 and 0.2 as the alpha values.
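A quick sanity check of that size progression: starting from a small feature map, each stride-2 transposed convolution doubles the spatial size while halving the number of filters until the 128×128 target is reached. The starting point (4×4 with 1024 filters) is an illustrative assumption; the last layer would then be projected down to 3 color channels:

```python
size, filters = 4, 1024
layers = [(size, filters)]
while size < 128:
    size *= 2        # stride-2 transposed convolution doubles height/width
    filters //= 2    # filter count is halved at each upsampling step
    layers.append((size, filters))

print(layers[0])   # (4, 1024)
print(layers[-1])  # (128, 32) -- a final layer maps these 32 filters to 3 channels
```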
Step 4: The model architecture – Discriminator
A discriminator takes in a real or generated image as input and outputs a score based on its prediction. The network uses a CNN whose task is to distinguish images from the training dataset (real) from those that come from the generator (fake).
- Inputs: Image with three color channels and 128×128 pixels in size.
- Outputs: Binary classification, to predict if the image is real (1) or fake (0).
As shown in the figure, the discriminator is a feed-forward network that takes real or generated images as input and produces a sigmoid probability between 0 and 1, evaluating whether the given image is real or fake.
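The final sigmoid is what turns the discriminator's raw logit into that 0-to-1 probability, which can then be thresholded (here at 0.5) to call an image real or fake. The logit values below are made up for illustration:

```python
import math

def sigmoid(logit):
    """Squash a raw score into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))

for logit in (-2.0, 0.0, 2.0):
    p = sigmoid(logit)
    label = "real" if p >= 0.5 else "fake"
    print(round(p, 3), label)  # 0.119 fake / 0.5 real / 0.881 real
```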
Note: For both the generator and discriminator networks, we are using tf.variable_scope to create new variables for each network and to share and reuse the already-created ones.
Step 5: Calculating discriminator and generator losses
The loss function of the DCGAN model contains two parts: the discriminator loss J(D) and the generator loss J(G).
Since the two networks play an adversarial (zero-sum) game, ideally the sum of these two loss functions should be zero, i.e., J(G) = -J(D).
The discriminator loss in itself is the sum of the loss for real and fake images:
d_loss = d_loss_real + d_loss_fake
d_loss_real is the loss on real images: it penalizes the discriminator for predicting that an image is fake when in fact it was real.
d_loss_fake is the loss on generated images: it penalizes the discriminator for predicting that an image is real when in fact it was fake.
d_logits_fake from the discriminator is also fed into the generator loss function, since the generator wants to learn how to fool the discriminator.
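The three losses can be sketched with plain binary cross-entropy, using made-up discriminator outputs for one real and one fake image (D(x) is the discriminator's probability that x is real):

```python
import math

def bce(prediction, label):
    """Binary cross-entropy for a single prediction in (0, 1)."""
    return -(label * math.log(prediction) + (1 - label) * math.log(1 - prediction))

d_real = 0.9   # D's score for a real image (should be near 1)
d_fake = 0.2   # D's score for a generated image (should be near 0)

d_loss_real = bce(d_real, 1)   # penalizes D for scoring real images low
d_loss_fake = bce(d_fake, 0)   # penalizes D for scoring fake images high
d_loss = d_loss_real + d_loss_fake

g_loss = bce(d_fake, 1)        # G improves when D scores its fakes as real

print(round(d_loss, 4), round(g_loss, 4))  # 0.3285 1.6094
```

Note how d_loss and g_loss pull in opposite directions: a fake that fools the discriminator lowers g_loss but raises d_loss_fake.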
Step 6: Optimizing the model
After calculating the losses, we need to update the generator and discriminator separately.
To do this, we get the variables for each part using tf.trainable_variables(), which returns a list of all the variables we’ve defined in our graph.
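Separating the two variable sets amounts to filtering that list by the variable-scope prefix, so each optimizer only updates its own network. Here plain strings stand in for the TensorFlow variable names (the scope names "generator" and "discriminator" follow the tf.variable_scope note above):

```python
# Stand-ins for the names returned by tf.trainable_variables().
t_vars = [
    "generator/deconv1/kernel", "generator/deconv2/kernel",
    "discriminator/conv1/kernel", "discriminator/conv2/kernel",
]

g_vars = [v for v in t_vars if v.startswith("generator")]       # updated by G's optimizer
d_vars = [v for v in t_vars if v.startswith("discriminator")]   # updated by D's optimizer

print(len(g_vars), len(d_vars))  # 2 2
```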
Step 7: Training the model
We will now train the model using hyperparameters such as the number of epochs, batch size, latent vector dimensions, learning rate, and exponential decay rate (beta1).
Moreover, we are saving the model after every five epochs, as well as the generated images every ten batches. Along with this, we are also calculating and displaying the discriminator and generator losses during training.
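That save/sample schedule can be sketched as a pair of modulo checks inside the training loop (the intervals are the tutorial's; the epoch and batch counts below are made up for illustration):

```python
epochs, batches_per_epoch = 15, 25
checkpoints, samples = 0, 0

for epoch in range(1, epochs + 1):
    for batch in range(1, batches_per_epoch + 1):
        if batch % 10 == 0:
            samples += 1       # write out a grid of generated images
    if epoch % 5 == 0:
        checkpoints += 1       # save the model weights

print(checkpoints, samples)  # 3 30
```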
Step 8: Generating images
The generator is a feed-forward neural network that takes in random noise and gradually transforms it into images of a certain size during training.
In other words, it learns to map from a latent space to a particular data distribution of images by training, while the discriminator classifies the instances produced by the generator as real or fake.
Step 9: Setting the hyperparameters and running the model
Hyperparameters are essential to the model’s learning process. They define factors such as the training duration, the batch size, and the learning rate, which determine how the model trains and how quickly its loss decreases.
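An illustrative hyperparameter set is shown below. The values are typical DCGAN choices (e.g., beta1 = 0.5 for the Adam optimizer), not necessarily the exact ones used in the original project:

```python
hparams = {
    "epochs": 100,            # how long training runs
    "batch_size": 64,         # images per training step
    "z_dim": 100,             # size of the latent noise vector
    "learning_rate": 0.0002,  # Adam step size
    "beta1": 0.5,             # exponential decay rate for Adam's first moment
}

for name, value in hparams.items():
    print(f"{name}: {value}")
```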
Step 10: Plotting the generated images
Finally, we have plotted the generated images.
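The original notebook uses matplotlib for plotting; the step worth showing here is how a batch of samples is tiled into a single image grid before display or saving (a common, assumed helper, not taken verbatim from the repo):

```python
import numpy as np

def tile_images(images, rows, cols):
    """Tile (rows*cols, H, W, C) samples into one (rows*H, cols*W, C) grid."""
    n, h, w, c = images.shape
    grid = images.reshape(rows, cols, h, w, c)
    # Interleave row/height and column/width axes, then flatten to one image.
    return grid.transpose(0, 2, 1, 3, 4).reshape(rows * h, cols * w, c)

samples = np.random.rand(16, 128, 128, 3)  # e.g. 16 generated images
grid = tile_images(samples, 4, 4)
print(grid.shape)  # (512, 512, 3)
```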
Learning Tools and Strategies
- The key to learning about neural networks effectively is to visualize the whole architecture of the system. By doing this, we can easily understand how the data is processed step by step.
- Also, it is a good practice to print or log important messages and errors to help with debugging.
- Like most neural networks, DCGANs are quite sensitive to hyperparameters. Therefore, it’s very important to tune them precisely as they can largely affect the model’s performance.
This project was challenging as well as exhausting. Finding a proper image dataset for training took a lot of time initially, and the model then took a long time to generate satisfactory results. Along the way, I learned a lot more about GANs in general and about the various architectures that can easily be modified according to the problem at hand, which makes them so versatile to use and create with. Moreover, doing projects like these once in a while helps in discovering the underlying mechanics behind complex architectures.
Conclusions and Future Directions
In conclusion, the results generated by the model were quite promising and, to some extent, have opened yet another opportunity for applying GANs. Although the generated images of unique architectures aren’t high-quality, these results show that GANs can be quite helpful as a tool in creative fields. The results shown came from training the model on a standard CPU for several hours; upon training on a high-end GPU/TPU, the results can be expected to improve a lot.
- We used the dataset downloaded from the WikiArt library, which contains many artworks classified by genre.
- References from PokeGAN
Also, the code for this project on Generating unique architectures using GANs is available on GitHub.
Finally, you might also be interested in this project on How to build an INR value predictor against 1 USD using Brain.js.