Variational Autoencoder
Introduction
A variational autoencoder (VAE) is a deep learning model that combines probabilistic modeling with the representational power of neural networks. VAEs give the conventional autoencoder architecture a probabilistic interpretation, which enables both the learning of latent representations and the generation of new data samples.
Deep learning is a family of machine learning methods built around deep neural networks, which are composed of several hidden layers. These networks can learn hierarchical representations of data, capturing increasingly abstract and sophisticated features.
The deep learning aspect of VAEs comes from the use of neural networks for both the encoder and decoder components. The encoder, typically a deep neural network, maps the input data to the parameters of the latent space distribution, namely its mean and variance, while progressively reducing the dimensionality of the data.
The decoder, which is also a deep neural network, reconstructs the input data using a sample from the latent space distribution. In order to produce an output that is a reconstruction of the input, it maps the latent sample back to the data space.
The neural networks in a VAE are trained with backpropagation and optimization techniques such as stochastic gradient descent or its variants. During training, a VAE optimizes two primary objectives: the reconstruction loss and the regularization term.
The reconstruction loss measures the discrepancy between the input data and the decoder's output, encouraging the model to reconstruct the data accurately. It is often computed with an element-wise loss function, such as mean squared error (MSE) for image data or cross-entropy loss for categorical data.
The regularization term is the Kullback-Leibler (KL) divergence between the latent space distribution and a predetermined prior distribution, often a multivariate Gaussian. It encourages the latent space to follow the intended prior, keeping it smooth and well structured, and it ensures that the latent space can be sampled from, which is what makes the generation of new data samples possible.
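For a diagonal Gaussian posterior with mean and log-variance produced by the encoder, and a standard normal prior, this KL term has a simple closed form. Below is a minimal TensorFlow sketch under those assumptions (the function name is illustrative, not part of any library):
import tensorflow as tf

def kl_to_standard_normal(z_mean, z_log_var):
    # Closed-form KL( N(mu, sigma^2) || N(0, I) ) for a diagonal Gaussian,
    # summed over latent dimensions and averaged over the batch
    kl_per_dim = -0.5 * (1.0 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var))
    return tf.reduce_mean(tf.reduce_sum(kl_per_dim, axis=1))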
Training a VAE therefore means minimizing the sum, or a weighted combination, of these two objectives. The parameters of the encoder and decoder networks are updated iteratively with gradient-based optimization methods.
After training, VAEs can be applied to tasks such as data generation, anomaly detection, dimensionality reduction, and latent space exploration. The combination of deep neural networks and probabilistic modeling gives VAEs a powerful framework for learning complex representations and generating new data samples with desirable properties.
Autoencoder Refresher
Autoencoders are a type of neural network architecture that learns compressed representations of input data without requiring labeled examples. They consist of an encoder and a decoder that work together to reconstruct the input data. The encoder maps the input data to a lower-dimensional representation known as the latent space, and the decoder reconstructs the original input from that latent representation.
The main goal of an autoencoder is to minimize the reconstruction error, which measures the discrepancy between the original input and the reconstructed output. By learning to accurately encode and decode the input, autoencoders can identify important features and patterns in the data.
- Dimensionality reduction is one of the most common applications of autoencoders. By learning a compressed representation in the latent space, autoencoders can capture the most important information in high-dimensional data and store it in a lower-dimensional space. This is useful for tasks such as feature extraction, denoising, and data visualization.
- Anomaly detection is another application. During training, autoencoders learn to reconstruct typical data samples reliably, so the reconstruction error is usually higher for unusual or unseen inputs. By placing a threshold on the reconstruction error, inputs that depart drastically from the learned patterns can be flagged as potential anomalies (a small sketch of this thresholding follows this list).
- Autoencoders can also be used for generative modeling. By sampling from the latent space and passing the samples through the decoder, new data points can be produced. This makes it possible to create synthetic data that closely resembles the original input distribution.
- To improve performance or broaden capabilities, autoencoder variants such as denoising autoencoders, sparse autoencoders, and variational autoencoders add further constraints or probabilistic interpretations.
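The anomaly detection idea above can be sketched in a few lines. The following is a minimal illustration, assuming a trained Keras autoencoder (here called autoencoder) and a NumPy array of inputs; the percentile-based threshold is an illustrative choice, not a fixed rule:
import numpy as np

def flag_anomalies(autoencoder, x, threshold_percentile=99):
    # Reconstruct the inputs and compute the per-sample mean squared error
    x_hat = autoencoder.predict(x)
    errors = np.mean(np.square(x - x_hat), axis=tuple(range(1, x.ndim)))
    # Flag samples whose error exceeds a high percentile of the observed errors
    threshold = np.percentile(errors, threshold_percentile)
    return errors > threshold, errors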
Loss Function
The loss function of a Variational Autoencoder (VAE) has two primary parts: the reconstruction loss and the regularization term, which is commonly expressed as the Kullback-Leibler (KL) divergence.
- Reconstruction Loss: The reconstruction loss measures the discrepancy between the decoder's output and the original input data and pushes the VAE to reconstruct the input precisely. The choice of reconstruction loss depends on the type of data: mean squared error (MSE) is frequently used for continuous data such as images, while cross-entropy loss is appropriate for discrete or binary data such as text.
- Regularization Term: The regularization term in a VAE is the Kullback-Leibler (KL) divergence between the learned latent space distribution and a predetermined prior distribution, often a multivariate Gaussian. It encourages the latent distribution to resemble the chosen prior, keeping the latent space smooth and regular, and it helps prevent overfitting.
- The total loss function of a VAE is the sum, or a weighted combination, of these two terms. The reconstruction loss encourages faithful reconstruction of the input data, whereas the regularization term favors a well-behaved latent space distribution. A hyperparameter, such as a weight on the regularization term, is often used to control the balance between them (see the sketch after this list).
- During training, the objective is to minimize the overall loss function using optimization techniques such as stochastic gradient descent (SGD) or its variants. By jointly optimizing the reconstruction loss and the regularization term, the VAE learns to encode the input data into a meaningful latent space and to generate new samples from the learned distribution.
- Note that the exact form of the loss function depends on the implementation and on VAE variants such as the β-VAE or VAEs with additional architectural modifications. Such variants may add extra terms or reweight parts of the loss function to achieve particular goals or improve the model's performance.
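As a concrete illustration of the combined objective, here is a minimal TensorFlow sketch, assuming pixel values in [0, 1], a binary cross-entropy reconstruction term, and a weight beta on the KL term (beta = 1 recovers the standard VAE objective; larger values correspond to the β-VAE variant mentioned above):
import tensorflow as tf

def vae_loss(x, x_hat, z_mean, z_log_var, beta=1.0):
    # Reconstruction term: binary cross-entropy summed over pixels, averaged over the batch
    batch = tf.shape(x)[0]
    x_flat = tf.reshape(x, (batch, -1))
    x_hat_flat = tf.reshape(x_hat, (batch, -1))
    bce_per_pixel_mean = tf.keras.losses.binary_crossentropy(x_flat, x_hat_flat)
    num_pixels = tf.cast(tf.shape(x_flat)[1], tf.float32)
    reconstruction = tf.reduce_mean(bce_per_pixel_mean * num_pixels)
    # Regularization term: KL( N(mu, sigma^2) || N(0, I) ), summed over latent dimensions
    kl = -0.5 * tf.reduce_sum(
        1.0 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=1
    )
    return reconstruction + beta * tf.reduce_mean(kl)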
VAEs Advantages
- Generative Modeling: VAEs can create new data samples by sampling from the learned latent space distribution, which makes them useful for tasks such as image synthesis, data augmentation, and text generation.
- Interpretable Latent Space: VAEs learn a compressed latent space representation of the input data. This latent space can be explored and manipulated to understand the underlying patterns and features in the data.
- Continuous Latent Space: VAEs typically learn a continuous latent space distribution, enabling smooth interpolation between latent points. This makes it possible to interpolate and morph generated samples in a meaningful way (a small sketch follows this list).
- Probabilistic Framework: VAEs give the latent space a probabilistic interpretation, making it possible to estimate uncertainty and to handle missing or imperfect data.
- Anomaly Detection: VAEs can measure the reconstruction error to identify anomalies. Inputs that depart considerably from the learned patterns tend to have a higher reconstruction error, which enables effective anomaly detection.
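The interpolation property can be demonstrated with a short sketch. The helper below is illustrative and assumes a trained decoder model (such as the Keras decoder built in the Implementation section below) and two latent vectors z_a and z_b:
import numpy as np

def interpolate_latents(decoder, z_a, z_b, steps=10):
    # Decode points along the straight line between two latent codes
    alphas = np.linspace(0.0, 1.0, steps)
    z_path = np.stack([(1.0 - a) * z_a + a * z_b for a in alphas])
    return decoder.predict(z_path)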
VAEs Drawbacks
1. Blurred Reconstructions: VAEs trade off precise reconstruction against a smooth latent space distribution. Compared with other autoencoder variants such as denoising autoencoders, this can lead to reconstructions that are somewhat blurrier or less faithful to the original input.
2. Limited Mode Coverage: VAEs may struggle to capture every mode of the data distribution. Because of the regularization term in the loss function, the learned latent space distribution can be biased towards the dominant modes while neglecting rarer ones.
3. Training Complexity: VAEs can be difficult to train because of the composite loss function that combines the reconstruction loss and the KL divergence term. Hyperparameters must be tuned carefully to strike a balance between the regularization term and reconstruction quality.
4. Fixed Prior Distribution: VAEs rely on a predetermined prior distribution over the latent space, typically a multivariate Gaussian. This assumption may not reflect the actual underlying distribution of the data, which can limit the flexibility of the model.
5. Lack of Disentangled Representations: Although VAEs aim to learn meaningful latent representations, they do not guarantee disentangled representations of the underlying factors of variation. Disentanglement, the learning of separate and interpretable factors in the latent space, can be difficult to achieve with VAEs alone.
Implementation
Dataset: MNIST
Platform: Colaboratory
Source code
# Import the necessary Libraries
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
# Encoder architecture
latent_dim = 2
encoder_inputs = keras.Input(shape=(28, 28, 1))
x = layers.Conv2D(32, 3, activation="relu", strides=2, padding="same")(encoder_inputs)
x = layers.Conv2D(64, 3, activation="relu", strides=2, padding="same")(x)
x = layers.Flatten()(x)
x = layers.Dense(16, activation="relu")(x)
z_mean = layers.Dense(latent_dim, name="z_mean")(x)
z_log_var = layers.Dense(latent_dim, name="z_log_var")(x)
encoder = keras.Model(encoder_inputs, [z_mean, z_log_var], name="encoder")
# Decoder architecture
latent_inputs = keras.Input(shape=(latent_dim,))
x = layers.Dense(7 * 7 * 64, activation="relu")(latent_inputs)
x = layers.Reshape((7, 7, 64))(x)
x = layers.Conv2DTranspose(64, 3, activation="relu", strides=2, padding="same")(x)
x = layers.Conv2DTranspose(32, 3, activation="relu", strides=2, padding="same")(x)
decoder_outputs = layers.Conv2DTranspose(1, 3, activation="sigmoid", padding="same")(x)
decoder = keras.Model(latent_inputs, decoder_outputs, name="decoder")
# VAE architecture
class VAE(keras.Model):
    def __init__(self, encoder, decoder, **kwargs):
        super(VAE, self).__init__(**kwargs)
        self.encoder = encoder
        self.decoder = decoder

    def encode(self, x):
        z_mean, z_log_var = self.encoder(x)
        return z_mean, z_log_var

    def decode(self, z):
        return self.decoder(z)

    def reparameterize(self, z_mean, z_log_var):
        # Reparameterization trick: z = mean + sigma * epsilon, with epsilon ~ N(0, I)
        epsilon = tf.random.normal(shape=tf.shape(z_mean))
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon

    def call(self, x):
        z_mean, z_log_var = self.encode(x)
        z = self.reparameterize(z_mean, z_log_var)
        reconstructed = self.decode(z)
        # KL divergence between N(z_mean, exp(z_log_var)) and the standard normal prior
        kl_loss = -0.5 * tf.reduce_mean(
            z_log_var - tf.square(z_mean) - tf.exp(z_log_var) + 1
        )
        self.add_loss(kl_loss)
        return reconstructed
# Create the VAE model
vae = VAE(encoder, decoder)
# Compile the model
vae.compile(optimizer=keras.optimizers.Adam())
# Load and preprocess the MNIST dataset
(x_train, _), (x_test, _) = keras.datasets.mnist.load_data()
x_train = x_train.astype("float32") / 255.0
x_train = np.reshape(x_train, (-1, 28, 28, 1))
x_test = x_test.astype("float32") / 255.0
x_test = np.reshape(x_test, (-1, 28, 28, 1))
# Train the VAE
vae.fit(x_train, x_train, epochs=3, batch_size=128, validation_data=(x_test, x_test))
Obtained Output: Epoch 1/3
469/469 [==============================] - 22s 44ms/step - loss: 1.4953e-05 - val_loss: 2.3338e-09
Epoch 2/3
469/469 [==============================] - 19s 42ms/step - loss: 2.3338e-09 - val_loss: 2.3338e-09
Epoch 3/3
469/469 [==============================] - 21s 45ms/step - loss: 2.3338e-09 - val_loss: 2.3338e-09
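Note that compile() above is called without a reconstruction loss, so the loss values reported during training reflect only the KL term registered through add_loss. A minimal variant of the call method is sketched below, assuming a binary cross-entropy reconstruction term over the 28x28 pixels (one common choice, not the only one); with it, the same compile and fit calls optimize the full reconstruction-plus-KL objective described earlier.
# Sketch: a call method that also registers the reconstruction term via add_loss
def call(self, x):
    z_mean, z_log_var = self.encode(x)
    z = self.reparameterize(z_mean, z_log_var)
    reconstructed = self.decode(z)
    # Reconstruction term: per-pixel binary cross-entropy, summed over the 28x28 pixels
    bce = keras.losses.binary_crossentropy(
        tf.reshape(x, (tf.shape(x)[0], -1)),
        tf.reshape(reconstructed, (tf.shape(x)[0], -1)),
    )
    reconstruction_loss = tf.reduce_mean(bce) * 28.0 * 28.0
    # KL term, exactly as in the listing above
    kl_loss = -0.5 * tf.reduce_mean(
        z_log_var - tf.square(z_mean) - tf.exp(z_log_var) + 1
    )
    self.add_loss(reconstruction_loss + kl_loss)
    return reconstructed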
# Visualization of the VAE results
import matplotlib.pyplot as plt
# Reconstruct images from the test set
reconstructed_images = vae.predict(x_test)
# Generate new samples
latent_samples = np.random.normal(size=(10, latent_dim))
generated_images = decoder.predict(latent_samples)
# Plot original images and their reconstructions
n = 10 # Number of images to display
plt.figure(figsize=(20, 4))
for i in range(n):
    # Original image
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i].reshape(28, 28), cmap="gray")
    plt.title("Original")
    plt.axis("off")
    # Reconstructed image
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(reconstructed_images[i].reshape(28, 28), cmap="gray")
    plt.title("Reconstructed")
    plt.axis("off")
plt.show()
# Plot generated samples
plt.figure(figsize=(10, 1))
for i in range(10):
    ax = plt.subplot(1, 10, i + 1)
    plt.imshow(generated_images[i].reshape(28, 28), cmap="gray")
    plt.title("Generated")
    plt.axis("off")
plt.show()
Obtained Output: figures showing the original test images alongside their reconstructions, followed by ten newly generated samples.
The code above builds and trains a variational autoencoder (VAE) on the MNIST dataset. Here is a quick breakdown of the essential elements and steps:
- Data loading: The code loads the MNIST dataset and normalizes the pixel values to the [0, 1] range.
- Model architecture: The VAE consists of an encoder and a decoder. The encoder maps input images into a lower-dimensional latent space, and the decoder reconstructs the original images from samples drawn from that latent space.
- Encoder architecture: The encoder consists of convolutional layers followed by fully connected layers. It maps the input images to the mean and the log-variance of the latent distribution, which is then sampled using the reparameterization trick.
- Decoder architecture: The decoder reconstructs the images from latent samples. It consists of a fully connected layer followed by transposed convolutional layers, and its output is the reconstructed image.
- VAE model: The VAE model combines the encoder and the decoder. It takes input images, encodes them into the latent space, and decodes them to produce reconstructed images.
- Loss function: The VAE's loss function combines the reconstruction loss and the KL divergence loss. The reconstruction loss measures how closely the reconstructed images resemble the originals, and the KL divergence loss measures the difference between the learned latent distribution and a prior distribution (often a standard Gaussian).
- Sample generation: After training, the VAE can produce new samples by drawing points from the latent space and decoding them into images.
- The VAE is a generative model that learns to encode and decode images and generates new samples from the learned distribution. It is frequently applied to tasks such as dimensionality reduction, anomaly detection, and image synthesis.
Key Points to Remember
The following are important things to keep in mind regarding Variational Autoencoders (VAEs):
1. Encoder-Decoder Architecture: VAEs use an encoder network to map input data to a latent space and a decoder network to reconstruct the data from the latent space. The encoder and decoder are trained together to improve both the reconstructions and the quality of the latent space.
2. Latent Space: VAEs learn a low-dimensional latent space that captures the fundamental structure of the data. The latent space usually has far fewer dimensions than the input data and follows a learned distribution.
3. Reparameterization Trick: During training, VAEs sample from the learned latent distribution by expressing each sample as the mean plus the standard deviation scaled by random noise (z = mean + sigma * epsilon). Because the randomness is isolated in epsilon, gradients can flow through the sampling step, so the model can be trained with backpropagation.
4. Variational Inference: VAEs use variational inference to learn the latent space. They maximize the evidence lower bound (ELBO), a lower bound on the log-likelihood of the data. The ELBO consists of a reconstruction term and a regularization term (the KL divergence) that pushes the learned latent distribution towards a prior distribution (see the formula after this list).
5. Generative Model: VAEs are generative models: by sampling points from the latent space and decoding them, they can produce new data points with characteristics resembling the training data.
6. Continuous Latent Space: VAEs often have continuous latent spaces, allowing for easy interpolation and data manifold exploration.
7. Reconstruction and Regularization Trade-off: VAEs seek to balance reconstruction accuracy (how well the model reconstructs the input data) against the regularization term. The trade-off between these two goals can be managed by changing the weights given to each term in the loss function.
8. Applications: VAEs have been successfully applied in many fields, including image generation, anomaly detection, data compression, representation learning, and feature learning.
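The ELBO mentioned in point 4 can be written compactly in its standard form, where q(z|x) is the encoder's approximate posterior, p(z) the prior, and p(x|z) the decoder's likelihood:
\[
\mathrm{ELBO}(x) = \mathbb{E}_{q(z \mid x)}\big[\log p(x \mid z)\big] - \mathrm{KL}\big(q(z \mid x)\,\|\,p(z)\big) \le \log p(x)
\]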
Keep in mind that while VAEs are an effective tool for generative modeling and unsupervised learning, they are not without drawbacks. They might struggle, for instance, to capture intricate data distributions or to produce precise, high-fidelity reconstructions. When using VAEs, it is important to take the specific properties of your data and the requirements of the task into account.
Conclusion
Variational Autoencoders (VAEs) are powerful generative models that learn latent representations of the input data and can generate new samples. They combine an encoder-decoder architecture with a training procedure that maximizes the evidence lower bound (ELBO). VAEs excel at generating new samples, learning continuous latent spaces, and providing an interpretable representation; however, they may struggle to capture complex data distributions and require careful design and training. VAEs remain an active area of research that continues to advance the field of generative modeling, with applications across many domains.