Like autoencoders, but penalize latent variables for deviating from a Gaussian distribution.
can be calculated using our decoder neural network, which generates images by taking samples from the latent space .
is our prior over the latent space (usually a unit Gaussian).
is difficult to calculate because it requires calculating for all possible values of .
is the variational posterior, parameterized by a second set of parameters , which approximates the true posterior distribution .