Introduction to autoencoders (Jeremy Jordan)

April 28, 2026

-an unsupervised learning technique used for representation learning

-impose a bottleneck in the network which forces a "compressed" knowledge representation of the inputs

-learn structure in the inputs (e.g., correlations between input features) and use it to form latent representations

-framed as a supervised learning problem tasked with outputting $\hat{x}$, a reconstruction of the original input $x$

-minimize the reconstruction error $\mathcal{L}(x, \hat{x})$, which measures how much the reconstructed input $\hat{x}$ differs from the original input $x$

-to prevent overfitting, the model should not simply memorize the training data, so add a regularization term to the loss: $\mathcal{L}(x, \hat{x}) + \text{regularizer}$
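A minimal NumPy sketch of the ideas above. It assumes a single sigmoid hidden layer, mean squared error as $\mathcal{L}(x, \hat{x})$, and an L2 weight penalty as the example regularizer; none of these choices come from the notes, and the names `init_params`, `encode`, `decode`, `reconstruction_loss`, and `total_loss` are hypothetical. Choosing `d_hidden` smaller than `d_in` is what creates the bottleneck.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(d_in, d_hidden, seed=0):
    """Random weights for a one-hidden-layer autoencoder (d_hidden < d_in gives a bottleneck)."""
    rng = np.random.default_rng(seed)
    return {
        "W_enc": rng.normal(0.0, 0.1, size=(d_in, d_hidden)),
        "b_enc": np.zeros(d_hidden),
        "W_dec": rng.normal(0.0, 0.1, size=(d_hidden, d_in)),
        "b_dec": np.zeros(d_in),
    }

def encode(params, x):
    # Map each input row to the smaller latent code h.
    return sigmoid(x @ params["W_enc"] + params["b_enc"])

def decode(params, h):
    # Map the latent code back to input space to get x_hat.
    return h @ params["W_dec"] + params["b_dec"]

def reconstruction_loss(x, x_hat):
    # L(x, x_hat): mean squared error over examples and features.
    return np.mean((x - x_hat) ** 2)

def total_loss(params, x, lam=1e-3):
    x_hat = decode(params, encode(params, x))
    # Example regularizer (an assumption): L2 penalty on the weights.
    reg = np.sum(params["W_enc"] ** 2) + np.sum(params["W_dec"] ** 2)
    return reconstruction_loss(x, x_hat) + lam * reg

# Usage: 32 examples with 8 features, squeezed through 3 hidden units.
x = np.random.default_rng(1).normal(size=(32, 8))
params = init_params(d_in=8, d_hidden=3)
h = encode(params, x)        # shape (32, 3)
x_hat = decode(params, h)    # shape (32, 8)
print(total_loss(params, x))
```

Training would adjust the weights to lower `total_loss`, for example with gradient descent; that step is left out to keep the sketch short.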

Sparse Autoencoders

-offer an alternative way to introduce an information bottleneck without reducing the number of nodes in the hidden layers

-construct loss functions that penalize activations (not weights) within a layer

-emergent capability: the network can sensitize individual hidden nodes to specific attributes of the input data, and the nodes that activate for a given input together compose its latent representation
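A sketch of the sparse version, assuming a ReLU hidden layer and an L1 penalty on the hidden activations (one common choice; the notes do not name a specific penalty). The hidden layer here is deliberately wider than the input (16 vs. 8) because the bottleneck now comes from the penalty rather than from fewer nodes. `sparse_ae_loss` and `lam` are hypothetical names.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sparse_ae_loss(W_enc, b_enc, W_dec, b_dec, x, lam=1e-2):
    """Reconstruction error plus an L1 penalty on hidden activations (not on weights)."""
    h = relu(x @ W_enc + b_enc)        # hidden activations, shape (batch, d_hidden)
    x_hat = h @ W_dec + b_dec          # reconstruction, shape (batch, d_in)
    recon = np.mean((x - x_hat) ** 2)
    # L1 on activations: sum of |h_j| per example, averaged over the batch.
    sparsity = np.mean(np.sum(np.abs(h), axis=1))
    return recon + lam * sparsity, h

# Usage: an overcomplete layer (16 hidden units for 8 input features).
rng = np.random.default_rng(0)
x = rng.normal(size=(32, 8))
W_enc = rng.normal(0.0, 0.1, size=(8, 16))
b_enc = np.zeros(16)
W_dec = rng.normal(0.0, 0.1, size=(16, 8))
b_dec = np.zeros(8)

loss, h = sparse_ae_loss(W_enc, b_enc, W_dec, b_dec, x, lam=1e-2)
print(f"loss: {loss:.4f}, fraction of inactive hidden units: {np.mean(h == 0):.2f}")
```

Because the penalty acts on `h` rather than on the weights, minimizing it pushes the network toward having only a few hidden units fire for each input; raising `lam` trades reconstruction accuracy for sparsity.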

Contractive Autoencoders

-for similar inputs, learned encodings should be very similar

-explicitly bake this into the model by requiring that the derivatives of the hidden layer activations with respect to the input are small, so that small changes to the inputs do not produce large changes in the learned representation
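For a sigmoid encoder $h = \sigma(xW + b)$, each Jacobian entry is $\partial h_j / \partial x_i = h_j(1 - h_j) W_{ij}$, so the penalty $\lVert \partial h / \partial x \rVert_F^2$ (the squared Frobenius norm of the Jacobian, as used in contractive autoencoders) has a closed form. The sketch below assumes that one-layer sigmoid encoder, a linear decoder, and MSE reconstruction; `contractive_penalty`, `contractive_ae_loss`, `numerical_jacobian`, and `lam` are hypothetical names, and the finite-difference check only confirms the closed form.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def contractive_penalty(W_enc, b_enc, x):
    """Squared Frobenius norm of dh/dx for h = sigmoid(x @ W_enc + b_enc), averaged over the batch."""
    h = sigmoid(x @ W_enc + b_enc)           # (batch, d_hidden)
    dh = h * (1.0 - h)                       # sigmoid'(z) for each hidden unit
    col_norms = np.sum(W_enc ** 2, axis=0)   # sum_i W_ij^2 per hidden unit j, shape (d_hidden,)
    return np.mean(np.sum(dh ** 2 * col_norms, axis=1))

def contractive_ae_loss(W_enc, b_enc, W_dec, b_dec, x, lam=1e-1):
    # Reconstruction error plus the Jacobian penalty on the encoder.
    x_hat = sigmoid(x @ W_enc + b_enc) @ W_dec + b_dec
    return np.mean((x - x_hat) ** 2) + lam * contractive_penalty(W_enc, b_enc, x)

def numerical_jacobian(W_enc, b_enc, x_row, eps=1e-6):
    """Central-difference estimate of dh/dx for a single input row (for checking only)."""
    d_in, d_hidden = W_enc.shape
    J = np.zeros((d_hidden, d_in))
    for i in range(d_in):
        step = np.zeros(d_in)
        step[i] = eps
        h_plus = sigmoid((x_row + step) @ W_enc + b_enc)
        h_minus = sigmoid((x_row - step) @ W_enc + b_enc)
        J[:, i] = (h_plus - h_minus) / (2 * eps)
    return J

# Usage: compare the closed form with the numerical Jacobian on one example.
rng = np.random.default_rng(0)
x = rng.normal(size=(32, 8))
W_enc = rng.normal(0.0, 0.5, size=(8, 3))
b_enc = np.zeros(3)
W_dec = rng.normal(0.0, 0.5, size=(3, 8))
b_dec = np.zeros(8)

analytic = contractive_penalty(W_enc, b_enc, x[:1])
numeric = np.sum(numerical_jacobian(W_enc, b_enc, x[0]) ** 2)
print(analytic, numeric)  # the two values should agree closely
loss = contractive_ae_loss(W_enc, b_enc, W_dec, b_dec, x)
```

Shrinking the penalty flattens the encoder around the training inputs, which is exactly the "small input changes give small representation changes" property described above.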