Notes on PCA Theory

3 minute read

These are non-comprehensive notes on the theoretical machinery behind Principal Component Analysis (PCA). This post is essentially an abridged summary of Tim Roughgarden and Gregory Valiant’s lecture notes for Stanford CS168 [1, 2].

Consider a dataset $x_1, \dots, x_n \in \mathbb{R}^d$ that has been preprocessed such that each $x_i$ has been shifted by the sample mean $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. The resulting dataset has a new sample mean equal to the zero vector.
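For concreteness, here is a minimal NumPy sketch of this centering step, assuming the data lives in a matrix `X` with one sample per row (the placeholder data and variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))           # 100 samples, 3 features (placeholder data)

X_centered = X - X.mean(axis=0)         # subtract the sample mean from every row

# The centered data has (numerically) zero sample mean.
print(np.allclose(X_centered.mean(axis=0), 0.0))   # True
```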

The goal of PCA is to find a set of $k$ orthonormal vectors $v_1, \dots, v_k$ that form a basis for a $k$-dimensional subspace that minimizes the squared distances between the data and the data’s projection onto this subspace. This subspace also maximizes the variance of the data’s projection onto it. These orthonormal vectors are called the principal components.

For derivation purposes it is useful to consider finding only the first principal component. Solving for the remaining principal components follows naturally.

For notational simplicity, every time I mention a vector $v$, it is assumed to be a unit vector ($\|v\| = 1$).

Objective Function

The objective function for the first principal component is:

$$\underset{v \,:\, \|v\|=1}{\operatorname{argmin}} \;\sum_{i=1}^{n} \operatorname{dist}\!\big(x_i, \operatorname{span}(v)\big)^2$$
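As a sketch of what this objective computes, assuming a centered data matrix `X` (one sample per row) and a unit vector `v`; the function name is hypothetical:

```python
import numpy as np

def squared_distance_objective(X, v):
    """Sum of squared distances from each row of X to its
    projection onto the line spanned by the unit vector v."""
    projections = np.outer(X @ v, v)    # project each x_i onto span(v)
    residuals = X - projections         # components orthogonal to v
    return np.sum(residuals ** 2)
```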

Connection to Dot Product

Figure: Relationship between the dot product and the distance to the projection [1].

We want to minimize $\sum_{i=1}^{n} \operatorname{dist}\!\big(x_i, \operatorname{span}(v)\big)^2$. By the Pythagorean theorem:

$$\|x_i\|^2 = \langle x_i, v \rangle^2 + \operatorname{dist}\!\big(x_i, \operatorname{span}(v)\big)^2$$

$\|x_i\|^2$ is a constant with respect to the optimization of $v$, so minimizing the squared distances is equivalent to maximizing the sum of squared dot products $\langle x_i, v \rangle^2$:

$$\underset{v \,:\, \|v\|=1}{\operatorname{argmax}} \;\sum_{i=1}^{n} \langle x_i, v \rangle^2$$

Note that this also maximizes the variance of the projections onto $v$: since the data is mean-centered, $\frac{1}{n}\sum_{i=1}^{n} \langle x_i, v \rangle^2$ is exactly that variance.
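A quick numerical check of the Pythagorean identity above, using a random point and a random unit vector (all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=5)                  # a single (centered) data point
v = rng.normal(size=5)
v /= np.linalg.norm(v)                  # make v a unit vector

dot = x @ v                             # signed length of the projection onto v
dist_sq = np.sum((x - dot * v) ** 2)    # squared distance from x to span(v)

# Pythagorean theorem: ||x||^2 = <x, v>^2 + dist(x, span(v))^2
print(np.isclose(np.sum(x ** 2), dot ** 2 + dist_sq))   # True
```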

Matrix Notation and Connection to $X^\top X$

Assemble the $x_i$’s into an $n \times d$ matrix $X$, with one data point per row:

$$X = \begin{bmatrix} x_1^\top \\ x_2^\top \\ \vdots \\ x_n^\top \end{bmatrix}$$

If we take the inner product of $Xv$ with itself, we get the PCA objective function:

$$\langle Xv, Xv \rangle = (Xv)^\top (Xv) = v^\top X^\top X v = \sum_{i=1}^{n} \langle x_i, v \rangle^2$$

The matrix $X^\top X$ is the covariance matrix of $X$ (up to a factor of $\frac{1}{n}$), and it is symmetric.

The new objective is:

$$\underset{v \,:\, \|v\|=1}{\operatorname{argmax}} \; v^\top X^\top X v$$
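A small sketch verifying that this quadratic form equals the sum of squared dot products from the previous objective, on random mean-centered placeholder data:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
X -= X.mean(axis=0)                     # mean-center the data

v = rng.normal(size=3)
v /= np.linalg.norm(v)                  # a random unit vector

quadratic_form = v @ (X.T @ X) @ v      # v^T X^T X v
sum_sq_dots = np.sum((X @ v) ** 2)      # sum_i <x_i, v>^2

print(np.isclose(quadratic_form, sum_sq_dots))   # True
```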

Understanding $X^\top X$

Diagonal matrices (ie, zero everywhere except the diagonal) expand space along the standard axes (ie, in the directions of the standard basis vectors).

Figure: How a diagonal matrix expands space [2].

I won’t prove it here, but an important note for the rest of the derivation is the fact that the solution to $\operatorname{argmax}_{v : \|v\|=1} v^\top D v$ for a diagonal matrix $D$ is the standard basis vector corresponding to the dimension with the largest diagonal entry in $D$.
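A crude numerical illustration of that fact, comparing a random search over unit vectors against the relevant standard basis vector for a hypothetical diagonal matrix `D`:

```python
import numpy as np

D = np.diag([1.0, 5.0, 2.0])            # largest diagonal entry is in position 1

rng = np.random.default_rng(3)
best = -np.inf
for _ in range(10_000):                 # crude search over random unit vectors
    v = rng.normal(size=3)
    v /= np.linalg.norm(v)
    best = max(best, v @ D @ v)

e = np.array([0.0, 1.0, 0.0])           # standard basis vector for that entry
print(e @ D @ e)                        # 5.0, the maximum of v^T D v
print(best <= 5.0 + 1e-9)               # no random unit vector exceeds it
```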

Every symmetric matrix $A$ can be expressed as a diagonal matrix $D$ sandwiched between an orthogonal matrix $Q$ and its transpose, $A = Q D Q^\top$ (this is the spectral theorem). The lecture notes consider the case where $Q$ is a rotation matrix, but note that orthogonal matrices can take on other forms such as reflections and permutations.

This means that symmetric matrices still expand space orthogonally, but in directions rotated from the standard basis.

Figure: How a symmetric matrix expands space [2].
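A quick numerical check of this decomposition, sketched with NumPy's `eigh` on a randomly generated symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.normal(size=(3, 3))
A = B + B.T                             # a symmetric matrix

eigenvalues, Q = np.linalg.eigh(A)      # Q is orthogonal
D = np.diag(eigenvalues)                # D is diagonal

print(np.allclose(A, Q @ D @ Q.T))      # A = Q D Q^T
print(np.allclose(Q.T @ Q, np.eye(3)))  # Q^T Q = I, i.e. Q is orthogonal
```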

Eigenvectors

The eigenvectors of a matrix are those vectors that simply get stretched (or scaled) by the linear transformation of the matrix. A vector $v$ is an eigenvector of a matrix $A$ if the following equality holds:

$$A v = \lambda v$$

where $\lambda$ is the eigenvalue associated with $v$, which indicates how much $v$ is scaled (or stretched) by $A$.
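For example, a quick check of this definition with NumPy's `eig` on a toy matrix of my choosing:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])              # a toy symmetric matrix

eigenvalues, eigenvectors = np.linalg.eig(A)
lam, v = eigenvalues[0], eigenvectors[:, 0]

print(np.allclose(A @ v, lam * v))      # A v = lambda v
```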

Now consider the eigenvectors and eigenvalues of $X^\top X$. We know that $X^\top X$ stretches space orthogonally along some set of axes. These axes are the eigenvectors, and the eigenvalues tell you how much space is stretched in each direction.

As discussed previously, the solution to $\operatorname{argmax}_{v : \|v\|=1} v^\top X^\top X v$ is the unit vector pointing in the direction of maximum stretch. This is an eigenvector of $X^\top X$.

Now consider the PCA objective and remember that if $v$ is a unit vector and an eigenvector of $X^\top X$, then $X^\top X v = \lambda v$ and $v^\top v = 1$:

$$v^\top X^\top X v = v^\top (\lambda v) = \lambda \, v^\top v = \lambda$$

This says the first principal component is the eigenvector of $X^\top X$ with the largest eigenvalue. The remaining principal components can be found by taking the remaining eigenvectors sorted by their eigenvalues in decreasing order.
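Putting the pieces together, here is a minimal sketch of PCA via the eigendecomposition of $X^\top X$, on random correlated placeholder data (`eigh` is appropriate here because $X^\top X$ is symmetric):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))  # correlated placeholder data
X -= X.mean(axis=0)                                      # mean-center

eigenvalues, eigenvectors = np.linalg.eigh(X.T @ X)      # eigh returns ascending order
order = np.argsort(eigenvalues)[::-1]                    # sort descending
principal_components = eigenvectors[:, order]            # columns are v_1, v_2, ...

v1 = principal_components[:, 0]                          # first principal component

# The objective value at v1 equals the largest eigenvalue of X^T X.
print(np.isclose(v1 @ (X.T @ X) @ v1, eigenvalues.max()))   # True
```

In practice the principal components are often computed from the SVD of $X$ rather than by forming $X^\top X$ explicitly, but the eigendecomposition above matches the derivation in the lecture notes.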

References

[1] Tim Roughgarden, Gregory Valiant, CS168: The Modern Algorithmic Toolbox Lecture #7: Understanding and Using Principal Component Analysis (PCA)

[2] Tim Roughgarden, Gregory Valiant, CS168: The Modern Algorithmic Toolbox Lecture #8: How PCA Works