Principal Component Analysis
PCA is a method to reduce the dimensionality of a dataset while keeping most of the information. We do this by rotating the dataset and projecting the points onto one (or more) dimensions.
To preserve distances, and therefore information, we keep the dimensions with the highest variance while discarding the ones with the lowest variance.
Applications
- image (data) compression
- facial recognition with eigenfaces
- anomaly detection (the first $k$ components capture normal behaviour)
Limitation: Adidas Problem
How to calculate
$X \in \mathbb{R}^{n \times d}$ is the input matrix with $n$ datapoints and $d$ features.
Optional but recommended:
- standardize values (e.g. Z-score)
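Standardization as mentioned above can be sketched in NumPy (an assumed language choice; the toy values are made up):

```python
import numpy as np

# Toy data matrix: 5 datapoints, 3 features on very different scales
X = np.array([[1.0, 200.0, 0.1],
              [2.0, 180.0, 0.3],
              [3.0, 220.0, 0.2],
              [4.0, 210.0, 0.4],
              [5.0, 190.0, 0.5]])

# Z-score: zero mean and unit variance per feature, so no single
# feature dominates the variance just because of its units.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
```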
- Calculate eigenvalues and eigenvectors of $C$
  - Covariance matrix $C = \frac{1}{n-1} X^\top X$ (with $X$ centered)
  - $C = W \Lambda W^\top$, where the columns of $W$ are the eigenvectors and $\Lambda$ contains the eigenvalues $\lambda_i$ on the diagonal in decreasing order
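The eigendecomposition step can be sketched as follows. This is a minimal NumPy illustration (the document names no language, so NumPy and the random toy data are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))        # toy data: 100 datapoints, 3 features
Xc = X - X.mean(axis=0)              # center the data first

# Covariance matrix C = 1/(n-1) * X^T X
C = Xc.T @ Xc / (Xc.shape[0] - 1)

# eigh handles symmetric matrices; it returns eigenvalues in
# ascending order, so reorder to get them in decreasing order.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
```

The columns of `eigvecs` play the role of $W$, and `eigvals` holds the diagonal of $\Lambda$ in decreasing order.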
- Alternatively, perform a Singular Value Decomposition $X = U \Sigma V^\top$
  - $U \in \mathbb{R}^{n \times n}$: unitary matrix
  - $\Sigma \in \mathbb{R}^{n \times d}$: matrix with the singular values $\sigma_i$ on the diagonal
  - $V \in \mathbb{R}^{d \times d}$: matrix with the (right) singular vectors as columns
Inserting this into the covariance matrix gives us $C = \frac{1}{n-1} V \Sigma^\top \Sigma V^\top$, where the eigenvalues correspond to the squared singular values, $\lambda_i = \frac{\sigma_i^2}{n-1}$, and the eigenvectors to the right singular vectors.
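The equivalence between the two routes can be checked numerically. A minimal NumPy sketch (NumPy and the random toy data are assumptions, not part of the original notes):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
Xc = X - X.mean(axis=0)
n = Xc.shape[0]

# Thin SVD: Xc = U Sigma V^T (NumPy returns V^T directly)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Eigenvalues of the covariance matrix equal sigma_i^2 / (n - 1)
eigvals_from_svd = s**2 / (n - 1)

C = Xc.T @ Xc / (n - 1)
eigvals, _ = np.linalg.eigh(C)

# Both routes give the same spectrum (eigh returns ascending order)
assert np.allclose(np.sort(eigvals)[::-1], eigvals_from_svd)
```

In practice the SVD route is often preferred because it avoids explicitly forming $X^\top X$, which can be numerically less stable.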
Finally, we can project the data with $T = X W$. To reduce to $k$ dimensions, we keep only the first $k$ columns of $W$, i.e. the most important directions as determined by the order of the eigenvalues $\lambda_i$.
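Putting the steps together, the projection can be sketched like this in NumPy (a hedged illustration; the choice of $k = 2$ and the toy data are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy data with correlated features: 200 datapoints, 5 features
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
Xc = X - X.mean(axis=0)

# Eigendecomposition of the covariance matrix, sorted decreasing
C = Xc.T @ Xc / (Xc.shape[0] - 1)
eigvals, W = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, W = eigvals[order], W[:, order]

k = 2                      # number of dimensions to keep
T = Xc @ W[:, :k]          # projected data, shape (200, 2)
```

The fraction of variance retained can be read off as `eigvals[:k].sum() / eigvals.sum()`.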
Assess feature contributions
See Loading.