Principal Component Analysis
PCA is a method to reduce the dimensionality of a dataset while keeping most of the information. We do this by rotating the dataset and projecting the points onto one (or more) dimensions.
To preserve distances, and therefore information, we keep the dimensions with the highest variance while discarding the ones with the lowest variance.
Applications
- image (data) compression
- facial recognition with eigenfaces
- anomaly detection (the first $k$ components capture normal behaviour)
Limitation: Adidas Problem
How to calculate
$X \in \mathbb{R}^{n \times d}$ is the input matrix with $n$ datapoints and $d$ features.
Optional but recommended:
- standardize values (e.g. Z-score)
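Standardization as mentioned above can be sketched in NumPy (an assumed language choice; the toy values are made up):

```python
import numpy as np

# Toy data matrix: 5 datapoints, 3 features on very different scales
X = np.array([[1.0, 200.0, 0.1],
              [2.0, 180.0, 0.3],
              [3.0, 220.0, 0.2],
              [4.0, 210.0, 0.4],
              [5.0, 190.0, 0.5]])

# Z-score: zero mean and unit variance per feature, so no single
# feature dominates the variance just because of its units.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
```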
- Calculate eigenvalues and eigenvectors of $C$
  - Covariance matrix $C = \frac{1}{n-1} X^\top X$ (with $X$ centered)
  - $C = W \Lambda W^\top$, where the columns of $W$ are the eigenvectors and $\Lambda$ contains the eigenvalues $\lambda_i$ on the diagonal in decreasing order
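The eigendecomposition step can be sketched as follows. This is a minimal NumPy illustration (the document names no language, so NumPy and the random toy data are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))        # toy data: 100 datapoints, 3 features
Xc = X - X.mean(axis=0)              # center the data first

# Covariance matrix C = 1/(n-1) * X^T X
C = Xc.T @ Xc / (Xc.shape[0] - 1)

# eigh handles symmetric matrices; it returns eigenvalues in
# ascending order, so reorder to get them in decreasing order.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
```

The columns of `eigvecs` play the role of $W$, and `eigvals` holds the diagonal of $\Lambda$ in decreasing order.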
- Alternatively, perform a Singular Value Decomposition $X = U \Sigma V^\top$
  - $U \in \mathbb{R}^{n \times n}$: unitary matrix
  - $\Sigma \in \mathbb{R}^{n \times d}$: matrix with the singular values $\sigma_i$ on the diagonal
  - $V \in \mathbb{R}^{d \times d}$: matrix with the (right) singular vectors as columns
Inserting this into the covariance matrix gives us $C = \frac{1}{n-1} V \Sigma^\top \Sigma V^\top$, where the eigenvalues correspond to the squared singular values, $\lambda_i = \frac{\sigma_i^2}{n-1}$, and the eigenvectors to the right singular vectors.
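The equivalence between the two routes can be checked numerically. A minimal NumPy sketch (NumPy and the random toy data are assumptions, not part of the original notes):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
Xc = X - X.mean(axis=0)
n = Xc.shape[0]

# Thin SVD: Xc = U Sigma V^T (NumPy returns V^T directly)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Eigenvalues of the covariance matrix equal sigma_i^2 / (n - 1)
eigvals_from_svd = s**2 / (n - 1)

C = Xc.T @ Xc / (n - 1)
eigvals, _ = np.linalg.eigh(C)

# Both routes give the same spectrum (eigh returns ascending order)
assert np.allclose(np.sort(eigvals)[::-1], eigvals_from_svd)
```

In practice the SVD route is often preferred because it avoids explicitly forming $X^\top X$, which can be numerically less stable.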
Finally, we can project the data with $T = X W$. To reduce to $k$ dimensions, we keep only the first $k$ columns of $W$, i.e. the most important directions as determined by the order of the eigenvalues $\lambda_i$.
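Putting the steps together, the projection can be sketched like this in NumPy (a hedged illustration; the choice of $k = 2$ and the toy data are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy data with correlated features: 200 datapoints, 5 features
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
Xc = X - X.mean(axis=0)

# Eigendecomposition of the covariance matrix, sorted decreasing
C = Xc.T @ Xc / (Xc.shape[0] - 1)
eigvals, W = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, W = eigvals[order], W[:, order]

k = 2                      # number of dimensions to keep
T = Xc @ W[:, :k]          # projected data, shape (200, 2)
```

The fraction of variance retained can be read off as `eigvals[:k].sum() / eigvals.sum()`.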
Assess feature contributions
See Loading.