Principal Component Analysis

PCA is a method to reduce the dimensions of a Dataset while keeping most of the information. We do this by rotating the Dataset and projecting the points onto one (or $k$) dimensions.

To preserve distances between points, and therefore information, we keep the dimensions with the highest Variance while discarding the ones with the lowest variance.
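As a quick sketch of what this looks like in practice (assuming scikit-learn is available; the random dataset `X` is purely illustrative):

```python
# Minimal sketch: reduce a toy dataset to 2 dimensions with scikit-learn's PCA.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # toy data: 100 datapoints, 5 features

pca = PCA(n_components=2)              # keep the 2 directions with the highest variance
Z = pca.fit_transform(X)               # rotate the data and project it

print(Z.shape)                         # (100, 2)
print(pca.explained_variance_ratio_)   # share of variance kept per component
```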

Limitation: Adidas Problem

How to calculate

$X \in \mathbb{R}^{n \times d}$ is the input matrix with $n$ datapoints and $d$ features.

  • Optional but recommended: standardize the values (e.g. Z-score)

  • Calculate the Eigenvalues and Eigenvectors of the covariance matrix $C$

    • Covariance matrix $C = \frac{1}{n-1} X^\top X$
    • $C = V \Lambda V^\top$, where the columns of $V$ are the eigenvectors and $\Lambda$ contains the eigenvalues on the diagonal in decreasing order
  • Alternatively, perform a Singular Value Decomposition $X = U \Sigma V^\top$.

    • $U$: unitary matrix
    • $\Sigma$: matrix with the singular values on the diagonal
    • $V$: matrix with the (right) singular vectors

Inserting this into the covariance matrix gives us $C = \frac{1}{n-1} V \Sigma^\top U^\top U \Sigma V^\top = \frac{1}{n-1} V \Sigma^2 V^\top$, where the eigenvalues correspond to the squared singular values (up to the $\frac{1}{n-1}$ factor) and the eigenvectors to the right singular vectors.
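A rough NumPy sketch of both routes (eigendecomposition of the covariance matrix vs. SVD), assuming the data has already been centered; the variable names are illustrative:

```python
# Sketch: eigendecomposition of the covariance matrix vs. SVD of X.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X = X - X.mean(axis=0)                        # center each feature
n = X.shape[0]

# Route 1: eigendecomposition of the covariance matrix C = X^T X / (n - 1)
C = X.T @ X / (n - 1)
eigvals, eigvecs = np.linalg.eigh(C)          # eigh, since C is symmetric
order = np.argsort(eigvals)[::-1]             # sort into decreasing order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: SVD of X = U Sigma V^T
U, S, Vt = np.linalg.svd(X, full_matrices=False)

# Eigenvalues match the squared singular values (up to the 1/(n-1) factor) ...
print(np.allclose(eigvals, S**2 / (n - 1)))          # True
# ... and the eigenvectors match the right singular vectors (up to sign)
print(np.allclose(np.abs(eigvecs), np.abs(Vt.T)))    # True
```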

Finally, we can project the data with $Z = X V$. We then only want to keep the $k$ most important dimensions, determined by the order of the eigenvalues $\lambda_i$ (the first $k$ columns of $V$): $Z_k = X V_k$.
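Continuing the sketch above, keeping only the top $k$ dimensions might look like this ($V_k$ here just means the first $k$ sorted eigenvectors):

```python
# Sketch (continuing from above): project onto the k most important directions.
k = 2
V_k = eigvecs[:, :k]       # eigenvectors sorted by decreasing eigenvalue
Z_k = X @ V_k              # projected data, shape (n, k)
print(Z_k.shape)           # (100, 2)
```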

Assess feature contributions

See Loading.