
Principal component analysis - confusion about the last step: multiplying the derived feature vector matrix with the original dataset


I was reading this nice article that walks through PCA step by step, and the last step confused me a little:


From page 16 (Step 5):

"This is the final step in PCA, and is also the easiest. Once we have chosen the components (eigenvectors) that we wish to keep in our data and formed a feature vector, we simply take the transpose of the vector and multiply it on the left of the original data set, transposed.

FinalData = RowFeatureVector x RowDataAdjust

where 'RowFeatureVector' is the matrix with the eigenvectors in the columns transposed so that the eigenvectors are now in the rows, with the most significant eigenvector at the top, and 'RowDataAdjust' is the mean-adjusted data transposed, i.e. the data items are in each column, with each row holding a separate dimension."

My Question:

Let's say that I have a 3-dimensional data set and I want to drop one dimension using PCA. If I determine the eigenvector with the minimum eigenvalue and drop it, does it matter whether the remaining eigenvectors are sorted by their eigenvalues?

If I understand the text correctly, I would approach it like this (see the sketch after this list):
- sort the eigenvectors from highest to lowest eigenvalue
- sort the dataset into the same order as the eigenvectors
- do a matrix-matrix multiplication between the eigenvector matrix (with the minimum-eigenvalue vector dropped) and the dataset (with the dimension that corresponds to the dropped eigenvector removed)
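
Here is that approach sketched in NumPy. Writing it out is actually what made me suspicious: if I drop a dimension of the data set as in my third step, the shapes of the product no longer line up, so maybe I am misreading the article:

```python
import numpy as np

# Toy 3-dimensional data set: 5 samples (rows) x 3 dimensions (columns).
X = np.array([[2.5, 2.4, 0.5],
              [0.5, 0.7, 1.1],
              [2.2, 2.9, 1.9],
              [1.9, 2.2, 3.1],
              [3.1, 3.0, 2.3]])
X_adj = X - X.mean(axis=0)                 # mean-adjusted data, (5, 3)

eigvals, eigvecs = np.linalg.eigh(np.cov(X_adj.T))

# Sort eigenvectors from highest to lowest eigenvalue and drop the one
# with the minimum eigenvalue, keeping k = 2 of 3.
W = eigvecs[:, np.argsort(eigvals)[::-1][:2]]  # (3, 2)

# If I also drop the corresponding column of the data, X_adj becomes
# (5, 2) and W.T @ X_adj.T would be a (2, 3) x (2, 5) product, which
# is not defined. The only version whose shapes work out keeps ALL
# three dimensions of the mean-adjusted data:
final_data = W.T @ X_adj.T                 # (2, 3) @ (3, 5) -> (2, 5)
```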

I hope someone can give me a hint on whether I am on the right track here. Thank you!

submitted by rasbt
