I'm working on a classifier for reddit posts, and I have the impression that non-text features such as subreddit, author, domain or votes are being drowned out by the sheer number of features from the text (link title, and optionally comments and linked page).
So I'm thinking of using some sort of dimensionality reduction on the text features before handing them to the classifier. Am I on the right path?
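Below is a minimal sketch of the idea. It assumes each post is a dict with hypothetical keys `title`, `subreddit`, `domain` and `score`. The text is compressed with the hashing trick into a fixed, small number of buckets, which is a swapped-in example of dimensionality reduction. Truncated SVD (LSA) over a TF-IDF matrix is another common choice. The hashed text block is then L2-normalized before the one-hot metadata and a log-scaled vote count are appended.

```python
import math
import re
import zlib


def hash_text(text, n_buckets=64):
    """Compress a bag of words into n_buckets dimensions (the hashing trick)."""
    vec = [0.0] * n_buckets
    for token in re.findall(r"[a-z0-9']+", text.lower()):
        # crc32 is stable across runs, unlike Python's salted hash() for str
        h = zlib.crc32(token.encode("utf-8"))
        index = h % n_buckets
        # use the top bit of the hash as a sign so collisions tend to cancel
        sign = 1.0 if (h >> 31) & 1 == 0 else -1.0
        vec[index] += sign
    # L2-normalize so the text block has unit length however long the post is
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def one_hot(value, vocabulary):
    """One-hot encode a categorical value against a fixed vocabulary."""
    return [1.0 if value == v else 0.0 for v in vocabulary]


def featurize(post, subreddits, domains, n_buckets=64):
    """Concatenate a reduced text block with the non-text features.

    The keys 'title', 'subreddit', 'domain' and 'score' are hypothetical.
    """
    text_part = hash_text(post["title"], n_buckets)
    meta_part = one_hot(post["subreddit"], subreddits) + one_hot(post["domain"], domains)
    # signed log scale so a few very high scores don't dominate
    votes = [math.copysign(math.log1p(abs(post["score"])), post["score"])]
    return text_part + meta_part + votes


post = {
    "title": "Dimensionality reduction for text features",
    "subreddit": "MachineLearning",
    "domain": "self.MachineLearning",
    "score": 12,
}
x = featurize(post, ["MachineLearning", "programming"], ["self.MachineLearning", "github.com"])
print(len(x))  # 64 text buckets + 2 subreddits + 2 domains + 1 vote feature
```

Here the text contributes 64 columns instead of one per vocabulary word, and its unit norm caps how much it can outweigh the handful of metadata columns. The bucket count is a tuning knob. Fewer buckets means more collisions but a smaller text block.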
EDIT: thanks everyone for the answers!