Second post. I'm working on a system that does some quick classification based on item content and characteristics, giving me a list of nearest neighbors and farthest neighbors using Pearson squared distance.
I'd like to expand my little toy system and also have it take online reviews into account. This seems like a good fit for the basic bag-of-words approach I've learned about so far, but I'm having trouble figuring out how to combine it with the other features.
Should I look at including each word from the bag-of-words as a feature itself, and weighting them lightly compared to the content features, or should I think about calculating the bag-of-words distance separately and then have that distance as a feature?
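To make the two options concrete, here is a rough sketch of what I mean. Everything in it is made up for illustration: the function names, the toy vocabulary, the weights, and the assumption that "Pearson squared distance" means 1 - r^2 (r being the Pearson correlation). It is not my actual code, just a minimal version of each idea.

```python
from collections import Counter


def pearson_sq_distance(x, y):
    """Assumed definition: 1 - r**2, where r is the Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 1.0  # correlation undefined; treat the pair as maximally distant
    return 1.0 - (cov * cov) / (vx * vy)


def bag_of_words(text, vocab):
    """Count how often each vocabulary word appears in the text."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab]


def concat_features(content, bow, text_weight=0.1):
    """Option 1: each word is its own feature, scaled down by text_weight."""
    return list(content) + [text_weight * c for c in bow]


def blended_distance(content_a, content_b, bow_a, bow_b, text_weight=0.2):
    """Option 2: compute the text distance separately, then mix the two."""
    d_content = pearson_sq_distance(content_a, content_b)
    d_text = pearson_sq_distance(bow_a, bow_b)
    return (1 - text_weight) * d_content + text_weight * d_text


# Hypothetical toy data: three content features plus one review per item.
vocab = ["great", "battery", "poor", "screen"]
content_a, review_a = [1.0, 0.5, 3.0], "great battery great screen"
content_b, review_b = [1.1, 0.4, 2.9], "poor battery poor screen"

bow_a = bag_of_words(review_a, vocab)
bow_b = bag_of_words(review_b, vocab)

# Option 1: one long vector per item, one distance call.
d_option1 = pearson_sq_distance(
    concat_features(content_a, bow_a), concat_features(content_b, bow_b)
)

# Option 2: two distances, combined with an explicit weight.
d_option2 = blended_distance(content_a, content_b, bow_a, bow_b)

print(d_option1, d_option2)
```

With option 1 the influence of the text depends on vocabulary size as well as on `text_weight`, since every extra word adds another dimension. With option 2 the text contributes a single number, so its weight is easier to reason about.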
Also, any general advice on learning this material? I tend to learn best by starting from the practical side and digging into the broader theory later, but I'm having a tough time doing that with ML/data science. Perhaps I'm looking in the wrong place.