Problem: Assume I have 100 TB worth of web pages. How do I go about classifying them?
With little background in machine learning, what books/tutorials should I read to be able to accomplish this?
Once I'm done reading, are there any libraries out there that would help me with such an endeavor?