Given an exhaustive list of topics and millions of news articles, what approaches can I experiment with to score how relevant each topic is to each article? Ultimately, I want to build topic pages that display the news articles according to these relevancy scores.
Since I am handling only news articles, I can take advantage of where words appear, because in most cases the headline or the first paragraph of the article covers the important topics.
I don't have any tagged data; all I have is a set of news articles and a set of topics. A news article may be associated with one or more topics, each with some relevancy. I need to figure out an algorithm to calculate such relevancy scores.
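To make the kind of scoring I have in mind concrete, here is a minimal unsupervised sketch of one baseline, not a settled approach. It assumes each topic can be expanded into a few lowercase keywords (the `topics` dictionary is hypothetical; I only have topic names), and the position weights in `WEIGHTS` are arbitrary guesses that boost the headline and the first paragraph.

```python
import re
from collections import Counter

# Assumed position weights: headline and lead paragraph count more than the rest.
WEIGHTS = {"headline": 3.0, "lead": 2.0, "body": 1.0}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def weighted_counts(headline, paragraphs):
    """Count tokens, weighting each occurrence by the section it appears in."""
    counts = Counter()
    sections = [("headline", headline)]
    for i, paragraph in enumerate(paragraphs):
        sections.append(("lead" if i == 0 else "body", paragraph))
    for kind, text in sections:
        for token in tokenize(text):
            counts[token] += WEIGHTS[kind]
    return counts


def relevancy_scores(headline, paragraphs, topics):
    """Return {topic: score}, where score is the share of weighted token
    mass that belongs to the topic's keywords (0.0 if none match)."""
    counts = weighted_counts(headline, paragraphs)
    total = sum(counts.values()) or 1.0  # avoid dividing by zero on empty articles
    return {
        name: sum(counts[keyword] for keyword in keywords) / total
        for name, keywords in topics.items()
    }


if __name__ == "__main__":
    # Hypothetical topics and article, for illustration only.
    topics = {"economy": ["inflation", "rates"], "sports": ["match", "goal"]}
    scores = relevancy_scores(
        "Central bank raises rates",
        ["Inflation stayed high, so rates went up again.",
         "Analysts expect a pause after the match of wills."],
        topics,
    )
    for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(name, round(score, 3))
```

An article can score above zero for several topics at once, so a topic page could list articles sorted by that topic's score. Exact keyword matching is crude; TF-IDF or embedding similarity between topic descriptions and articles could replace the counting step while keeping the same position weighting.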