Channel: Machine Learning

What is the best suited algorithm for this feature set and this problem? [NLP][Classifiers]

I have the NLP part down: tags will be applied to a set of documents containing free text of variable length.

These tags contain unique identifiers for terms in a dictionary. I am turning these IDs into a feature set for a classifier that assigns each document to one or more categories (for example, Doc1 can belong to anywhere from 1 to N categories, and there are M different categories in total).

The average document length is about 100 to 200 words, and the number of possible keywords is in the high hundreds of thousands. The content is keyword-rich: more than 50% of the words are mapped to a keyword in the NLP step.
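To make the setup concrete, here is a minimal sketch of how the tagged documents could be encoded. The term IDs, category names, and helper functions (`to_sparse_features`, `to_indicator`) are all made up for illustration; the real NLP output may look different.

```python
from collections import Counter

# Hypothetical tagged documents: each is the list of dictionary term IDs
# produced by the NLP step (IDs and labels are invented for illustration).
docs = [
    [101, 57, 101, 9003],
    [57, 42],
]
labels = [
    {"sports", "finance"},
    {"finance"},
]

def to_sparse_features(term_ids):
    """Map a document's term IDs to a sparse {term_id: count} dict."""
    return dict(Counter(term_ids))

def to_indicator(doc_labels, all_categories):
    """Encode a label set as a 0/1 vector over the M categories."""
    return [1 if c in doc_labels else 0 for c in all_categories]

all_categories = sorted(set().union(*labels))  # the M categories
X = [to_sparse_features(d) for d in docs]
Y = [to_indicator(l, all_categories) for l in labels]
```

With 100 to 200 words per document and hundreds of thousands of possible keywords, each document has at most a couple of hundred nonzero features, so a sparse representation like this is a natural fit.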

What is the best approach for training something that can recommend both the labels and the appropriate number of labels for a given document?
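To illustrate what "recommend labels and the number of labels" could mean in practice, here is a rough sketch of one common option, not a definitive answer: score every category independently and keep those above a cutoff, so the number of labels falls out of the scores. The function `recommend_labels`, the 0.5 threshold, and the example scores are assumptions for illustration only.

```python
def recommend_labels(scores, threshold=0.5):
    """Return labels whose score meets the threshold, best first.

    Falls back to the single top-scoring label if none qualify.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    chosen = [label for label, s in ranked if s >= threshold]
    if not chosen and ranked:
        chosen = [ranked[0][0]]
    return chosen

# Hypothetical per-category scores from some trained multi-label model.
example_scores = {"finance": 0.91, "sports": 0.62, "politics": 0.08}
picked = recommend_labels(example_scores)
```

The threshold would need tuning on held-out data; a single global value is only the simplest choice.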

submitted by ptpatil
