Channel: Machine Learning

Text classification - looking for approach suggestions


I'm trying to classify short texts using nltk and scikit-learn, but I'm not yet sure exactly how to approach it, so I'm looking for advice. A given text may belong to more than one class, or to none at all. The dataset has about 100k items, with relatively few items per category: thousands in a few cases, hundreds in many, and far fewer in most. For a given class I can easily generate samples of items that belong to it, but I'm not sure what to do about counterexamples (or whether I need them at all).

So far I've been experimenting with naive Bayes. For each class separately, I train a classifier on the known sample items plus a random selection of known items that don't belong to that class. The classifiers work well on items that are a good match, but they produce a lot of false positives. Is there a better way of doing this?
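For reference, the per-class setup described above might look roughly like this in scikit-learn. This is a minimal sketch under assumptions, not the poster's actual code: it uses only scikit-learn (`TfidfVectorizer` and `MultinomialNB`), represents each item's labels as a set of class names (an empty set meaning "no class"), and the function names `train_per_class` / `predict_labels` and the `neg_ratio` and `threshold` parameters are hypothetical. Each class gets its own binary model trained on its positives plus a random sample of non-members, and an item receives every class whose predicted probability clears the threshold, so it can end up with several labels or none.

```python
import random

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB


def train_per_class(texts, labels, neg_ratio=1.0, seed=0):
    """Train one binary naive Bayes model per class (hypothetical helper).

    texts: list of str; labels: list of sets of class names (may be empty).
    Negatives for each class are a random sample of items NOT in that class;
    neg_ratio (an assumed knob) sets how many negatives per positive.
    """
    rng = random.Random(seed)
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(texts)  # sparse matrix, one row per text
    classes = sorted(set().union(*labels))
    models = {}
    for cls in classes:
        pos = [i for i, ls in enumerate(labels) if cls in ls]
        neg_pool = [i for i, ls in enumerate(labels) if cls not in ls]
        if not neg_pool:
            continue  # every item has this class; a binary model is meaningless
        k = min(len(neg_pool), max(1, int(len(pos) * neg_ratio)))
        neg = rng.sample(neg_pool, k)
        y = [1] * len(pos) + [0] * len(neg)
        models[cls] = MultinomialNB().fit(X[pos + neg], y)
    return vectorizer, models


def predict_labels(vectorizer, models, texts, threshold=0.5):
    """Return a set of predicted classes per text (possibly empty)."""
    X = vectorizer.transform(texts)
    # Column 1 of predict_proba is P(class == 1), since classes_ is [0, 1].
    probs = {cls: m.predict_proba(X)[:, 1] for cls, m in models.items()}
    return [
        {cls for cls, p in probs.items() if p[row] >= threshold}
        for row in range(X.shape[0])
    ]


texts = [
    "cheap flights to paris", "hotel deals in rome", "book a cheap hotel",
    "python list comprehension", "rust borrow checker", "python type hints",
    "weather today is sunny",
]
labels = [{"travel"}, {"travel"}, {"travel"},
          {"programming"}, {"programming"}, {"programming"}, set()]
vec, models = train_per_class(texts, labels)
print(predict_labels(vec, models, ["cheap hotel in paris", "python borrow checker"]))
```

Raising `threshold` above 0.5 is one knob that trades some recall for fewer false positives; naive Bayes probabilities are often poorly calibrated, so a threshold like this would need to be tuned per class on held-out data rather than assumed.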

submitted by fiedzia