Some questions on Text Analysis

Hi, I hope this is the right subreddit.

For a small project I'd like to sift through a large number of articles and tag them by category (interview, news article, etc.) and by the occurrence of items from a preset list (names, brands).
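To make the item-tagging part concrete, here is a minimal sketch of the naive approach I have in mind, assuming plain-text articles and a hand-written item list. The names `NOTABLE_ITEMS` and `tag_article` and the sample items are made up for illustration, and it uses only Python's standard `re` module rather than any of the toolkits mentioned below.

```python
import re

# Hypothetical preset of notable items; a real list would be much longer.
NOTABLE_ITEMS = ["Steve Jobs", "Apple", "Nokia"]

def tag_article(text, items=NOTABLE_ITEMS):
    """Return the items that occur in text as whole words, ignoring case."""
    found = []
    for item in items:
        # \b anchors the match at word boundaries, so "Apple" does not
        # match inside "pineapple".
        pattern = r"\b" + re.escape(item) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(item)
    return found

article = "In this interview, Steve Jobs talks about Apple's next phone."
print(tag_article(article))
```

Plain matching like this cannot tell Apple the company from the fruit, which is part of why I'm looking at proper text-analysis tools.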

Later on I might want to add some language processing (figure out if the article is FROM, WITH or ABOUT an item) or sentiment analysis (is this article positive or negative in tone).

I've googled around, and I'm torn between GATE, NLTK and RapidMiner. I also have a couple of questions:

  • I couldn't find an open-source library or suite written in C or C++. Why is that? I'd think text analysis is quite resource-intensive, and that using Java (which seems to be preferred) or any other interpreted language would unnecessarily bog down performance.
  • I'm not sure if any of the three tools above are really suited for the job. GATE and RapidMiner especially seem like a bit of overkill. Your thoughts on that?
  • What are some good books/tutorials that will help me with this project? I've already bookmarked this link, which at the time of posting is at the top of this subreddit. It seems to be quite useful for my goal. Other than that I am lost: for example, the NLTK examples use tags like NN and NP-SBJ, which I guess are shorthand for some grammatical definition (Nominative Noun maybe?) but which don't really help my understanding. Any recommendations?

If you got this far, thanks for reading.

submitted by DeusexConstantia