I want to start learning and implementing some natural language processing algorithms for sentiment analysis and document classification. What is the best way to store data such as corpora, bag-of-words representations, lexicons, and chunking grammars?
I am using Python and Julia, but I also know some Java.
I am thinking of using HDF5 because of its fast read/write speed. My question is: what do researchers actually use, and which formats are efficient?