I am searching for datasets in big-data learning to work on for my PhD thesis. I am looking for big-learning datasets that have the following properties:
* The problem is new and is a hot topic in the academic community. The corresponding dataset should be recent and publicly available.
* Results of state-of-the-art methods are not satisfactory because of the sheer amount of data/computation. The main concern about the dataset should be its huge size, not the difficulty of the task itself. E.g., Naive Bayes has low accuracy and no one has been able to test an SVM on it, perhaps because no SVM package can handle that much data.
* More importantly, the dataset preferably comes from outside the computer-science community, i.e. few CS/ML researchers have tried to solve it. For example, NLP problems are too hard to contribute to, because most NLP researchers are experts in ML and algorithms, so it is extremely difficult for me to outperform their work! I think there should be easy-to-outperform datasets in bioinformatics, but I do not know which tasks fit the big-learning setting. I would be grateful if you could suggest datasets from other fields as well :)