Channel: Machine Learning

What defines the "size" of a dataset?


This might be a trivial question, but I was reading this:

http://blog.echen.me/2011/04/27/choosing-a-machine-learning-classifier/

I'm an undergraduate student working on a Kaggle classification competition for a school project. Our dataset has 2500 items, each with 5000 identifying parameters.

My question is: what defines the size of a dataset? For example, how much data is needed before a dataset is considered "large"?

Also, would you need to take into account the number of parameters each item has, and possibly the statistical properties of those parameters? If you have 100 parameters, but 99 of them have values that are either too spread out or too tightly clustered to be useful for classification, would you still consider the dataset small?
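One way to make the "effectively useless parameters" idea concrete is to measure each feature's variance: a feature whose values are nearly identical across all items cannot separate classes. Below is a minimal sketch in Python/NumPy (names, data, and the threshold are all illustrative, not from the post):

```python
import numpy as np

# Toy stand-in for a dataset: 100 items, 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 2] = 1.0  # a constant feature: values "too close together" to help

# Per-feature variance; near-zero variance means near-useless for classification.
variances = X.var(axis=0)
threshold = 1e-3  # assumed cutoff; would need tuning for a real dataset
informative = variances > threshold

X_reduced = X[:, informative]
print(informative)       # feature 2 is flagged as uninformative
print(X_reduced.shape)   # (100, 4)
```

Under this view, the "effective" size of a dataset is closer to (number of items) × (number of informative features) than to the raw feature count.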

Thanks a lot for your help!

submitted by clarle
