Channel: Machine Learning

New intern, a bit terrified by my dataset. OK to ask questions in this sub?


Alright, here's the situation: I have an unpaid gig on a healthcare policy project, with one hell of a dataset (well, to me).

I'm dealing with about 70k rows and 300+ features. Some features are categorical, some are continuous. There's definite structure in the dataset as well.

I need to do some amount of exploratory analysis. My ultimate goal is to find a way to cluster the individuals in my dataset, then explore those clusters (WHY are those individuals alike?) and also identify outliers who don't fit well into ANY cluster.
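To make that goal concrete, here is a rough sketch of one way it could go, not a recommendation for the real data: a Gower-style distance (range-scaled differences for continuous columns, simple mismatch for categorical ones), a naive k-medoids pass on top of it, and "distance to own medoid" as a crude outlier score. Everything in it is made up for illustration: the toy rows, the column meanings, and the helper names `gower_distance`, `numeric_ranges` and `k_medoids`. It is pure standard-library Python and compares all pairs, so at 70k rows it would have to run on a subsample.

```python
# Toy sketch: Gower-style distance + naive k-medoids for mixed-type rows.
# All data and helper names here are hypothetical, for illustration only.

def numeric_ranges(rows, is_categorical):
    """Range (max - min) of each continuous column; None for categorical ones."""
    ranges = []
    for j, cat in enumerate(is_categorical):
        if cat:
            ranges.append(None)
        else:
            col = [r[j] for r in rows]
            ranges.append(max(col) - min(col))
    return ranges

def gower_distance(a, b, is_categorical, ranges):
    """Average per-feature dissimilarity, each feature contributing 0..1."""
    total = 0.0
    for x, y, cat, rng in zip(a, b, is_categorical, ranges):
        if cat:
            total += 0.0 if x == y else 1.0      # simple mismatch
        elif rng:
            total += abs(x - y) / rng            # range-scaled difference
    return total / len(a)

def k_medoids(rows, k, dist, n_iter=20):
    """Naive k-medoids: farthest-first init, then alternate assign/update."""
    n = len(rows)
    medoids = [0]
    while len(medoids) < k:
        # next medoid: the row farthest from all medoids chosen so far
        medoids.append(max(range(n), key=lambda i: min(
            dist(rows[i], rows[m]) for m in medoids)))

    def assign(meds):
        return [min(range(k), key=lambda c: dist(rows[i], rows[meds[c]]))
                for i in range(n)]

    for _ in range(n_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:                      # keep old medoid if cluster is empty
                new_medoids.append(medoids[c])
                continue
            # new medoid: member with the smallest total distance to the others
            new_medoids.append(min(members, key=lambda m: sum(
                dist(rows[m], rows[o]) for o in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)

# Hypothetical rows: (age, region, insurance type)
rows = [
    (34, "urban", "public"),
    (36, "urban", "public"),
    (35, "urban", "public"),
    (70, "rural", "private"),
    (72, "rural", "private"),
    (68, "rural", "private"),
    (5,  "rural", "public"),   # deliberately odd row
]
is_categorical = [False, True, True]
ranges = numeric_ranges(rows, is_categorical)

def dist(a, b):
    return gower_distance(a, b, is_categorical, ranges)

medoids, labels = k_medoids(rows, k=2, dist=dist)

# Crude outlier score: distance from each row to the medoid of its own cluster.
scores = [dist(rows[i], rows[medoids[labels[i]]]) for i in range(len(rows))]
worst = max(range(len(rows)), key=lambda i: scores[i])
print("labels:", labels)
print("worst-fitting row:", rows[worst])
```

Comparing each cluster's medoid row against the others is one cheap way to start on the "WHY are those individuals alike?" part, since the medoid is an actual individual from the data rather than an averaged point.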

I'd dearly love to pull out FactoMineR and go to town, but it's not built for a dataset of this size.

Any advice on packages/environments?

Or am I in No Man's Land, and do I have to design my own solution?

submitted by Jonny5ive
