In a project I'm currently working on, I'm trying to classify statistics from bump test process data to decide whether a section of the process data is a good candidate for fitting a model (the details don't matter much here; the point is that I'm doing classification).
I've been manually looking at model fits and classifying them based on my own judgment, and I originally had 3 classes of fits (Good, Unclear, Poor). While doing this I noticed there are really more like 5 classes (Excellent/Perfect, Good, Unclear, Poor, and ohmygodgetitawayfromme/Atrocious).
Ideally I would like to use the 5 classes, but I'm worried that this splits my data up very unevenly, and I don't know what effect that imbalance will have on my classifier (see below). Is scaling the importance of the classes (i.e. class weighting) a viable option here? As a rough estimate, I have ~2000 data instances; probably only ~50-100 will fall in the Excellent class and ~100-200 in Atrocious, while the other classes will be more evenly distributed. I can generate more data quite easily, but I am working alone, and manually classifying it takes quite some time.
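To show what I mean by scaling class importance, here is a rough sketch. It assumes a scikit-learn-style random forest (not the GPL code I actually found) and uses made-up data with roughly the class proportions above, just to illustrate the idea:

```python
# Sketch only: hypothetical data with ~2000 instances and 5 imbalanced classes
# (0=Excellent, 1=Good, 2=Unclear, 3=Poor, 4=Atrocious).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))                      # placeholder features
y = rng.choice(5, size=2000, p=[0.04, 0.32, 0.32, 0.24, 0.08])  # imbalanced labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# class_weight='balanced' reweights each class inversely to its frequency,
# so the rare Excellent/Atrocious instances count for more during training.
clf = RandomForestClassifier(n_estimators=200, class_weight='balanced',
                             random_state=0)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```

Per-class precision/recall (rather than overall accuracy) is what I'd look at to judge whether the rare classes are actually being learned.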
I am using a random forest right now, but will probably switch to a neural net later, because we probably can't release our source code, which the GNU General Public License on the random forest code I found would require (and I don't know nearly enough about random forests to write my own implementation).
Is this something I need to worry about?