Hi,
I am trying to identify the best predictive model that yields accurate probabilities of positives (the data's labels are binary) on a severely imbalanced data set.
I have 2 main problems.
Different loss functions tell me different models are better. In particular, I get 2-5% better ROC-AUC if I go with model 1 over model 2, but 7-20% better log loss if I go with model 2 over model 1. RMSE is about 1% better for model 2. I realize log loss and RMSE reward well-calibrated probabilities, but because the data is so imbalanced I am concerned these metrics will be more affected by noise in the data. I also care to some degree about how well the predictions are ordered (another reason to favor AUC).
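For concreteness, here is a simplified sketch of how I compute the three metrics on a validation fold (using scikit-learn; y_val and p_val are just placeholder names for the fold's labels and predicted probabilities of the positive class):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, log_loss, mean_squared_error

def evaluate(y_val, p_val):
    # y_val: true binary labels, p_val: predicted probabilities of the positive class
    auc = roc_auc_score(y_val, p_val)                 # ranking quality only
    ll = log_loss(y_val, p_val)                       # penalizes miscalibrated probabilities
    rmse = np.sqrt(mean_squared_error(y_val, p_val))  # square root of the Brier score
    return {"roc_auc": auc, "log_loss": ll, "rmse": rmse}
```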
I am not sure I am doing the log loss and RMSE calculations correctly. The data is severely imbalanced (the ratio of positives to negatives is about 1 to 1000), and it is pre-filtered before I get it to remove 99 out of every 100 negative training instances (I can't recover the removed instances later). During training I give all instances equal weight, but when I calculate model losses during cross validation I give negative instances a compensatory weight of 100 and positive instances a weight of 1. My concern is that this weighting scheme is wrong for a couple of reasons. First, I worry it will encourage overfitting to the few negative examples that survived the downsampling. Second, I worry it only makes sense to evaluate a model on a test set that uses the same weighting scheme as the training data.
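And here is a simplified sketch of the weighted evaluation I described, where each surviving negative gets a compensatory weight of 100 to stand in for the negatives that were filtered out (again, the names are placeholders):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, log_loss, mean_squared_error

def evaluate_weighted(y_val, p_val):
    # Each surviving negative stands in for 100 original negatives; positives keep weight 1.
    w = np.where(y_val == 1, 1.0, 100.0)
    auc = roc_auc_score(y_val, p_val, sample_weight=w)
    ll = log_loss(y_val, p_val, sample_weight=w)
    rmse = np.sqrt(mean_squared_error(y_val, p_val, sample_weight=w))
    return {"roc_auc": auc, "log_loss": ll, "rmse": rmse}
```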
Any thoughts and advice would be greatly appreciated. Alex