Channel: Machine Learning

Question regarding model selection for linear discriminant analysis models


Hi,

I'm working on a classification project for a data mining class, and we have to test several classification models on a dataset (250,000 observations, 2 groups, 30 independent variables).

I'm building a linear discriminant analysis (LDA) model in RStudio with the MASS and klaR packages.

The steps I took:

1) Loaded the data.

2) Ran the lda function on Class against the 30 independent variables (and noticed a warning that some of the independent variables are collinear).
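For reference, the collinearity warning in step 2 can be investigated with a rank check on the predictor matrix. This is only a minimal sketch on made-up data: the `toy` data frame and its columns `x1` to `x4` are hypothetical stand-ins for the real predictors, and `x3` is deliberately built as `x1 + x2`. Note that `lda` checks the within-group covariance, so a check on the pooled predictors like this one is a rough first look rather than the same test.

```r
# Toy stand-in for the predictors; x3 is built as x1 + x2 on purpose (hypothetical data).
set.seed(1)
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
toy <- data.frame(x1 = x1, x2 = x2, x3 = x1 + x2, x4 = rnorm(n))

X <- as.matrix(toy)
decomp <- qr(X)

# A rank below the number of columns means at least one column is redundant.
rank_deficient <- decomp$rank < ncol(X)

# qr() moves columns it finds linearly dependent to the end of the pivot order,
# so the entries past the rank identify candidates to drop.
dependent_cols <- colnames(X)[decomp$pivot[seq_len(ncol(X)) > decomp$rank]]

print(rank_deficient)
print(dependent_cols)
```

On the real data the same check would be run on the 30 predictor columns of `cdata`, ideally once per group.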

From step 3 onwards, I'm having trouble understanding what I'm doing and whether I'm doing it right.

3) I searched for a model selection technique to reduce the number of independent variables for LDA. The function I thought would help was "stepclass" from "klaR".

Description of stepclass function: "Forward/backward variable selection for classification using any specified classification function and selecting by estimated classification performance measure from ucpm."

This was the code I used:

ldaselect <- stepclass(Label ~ DER_mass_MMC + DER_mass_transverse_met_lep + DER_mass_vis +
    DER_pt_h + DER_deltaeta_jet_jet + DER_mass_jet_jet + DER_prodeta_jet_jet +
    DER_deltar_tau_lep + DER_pt_tot + DER_sum_pt + DER_pt_ratio_lep_tau +
    DER_met_phi_centrality + DER_lep_eta_centrality + PRI_tau_pt +
    PRI_tau_eta + PRI_tau_phi + PRI_lep_pt + PRI_lep_eta + PRI_lep_phi +
    PRI_met + PRI_met_phi + PRI_met_sumet + PRI_jet_num + PRI_jet_leading_pt +
    PRI_jet_leading_eta + PRI_jet_leading_phi + PRI_jet_subleading_pt +
    PRI_jet_subleading_eta + PRI_jet_subleading_phi +
    PRI_jet_all_pt,
    data = cdata, criterion = "AS", method = "lda")

I figured the criterion should be AS (ability to separate) since LDA seeks to find the variables which create the greatest separation between groups.

However, after running forward, backward, and both-direction selection, it returns only one variable, "PRI_tau_pt", as the only necessary independent variable.
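For reference, here is a minimal sketch of the same kind of call with the stopping settings written out. `stepclass` only adds or drops a variable when the cross-validated criterion improves by more than `improvement`, which defaults to 0.05, so the threshold is one setting worth varying. The built-in `iris` data is a stand-in for `cdata`, and the 0.01 threshold is an arbitrary illustrative value, not a recommendation.

```r
library(MASS)
library(klaR)

# Folds are drawn at random, so fix the seed to make the run repeatable.
set.seed(42)

# iris stands in for the real data; improvement = 0.01 is an illustrative choice.
sel <- stepclass(Species ~ ., data = iris, method = "lda",
                 direction = "forward",  # also "backward" or "both"
                 criterion = "AS",       # ability to separate, as in the call above
                 improvement = 0.01,     # minimum gain needed to keep going
                 fold = 10,              # 10-fold cross-validation
                 output = FALSE)         # suppress the step-by-step log

# Selected formula and the record of each selection step.
print(sel$formula)
print(sel$process)
```

Comparing the `process` table across a few `improvement` values shows how much each extra variable actually adds to the criterion.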

I'm skeptical of this result. I'd appreciate any advice, or a pointer if I've chosen the wrong steps (or on whether model selection is even necessary for LDA).
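One way to sanity-check a single-variable result is to compare the cross-validated accuracy of the full model against the reduced one. The sketch below assumes two species from the built-in `iris` data as a stand-in for the two groups, with `Sepal.Length` playing the role of the one selected variable; none of it uses the real dataset.

```r
library(MASS)

# Stand-in data: two iris species play the role of the two groups.
two <- droplevels(subset(iris, Species != "setosa"))

# CV = TRUE makes lda() return leave-one-out predictions in $class.
full <- lda(Species ~ ., data = two, CV = TRUE)
one  <- lda(Species ~ Sepal.Length, data = two, CV = TRUE)

# Share of observations classified correctly under leave-one-out.
acc_full <- mean(full$class == two$Species)
acc_one  <- mean(one$class == two$Species)

print(c(full = acc_full, single = acc_one))
```

If the reduced model loses a lot of accuracy relative to the full one, that suggests the selection stopped too early rather than that the other variables carry no information.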

Regards, Marvin

submitted by 07crisma
