Channel: Machine Learning

Need help with sklearn's GradientBoostingRegressor


I ran into some strange behavior while training sklearn's GradientBoostingRegressor and making predictions with it. The example below reproduces the issue on a reduced dataset, but it also occurs on the full dataset. I have the following two small datasets, both adapted from a larger one. As you can see, the target variable is identical in both, while the input variables differ, although their values are close to each other.

Dataset 1:

Column 1   Column 2   Column 3   Column 4   Column 5   target
101869.2   102119.9   102138.0   101958.3   101903.7   12384900
101809.1   102031.3   102061.7   101930.0   101935.2   11930700
101978.0   102208.9   102209.8   101970.0   101878.6   12116700
101869.2   102119.9   102138.0   101958.3   101903.7   12301200
102125.5   102283.4   102194.0   101884.8   101806.0   10706100
102215.5   102351.9   102214.0   101769.3   101693.6   10116900

Dataset 2:

Column 1   Column 2   Column 3   Column 4   Column 5   target
101876.0   102109.8   102127.6   101937.0   101868.4   12384900
101812.9   102021.2   102058.8   101912.9   101896.4   11930700
101982.5   102198.0   102195.4   101940.2   101842.5   12116700
101876.0   102109.8   102127.6   101937.0   101868.4   12301200
102111.3   102254.8   102182.8   101832.7   101719.7   10706100
102184.6   102320.2   102188.9   101699.9   101548.1   10116900

I have the following code:

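    from sklearn import ensemble

    # Model 1: fit on the first dataset and predict on the same rows
    re1 = ensemble.GradientBoostingRegressor(n_estimators=40, max_depth=None, random_state=1)
    re1.fit(X1, Y)
    pred1 = re1.predict(X1)

    # Model 2: fit on the second dataset and predict on the same rows
    re2 = ensemble.GradientBoostingRegressor(n_estimators=40, max_depth=None, random_state=3)
    re2.fit(X2, Y)
    pred2 = re2.predict(X2)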

Here X1 is a pandas DataFrame holding Column 1 through Column 5 of the first dataset, X2 is a pandas DataFrame holding Column 1 through Column 5 of the second dataset, and Y is the target column. What I cannot explain is why pred1 is exactly the same as pred2. Since X1 and X2 are not the same, pred1 and pred2 should also differ, shouldn't they? Please help me find my false assumption.

P.S. To help you build the DataFrames, I wrote this code:

    import pandas as pd

    # Each dict entry is one row: five feature values followed by the target
    d1 = {'0': [101869.2, 102119.9, 102138.0, 101958.3, 101903.7, 12384900],
          '1': [101809.1, 102031.3, 102061.7, 101930.0, 101935.2, 11930700],
          '2': [101978.0, 102208.9, 102209.8, 101970.0, 101878.6, 12116700],
          '3': [101869.2, 102119.9, 102138.0, 101958.3, 101903.7, 12301200],
          '4': [102125.5, 102283.4, 102194.0, 101884.8, 101806.0, 10706100],
          '5': [102215.5, 102351.9, 102214.0, 101769.3, 101693.6, 10116900]}
    data1 = pd.DataFrame(d1).T
    X1 = data1.iloc[:, :5]  # first five columns are the features
    Y = data1[5]            # last column is the target

    d2 = {'0': [101876.0, 102109.8, 102127.6, 101937.0, 101868.4, 12384900],
          '1': [101812.9, 102021.2, 102058.8, 101912.9, 101896.4, 11930700],
          '2': [101982.5, 102198.0, 102195.4, 101940.2, 101842.5, 12116700],
          '3': [101876.0, 102109.8, 102127.6, 101937.0, 101868.4, 12301200],
          '4': [102111.3, 102254.8, 102182.8, 101832.7, 101719.7, 10706100],
          '5': [102184.6, 102320.2, 102188.9, 101699.9, 101548.1, 10116900]}
    data2 = pd.DataFrame(d2).T
    X2 = data2.iloc[:, :5]
    Y = data2[5]            # same target values as in data1
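One possible explanation, offered as a sketch rather than a confirmed diagnosis: with max_depth=None every tree is grown until each distinct input row sits in its own leaf, so each boosting stage reproduces the current residuals on the training rows regardless of the actual feature values. Predictions on the training data would then depend only on Y, the learning rate, the number of stages, and which rows are duplicates of each other (rows 0 and 3 in both datasets). The snippet below checks this under some assumptions that are not part of the original post: plain NumPy arrays instead of DataFrames, the default learning_rate of 0.1 and squared-error loss, and a hypothetical helper fit_predict. The closed-form expression is my own derivation, not something taken from the sklearn documentation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Feature values copied from the two datasets in the post.
X1 = np.array([
    [101869.2, 102119.9, 102138.0, 101958.3, 101903.7],
    [101809.1, 102031.3, 102061.7, 101930.0, 101935.2],
    [101978.0, 102208.9, 102209.8, 101970.0, 101878.6],
    [101869.2, 102119.9, 102138.0, 101958.3, 101903.7],
    [102125.5, 102283.4, 102194.0, 101884.8, 101806.0],
    [102215.5, 102351.9, 102214.0, 101769.3, 101693.6],
])
X2 = np.array([
    [101876.0, 102109.8, 102127.6, 101937.0, 101868.4],
    [101812.9, 102021.2, 102058.8, 101912.9, 101896.4],
    [101982.5, 102198.0, 102195.4, 101940.2, 101842.5],
    [101876.0, 102109.8, 102127.6, 101937.0, 101868.4],
    [102111.3, 102254.8, 102182.8, 101832.7, 101719.7],
    [102184.6, 102320.2, 102188.9, 101699.9, 101548.1],
])
y = np.array([12384900, 11930700, 12116700, 12301200, 10706100, 10116900],
             dtype=float)


def fit_predict(X, y, seed):
    """Hypothetical helper: fit as in the post, then predict on the training rows."""
    model = GradientBoostingRegressor(n_estimators=40, max_depth=None,
                                      random_state=seed)
    return model.fit(X, y).predict(X)


pred1 = fit_predict(X1, y, seed=1)
pred2 = fit_predict(X2, y, seed=3)

# Assumed closed form: start at mean(y); each of the 40 stages moves a row
# 10% of the remaining way toward the mean target of its duplicate group.
groups = [[0, 3], [1], [2], [4], [5]]  # rows 0 and 3 have identical features
shrink = 1 - (1 - 0.1) ** 40
expected = np.empty_like(y)
for g in groups:
    expected[g] = y.mean() + shrink * (y[g].mean() - y.mean())

print("pred1 matches pred2:      ", np.allclose(pred1, pred2))
print("pred1 matches closed form:", np.allclose(pred1, expected))
```

If this reasoning holds, the two models should only start to disagree when they predict on rows they were not trained on, or when the tree depth is limited (for example max_depth=3) so that a single tree can no longer isolate every row.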
submitted by davit_
