Problem description and background
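I am training an XGBoost regressor (XGBRegressor) with RandomizedSearchCV, i.e. randomized search combined with cross-validation. The training set has 9,000,000 rows, and the key parameter is cv=5.
During the search, the score (R²) computation overflows, and every per-fold score printed by verbose is 1.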
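One plausible mechanism, stated as an assumption because the post does not show the dtype or value range of y_train: if the target is stored as float32 and its values are spread very widely, the squared deviations in the R² denominator overflow to inf, and 1 - finite/inf evaluates to exactly 1. The toy arrays below are made up and only reproduce that effect.

```python
import warnings

import numpy as np
from sklearn.metrics import r2_score

# Hypothetical float32 targets with a very large spread (values are made up).
y_true = np.array([0.0, 1e20, 2e20], dtype=np.float32)
# Hypothetical predictions, each off by roughly 1e18.
y_pred = y_true + np.float32(1e18)

with warnings.catch_warnings():
    # Silence the "overflow encountered in square" RuntimeWarning for this demo.
    warnings.simplefilter("ignore", RuntimeWarning)
    # In float32, (1e20)**2 = 1e40 exceeds the float32 maximum (~3.4e38),
    # so the denominator becomes inf and R² = 1 - finite/inf = 1.0.
    r2_float32 = r2_score(y_true, y_pred)

# float64 has enough range, so the same data give a finite denominator.
r2_float64 = r2_score(y_true.astype(np.float64), y_pred.astype(np.float64))

print(r2_float32)  # 1.0
print(r2_float64)  # about 0.99985
```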
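Questions:
1. The score (R²) is computed inside RandomizedSearchCV itself. Is there a way to work around this overflow?
2. Every score (R²) during the randomized search is 1. Does this affect the search result? If so, how can I avoid it?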
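On question 2, a small sketch of why all-equal scores matter: sklearn ranks candidates by their mean test score, so when every candidate ties, all of them get rank 1 and best_index_ lands on the first sampled candidate. best_params_ then reflects sampling order rather than model quality. The Ridge model, the synthetic data and the constant scorer below are hypothetical stand-ins, not the original setup.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data (not the original 9,000,000-row set).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=60)


def constant_score(estimator, X, y):
    """Hypothetical scorer that always returns 1.0, mimicking the overflowed R²."""
    return 1.0


search = RandomizedSearchCV(
    Ridge(),
    {"alpha": [0.01, 0.1, 1.0, 10.0]},
    n_iter=4,
    cv=3,
    scoring=constant_score,
    random_state=0,
)
search.fit(X, y)

# Every candidate ties, so every candidate gets rank 1 ...
print(search.cv_results_["rank_test_score"])  # [1 1 1 1]
# ... and the "best" one is simply the first candidate that was sampled.
print(search.best_index_)  # 0
```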
Relevant code
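from sklearn.model_selection import RandomizedSearchCV
from xgboost import XGBRegressor

# Candidate values the randomized search samples from
param_grid = {
    'learning_rate': [0.01, 0.1, 0.3, 0.5, 0.7],
    'n_estimators': [50, 100, 300, 500, 1000, 3000, 5000],
    'max_depth': [3, 5, 7, 9],
    'gamma': [0, 1, 10, 50],
    'subsample': [0.7, 0.8, 1],
    'colsample_bytree': [0.7, 0.8, 1],
}
# No scoring= is given, so each fold is scored with the estimator's default
# score(), i.e. R² (sklearn.metrics.r2_score); n_iter defaults to 10 candidates
grid = RandomizedSearchCV(XGBRegressor(random_state=27, eval_metric="mae"),
                          param_grid, cv=5, verbose=5)
grid.fit(x_train, y_train)  # x_train, y_train: the 9,000,000-row training set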
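A hedged sketch of two adjustments to the search above, assuming the overflow comes from a float32 target with a very large spread: upcast the target to float64 before fitting, and pass an explicit scoring= so the search does not rely only on the default R². "neg_mean_absolute_error" is a built-in sklearn scorer name and matches the eval_metric="mae" already used. DecisionTreeRegressor, the synthetic data and the small parameter grid are stand-ins (chosen so the sketch runs without xgboost); swap XGBRegressor and the real param_grid back in for actual use.

```python
import numpy as np
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeRegressor

# Hypothetical stand-in data: a float32 target with a very large spread.
rng = np.random.default_rng(27)
x_train = rng.normal(size=(1000, 4))
y_train = (x_train[:, 0] * 1e20 + rng.normal(scale=1e19, size=1000)).astype(np.float32)

# 1) Upcast the target so the squared deviations inside r2_score fit in float64.
y_train64 = y_train.astype(np.float64)

search = RandomizedSearchCV(
    DecisionTreeRegressor(random_state=27),  # stand-in for XGBRegressor
    {"max_depth": [3, 5, 7, 9], "min_samples_leaf": [1, 5, 20]},
    n_iter=5,
    cv=5,
    # 2) Record both metrics explicitly; refit on MAE, which squares nothing.
    scoring={"r2": "r2", "mae": "neg_mean_absolute_error"},
    refit="mae",
    random_state=27,
)
search.fit(x_train, y_train64)

print(search.cv_results_["mean_test_r2"])  # finite values below 1
print(search.best_params_)
```

sklearn reports MAE negated so that higher is still better; best_score_ is therefore a negative number in the target's units.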
Output and error messages
F:\linear\venv\lib\site-packages\sklearn\metrics\_regression.py:807: RuntimeWarning: overflow encountered in square
weight * (y_true - np.average(y_true, axis=0, weights=sample_weight)) ** 2
[CV 1/5] END colsample_bytree=0.8, gamma=0, learning_rate=0.5, max_depth=7, n_estimators=50, subsample=0.7;, score=1.000 total time= 1.3min
F:\linear\venv\lib\site-packages\sklearn\metrics\_regression.py:807: RuntimeWarning: overflow encountered in square
weight * (y_true - np.average(y_true, axis=0, weights=sample_weight)) ** 2
[CV 2/5] END colsample_bytree=0.8, gamma=0, learning_rate=0.5, max_depth=7, n_estimators=50, subsample=0.7;, score=1.000 total time= 1.3min
F:\linear\venv\lib\site-packages\sklearn\metrics\_regression.py:807: RuntimeWarning: overflow encountered in square
weight * (y_true - np.average(y_true, axis=0, weights=sample_weight)) ** 2
[CV 3/5] END colsample_bytree=0.8, gamma=0, learning_rate=0.5, max_depth=7, n_estimators=50, subsample=0.7;, score=1.000 total time= 1.3min
F:\linear\venv\lib\site-packages\sklearn\metrics\_regression.py:807: RuntimeWarning: overflow encountered in square
weight * (y_true - np.average(y_true, axis=0, weights=sample_weight)) ** 2
[CV 4/5] END colsample_bytree=0.8, gamma=0, learning_rate=0.5, max_depth=7, n_estimators=50, subsample=0.7;, score=1.000 total time= 1.3min
F:\linear\venv\lib\site-packages\sklearn\metrics\_regression.py:807: RuntimeWarning: overflow encountered in square
weight * (y_true - np.average(y_true, axis=0, weights=sample_weight)) ** 2
[CV 5/5] END colsample_bytree=0.8, gamma=0, learning_rate=0.5, max_depth=7, n_estimators=50, subsample=0.7;, score=1.000 total time= 1.3min
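The warning points at the denominator of R². Writing out the definition used by sklearn.metrics.r2_score shows why an overflow there turns every fold score into exactly 1:

$$
R^2 = 1 - \frac{\sum_{i} (y_i - \hat{y}_i)^2}{\sum_{i} (y_i - \bar{y})^2}
$$

If $\sum_i (y_i - \bar{y})^2$ overflows to $+\infty$ while the numerator stays finite, then

$$
R^2 = 1 - \frac{\text{finite}}{+\infty} = 1 - 0 = 1.
$$

In float32 the largest finite value is about $3.4 \times 10^{38}$, so a single deviation $|y_i - \bar{y}|$ larger than about $1.8 \times 10^{19}$ already overflows when squared; in float64 the corresponding threshold is about $1.3 \times 10^{154}$. That the target here is float32 is an assumption, since the post does not show its dtype.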
What I have tried
Reducing the amount of data?
The result I want
To compute the actual R² values.
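For the stated goal, one option (a sketch, not a built-in sklearn feature) is to pass a custom scorer that upcasts both arrays to float64 before calling r2_score; float64_r2 and r2_float64_scorer are hypothetical names. This helps only if the float32 dtype is the cause; if the target values themselves exceed roughly 1e154, the target would still need rescaling.

```python
import numpy as np
from sklearn.metrics import make_scorer, r2_score


def float64_r2(y_true, y_pred):
    """Hypothetical helper: R² computed after upcasting both arrays to float64."""
    return r2_score(np.asarray(y_true, dtype=np.float64),
                    np.asarray(y_pred, dtype=np.float64))


# Usable as RandomizedSearchCV(..., scoring=r2_float64_scorer).
r2_float64_scorer = make_scorer(float64_r2)

# Quick check on hypothetical float32 arrays that overflow in a plain float32 R².
y_true = np.array([-3e20, 0.0, 3e20], dtype=np.float32)
y_pred = np.array([-2.99e20, 1e18, 3.01e20], dtype=np.float32)
print(float64_r2(y_true, y_pred))  # a finite value just below 1.0
```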