pp62889051 2022-05-06 23:21

R XGBoost: xgb.importance does not output variable importance

Symptoms and background

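I am fitting a binary logistic classification model to a dataset in R. xgb.train runs and prints its full output, but when I then call xgb.importance to get the variable importance, there is no error and no usable result either.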

Relevant code

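# Binary classification with XGBoost
library(skimr)    # skim()
library(caret)    # createDataPartition(), dummyVars()
library(xgboost)  # xgb.DMatrix(), xgb.train(), xgb.importance()

boston <- read.csv(file.choose())

skim(boston)

# Fix variable types: convert the categorical columns to factors
for (i in c(1, 3, 4, 21)) {
  boston[, i] <- factor(boston[, i])
}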

set.seed(42)

trains <- createDataPartition(
  y = boston$Length.of.hospital.stay,
  p = 0.85,
  list = FALSE
)

trains2<-sample(trains,nrow(boston)*0.7)
valids <- setdiff(trains,trains2)

data_train <-boston[trains2,]
data_valid <-boston[valids,]
data_test <-boston[-trains,]

table(data_train$Length.of.hospital.stay)
table(data_valid$Length.of.hospital.stay)
table(data_test$Length.of.hospital.stay)

colnames(boston)
dvfunc<-dummyVars(~.,data=data_train[,1:19], fullRank = T)
data_trainx<- predict(dvfunc,newdata=data_train[,1:19])
data_trainy <- ifelse(data_train$Length.of.hospital.stay == "NO",0,1)

data_validx<-predict(dvfunc,newdata=data_valid[,1:19])
data_validy<- ifelse(data_valid$Length.of.hospital.stay == "NO",0,1)

data_testx<-predict(dvfunc,newdata=data_test[,1:19])
data_testy<- ifelse(data_test$Length.of.hospital.stay == "NO",0,1)

dtrain<-xgb.DMatrix(data=data_trainx,
label=data_trainy)
dvalid<-xgb.DMatrix(data=data_validx,
label=data_validy)
dtest<-xgb.DMatrix(data=data_testx,
label=data_testy)

watchlist<-list(train = dtrain, test = dvalid)

# Train the model

fit_xgb_reg <- xgb.train(
data=dtrain,
eta=0.3,
gamma=0.001,
max_depth =2,
subsample =0.7,
colsample_bytree =0.4,

objective = "binary:logistic",

nrounds = 1000,
watchlist=watchlist,
verbose=1,
print_every_n = 100,
early_stopping_rounds = 200
)

fit_xgb_reg
importance_matrix <- xgb.importance (model =fit_xgb_reg)
print(importance_matrix)
xgb.plot.importance(importance_matrix = importance_matrix)

Output and error messages

When the model training section runs, it prints:
[1] train-logloss:0.439797 test-logloss:0.439797
Multiple eval metrics are present. Will use test_logloss for early stopping.
Will train until test_logloss hasn't improved in 200 rounds.

[101] train-logloss:0.002339 test-logloss:0.002339
[201] train-logloss:0.002339 test-logloss:0.002339
Stopping. Best iteration:
[21] train-logloss:0.002339 test-logloss:0.002339
This is the output I wanted.
Running importance_matrix <- xgb.importance(model = fit_xgb_reg) then gives:
Empty data.table (0 rows and 4 cols): Feature,Gain,Cover,Frequency
The result is empty: no importance data is returned.
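One possible cause, offered as an assumption rather than a confirmed diagnosis: train and test log-loss that are identical and then flatten out are what a model made only of single-leaf trees would produce, and that happens when every label belongs to the same class. If Length.of.hospital.stay never equals the exact string "NO" (for example because it is coded "No", or 0/1), then ifelse(... == "NO", 0, 1) returns 1 for every row, no split can improve the loss, and the importance table has nothing to report. The base R sketch below uses a made-up column (`stay` and its values are hypothetical, not taken from the dataset in the question) to show the effect.

```r
# Hypothetical outcome column coded "Yes"/"No" (an assumption for illustration)
stay <- factor(c("No", "Yes", "No", "No", "Yes"))

# The comparison is case-sensitive: no value equals "NO", so every label becomes 1
y_bad <- ifelse(stay == "NO", 0, 1)
print(table(y_bad))

# Inspect the actual levels first, then encode against a level that exists
print(levels(stay))
y_ok <- ifelse(stay == "No", 0, 1)
print(table(y_ok))
```

In the session from the question, table(data_trainy) and levels(boston$Length.of.hospital.stay) would show whether this is what happened.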

Desired result

Running the code
importance_matrix <- xgb.importance (model =fit_xgb_reg)
print(importance_matrix)
should display data like this:
                Feature        Gain       Cover  Frequency
 1:    Total.blood.loss 0.167833001 0.064426668 0.06392694
 2:  HB.Decreased.value 0.156806719 0.031960656 0.03196347
 3: ALB.Decreased.value 0.088609260 0.055453526 0.05479452
 4:                 BMI 0.079037023 0.133290604 0.13242009
 5:  Total.blood.volume 0.064410934 0.113359875 0.11415525
 6:                cost 0.061274356 0.095499871 0.09589041
 7:                 age 0.054263127 0.049857638 0.05022831
 8:    ALB.Preoperative 0.049471451 0.069283011 0.06849315
 9:               ALB.. 0.048777618 0.027424782 0.02739726
10:   HB..preoperative. 0.043222795 0.072537008 0.07305936
11:              weight 0.033926769 0.054270254 0.05479452
12:             D.dimer 0.033483713 0.041747298 0.04109589
13:   ALB.Postoperative 0.031290170 0.058621242 0.05936073
14:              height 0.023252573 0.041057056 0.04109589
15:  HB..Postoperative. 0.022152296 0.036533508 0.03652968
16:    Prothrombin.time 0.019281749 0.022026106 0.02283105
17:               ASA.2 0.012353804 0.009084074 0.00913242
18:      surgery.site.1 0.005650289 0.009429195 0.00913242
19:            gender.1 0.003094026 0.009577104 0.00913242
20:               ASA.1 0.001808328 0.004560526 0.00456621
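For comparison, here is a minimal sketch on synthetic data showing xgb.importance returning a populated table once both classes are present in the labels. The matrix `x`, the label vector `y_mixed`, and the feature names `f1` to `f3` are all made up for illustration. The hyperparameters are passed through a `params` list, which is an assumption about what works across xgboost R package versions rather than the form used in the question.

```r
library(xgboost)

set.seed(42)

# Synthetic features (hypothetical, not the dataset from the question)
n <- 200
x <- matrix(rnorm(n * 3), ncol = 3,
            dimnames = list(NULL, c("f1", "f2", "f3")))

# Labels depend mainly on f1, so both classes 0 and 1 occur
y_mixed <- as.numeric(x[, "f1"] + rnorm(n, sd = 0.3) > 0)
print(table(y_mixed))

dtrain <- xgb.DMatrix(data = x, label = y_mixed)

model <- xgb.train(
  params = list(objective = "binary:logistic", max_depth = 2),
  data = dtrain,
  nrounds = 10
)

# With two classes the trees contain real splits, so the table has rows
imp_mixed <- xgb.importance(model = model)
print(imp_mixed)
```

Fitting the same sketch with a constant label vector should reproduce the empty table from the question, since no split can then improve the loss.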

4 answers

  • Kappuccinno 2022-05-06 23:51

    Is that the Boston housing price dataset?
