I want to use a random forest to rank feature importances. Why is every importance in the output 0? Any help appreciated.
Source code:
```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load the data (the CSV has no header row) and name the 13 columns
df = pd.read_csv('chaobaihe-train.csv', header=None)
df.columns = ['SITE', 'year', 'DO', 'KMnO4', 'BOD5', 'NH3-N', 'COD', 'TN', 'TP', 'Cu', 'Zn', 'F', 'S']

# Quick look at the data (bare expressions only display a result in a notebook/REPL)
print(df.head(5))
set(df['SITE'])
print(df.shape)
df.isna().sum()
np.unique(df['SITE'])
print(df.info())
df.describe()

# Features: the 11 measurement columns DO..S
x = df.iloc[:, 2:].values
# Target: also columns DO..S (the same slice as x)
y = df.iloc[:, 2:].values
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
feat_labels = df.columns[2:]

forest = RandomForestClassifier(n_estimators=10000, random_state=0, n_jobs=-1, max_depth=3)
forest.fit(x_train.astype('int'), y_train.astype('int'))
score = forest.score(x_test.astype('int'), y_test.astype('int'))
forest.feature_importances_

# Rank the features by importance, highest first
importances = forest.feature_importances_
indices = np.argsort(importances)[::-1]
for f in range(x_train.shape[1]):
    print("%2d) %-*s %f" %
          (f + 1, 30, feat_labels[indices[f]], importances[indices[f]]))
```
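A quick check of what the `.astype('int')` casts in `fit` and `score` do to data like this, using the first row of measurements (DO through S) from the `df.head()` output: casting a float array to `int` truncates toward zero, so every value in [0, 1) becomes 0. If all measurements lie in that range, as the `head()` rows suggest, every feature column (and every target column) would turn into the constant 0.

```python
import numpy as np

# First data row (DO..S) as printed by df.head() in the output
row0 = np.array([0.628387, 0.797494, 0.968553, 0.993504, 0.839053,
                 0.958378, 0.984030, 0.892423, 0.817577, 0.635799, 0.939799])

# astype('int') truncates toward zero, so each value in [0, 1) becomes 0
row0_int = row0.astype('int')
print(row0_int)  # [0 0 0 0 0 0 0 0 0 0 0]
```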
Output:
```
   SITE  year        DO     KMnO4      BOD5     NH3-N       COD        TN        TP        Cu        Zn         F         S
0     1  2016  0.628387  0.797494  0.968553  0.993504  0.839053  0.958378  0.984030  0.892423  0.817577  0.635799  0.939799
1     1  2017  0.651026  0.764411  0.943396  0.992423  0.823331  0.947680  0.982890  0.975712  0.895503  0.637449  0.904404
2     1  2018  0.582991  0.707268  0.923270  0.991816  0.834320  0.925946  0.985741  0.963803  0.964572  0.669967  0.975474
3     1  2019  0.544282  0.737343  0.959120  0.993606  0.810651  0.935045  0.992015  0.970391  0.988439  0.669190  0.974359
4     1  2020  0.645161  0.759398  0.930818  0.993350  0.771767  0.956847  0.994297  0.963803  0.990055  0.664337  0.977703
(87, 13)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 87 entries, 0 to 86
Data columns (total 13 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   SITE    87 non-null     int64
 1   year    87 non-null     int64
 2   DO      87 non-null     float64
 3   KMnO4   87 non-null     float64
 4   BOD5    87 non-null     float64
 5   NH3-N   87 non-null     float64
 6   COD     87 non-null     float64
 7   TN      87 non-null     float64
 8   TP      87 non-null     float64
 9   Cu      87 non-null     float64
 10  Zn      87 non-null     float64
 11  F       87 non-null     float64
 12  S       87 non-null     float64
dtypes: float64(11), int64(2)
memory usage: 9.0 KB
None
 1) S                              0.000000
 2) F                              0.000000
 3) Zn                             0.000000
 4) Cu                             0.000000
 5) TP                             0.000000
 6) TN                             0.000000
 7) COD                            0.000000
 8) NH3-N                          0.000000
 9) BOD5                           0.000000
10) KMnO4                          0.000000
11) DO                             0.000000
```
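Why a cast to constant columns would produce exactly this all-zero ranking, sketched with made-up data (the array sizes, random seed, and labels below are hypothetical, not from the original dataset): once every feature column is constant, no tree can find a split, each tree stays a single root node, and scikit-learn's `feature_importances_` for the forest comes out as all zeros.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Hypothetical data: 80 samples x 11 features with values in [0, 1),
# plus made-up class labels 0..2
X = rng.random((80, 11))
y_demo = rng.integers(0, 3, size=80)

# The cast turns every feature into the constant 0
X_const = X.astype('int')

forest = RandomForestClassifier(n_estimators=50, max_depth=3, random_state=0)
forest.fit(X_const, y_demo)

# No split is possible on constant features, so each tree is a single root node
print(all(t.tree_.node_count == 1 for t in forest.estimators_))  # True
# With no splits anywhere, every feature gets zero importance
print(forest.feature_importances_)
```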
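After changing the fit and score calls to `forest.fit(x_train, y_train)` and `score = forest.score(x_test, y_test)` (i.e. without `.astype('int')`), it raises an exception instead:
ValueError: Unknown label type: 'continuous-multioutput'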
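Without the cast, `y` is the float block `df.iloc[:, 2:].values`, i.e. eleven continuous target columns, while `RandomForestClassifier` only accepts discrete class labels; scikit-learn's `type_of_target` labels such an array `'continuous-multioutput'`, the label type named in the error. In both runs `y` is the same slice as `x`, so the model is being asked to predict its own inputs. Below is a minimal sketch of one possible fix, assuming the goal is to tell monitoring sites apart by their measurements so that `SITE` serves as the label. The DataFrame is random stand-in data with the post's column names (hypothetical, not the real `chaobaihe-train.csv`), and `n_estimators=200` is an arbitrary smaller value for speed.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.utils.multiclass import type_of_target

cols = ['SITE', 'year', 'DO', 'KMnO4', 'BOD5', 'NH3-N', 'COD', 'TN', 'TP', 'Cu', 'Zn', 'F', 'S']

# Hypothetical stand-in for chaobaihe-train.csv: 87 rows, random values in [0, 1)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((87, 11)), columns=cols[2:])
df.insert(0, 'year', rng.integers(2016, 2021, size=87))
df.insert(0, 'SITE', rng.integers(1, 4, size=87))  # made-up site IDs 1..3

# The original y (same 11 float columns as x) is what the classifier rejects
target_type = type_of_target(df.iloc[:, 2:].values)
print(target_type)  # continuous-multioutput

# Assumed goal: predict SITE from the 11 measurements
x = df.iloc[:, 2:].values   # features stay float, no astype('int')
y = df['SITE'].values       # one discrete label per row
feat_labels = df.columns[2:]
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0, n_jobs=-1, max_depth=3)
forest.fit(x_train, y_train)
score = forest.score(x_test, y_test)

# Same ranking printout as in the question
importances = forest.feature_importances_
indices = np.argsort(importances)[::-1]
for f in range(x_train.shape[1]):
    print("%2d) %-*s %f" % (f + 1, 30, feat_labels[indices[f]], importances[indices[f]]))
```

If the goal is instead to explain a single continuous measurement (say `COD`), the same structure should work with `sklearn.ensemble.RandomForestRegressor`, using that column as `y` and leaving it out of `x`.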