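Symptoms and background
I need to build a decision tree classifier on stock data. I use the A-share value as y (the '上证A股指数' column in the code) and the remaining features as X, and I am building the tree by following the iris dataset example. For cross-validation I replaced iris.data and iris.target with df_X and df_y, but every attempt so far has failed. This time the output is a large block of red warnings. Does anyone know how to fix this?
The iris cross-validation call I found online, for reference: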
cross_val_score(clf, iris.data, iris.target, cv=10)
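For context, a minimal self-contained version of that iris reference might look like the sketch below. It assumes scikit-learn's bundled `load_iris` dataset, and `random_state=0` is an arbitrary value added here only to make the run repeatable. Note that `iris.target` is already an integer NumPy array, which is the kind of label input the classifier expects.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

# random_state is an assumption made for repeatability, not part of the original call
clf = DecisionTreeClassifier(random_state=0)

# 10-fold cross-validation: one accuracy score per fold
scores = cross_val_score(clf, iris.data, iris.target, cv=10, scoring="accuracy")

print(iris.target.dtype)  # an integer dtype
print(scores.shape)       # (10,)
print(scores.mean())
```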
Relevant code
from sklearn import tree
import pandas as pd
df=pd.read_excel('数据.xlsx') # read the data
type(df)
# Convert continuous values into discrete up/down labels
df=pd.read_excel('数据.xlsx')
df=df.dropna()
df=df.drop(0,axis = 0)
df=df.iloc[::-1]
df_diff=df.iloc[:,1:].diff(axis = 0)
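df[df_diff >= 0] = 1 # set to 1 where the variable rose
df[df_diff < 0] = 0 # set to 0 where the variable fell
df=df.reset_index(drop = True) # reset the index
df=df.drop(0,axis = 0) # drop row 0, which has no diff and so cannot be labeled
df=df.drop('指标名称',axis = 1) # drop the date column
df=df.reset_index(drop = True) # reset the index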
df
# Split into training and test sets
train=df.head(211)
test=df.tail(8)
test=test.reset_index(drop = True)
X_train=train.drop('上证A股指数',axis = 1)
X_test=test.drop('上证A股指数',axis = 1)
y_train=train['上证A股指数']
y_test=test['上证A股指数']
# Cast the data to integer dtype
X_train=X_train.astype('int')
X_test=X_test.astype('int')
y_train=y_train.astype('int')
y_test=y_test.astype('int')
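# Build the decision tree
from sklearn import tree
clf = tree.DecisionTreeClassifier() # create the classifier object
clf.fit(X_train,y_train) # fit the classifier on the training set
clf.predict(X_test) # predict labels for the test set with the fitted classifier
# Cross-validation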
from sklearn.model_selection import cross_val_score
from sklearn import tree
clf = tree.DecisionTreeClassifier() # create the classifier object
df_X=df.drop('上证A股指数',axis = 1)
df_y=df['上证A股指数']
score = cross_val_score(clf,df_X,df_y,cv=10,scoring='accuracy')
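One thing that may be worth checking at this point is the dtype of the label column, since `df_X` and `df_y` here are taken from `df` without the `astype('int')` cast used for the train/test split. The sketch below is only an illustration: it uses a small hypothetical Series instead of the real spreadsheet and assumes the column ended up with `object` dtype after the mask assignments. It relies on scikit-learn's `type_of_target` helper, which classifiers use to decide what kind of labels they were given.

```python
import pandas as pd
from sklearn.utils.multiclass import type_of_target

# Hypothetical stand-in for df['上证A股指数']: 0/1 values stored with object dtype
y_object = pd.Series([1, 0, 1, 1, 0, 0], dtype=object)

# The same values after the cast used earlier for y_train / y_test
y_int = y_object.astype("int")

kind_before = type_of_target(y_object)
kind_after = type_of_target(y_int)

print(y_object.dtype, kind_before)
print(y_int.dtype, kind_after)
```

On the real data, `print(df_y.dtype)` and `print(df_X.dtypes)` would show whether this assumption holds.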
Output and error message
D:\Anaconda\lib\site-packages\sklearn\model_selection\_validation.py:536: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details:
ValueError: Unknown label type: 'unknown'
FitFailedWarning)
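A likely cause, offered as a guess since the spreadsheet is not available: the train/test section casts its columns with `astype('int')`, but `df_X` and `df_y` in the cross-validation section come straight from `df`, so `df_y` may still have `object` dtype, which scikit-learn reports as label type 'unknown'. The sketch below reproduces that situation with synthetic 0/1 data and applies the same cast before calling `cross_val_score`. The feature column names and the random values are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the discretized DataFrame: 0/1 values held as object dtype
rng = np.random.default_rng(0)
values = rng.integers(0, 2, size=(200, 4)).astype(object)
df = pd.DataFrame(values, columns=["上证A股指数", "feature_a", "feature_b", "feature_c"])

# Apply the same integer cast that was used for X_train / y_train
df_X = df.drop("上证A股指数", axis=1).astype("int")
df_y = df["上证A股指数"].astype("int")

clf = DecisionTreeClassifier()
score = cross_val_score(clf, df_X, df_y, cv=10, scoring="accuracy")
print(score)
```

Because the labels here are random, the scores only show that the call runs without the warning; they say nothing about accuracy on the real stock data.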