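Symptoms and background
Does the code below perform over-sampling and under-sampling at the same time? Also, how exactly should sampling_strategy be used?
value_counts() of the original data: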
0 13634
1 6305
Relevant code
# SMOTE over-sampling: original minority class * 1.5 = 9458 samples
from imblearn.combine import SMOTEENN
smo = SMOTEENN(sampling_strategy={1: 9458 },random_state=24)
tra1_x1, tra1_y1 = smo.fit_resample(train1.drop(['Pred','Date'], axis=1), train1['Pred'])
# Under-sampling: original minority class * 0.5 * 3.5 = 11034 samples
from imblearn.combine import SMOTETomek
rus = SMOTETomek(sampling_strategy={0: 11034 },random_state=24)
tra1_x1, tra1_y1 = rus.fit_resample(train1.drop(['Pred','Date'], axis=1), train1['Pred'])
print(tra1_x1.shape)
print((tra1_y1==1).sum()/len(tra1_y1))
Output and error message
ValueError: With over-sampling methods, the number of samples in a class should be greater or equal to the original number of samples. Originally, there is 13634 samples and 11034 samples are asked.
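Judging only by the wording of this message, the dict passed to SMOTETomek appears to be validated as an over-sampling target, so asking for fewer samples of class 0 than it already has is rejected. Below is a minimal pure-Python sketch of that rule as I understand it. `check_dict_strategy` is a hypothetical helper written for illustration, not part of imblearn, and the "under" branch is my assumption about how the mirror-image check for under-sampling methods would behave.

```python
from collections import Counter


def check_dict_strategy(y, sampling_strategy, kind):
    """Hypothetical helper mimicking the check described by the error above.

    kind="over":  each requested count must be >= the original class count.
    kind="under": each requested count must be <= the original class count
                  (assumed mirror image of the over-sampling rule).
    Returns a list of (label, original_count, requested_count) violations.
    """
    counts = Counter(y)
    problems = []
    for label, requested in sampling_strategy.items():
        original = counts[label]
        if kind == "over" and requested < original:
            problems.append((label, original, requested))
        elif kind == "under" and requested > original:
            problems.append((label, original, requested))
    return problems


# Class counts taken from the value_counts() output in the question.
y = [0] * 13634 + [1] * 6305

# Growing class 1 from 6305 to 9458 is a valid over-sampling target.
over_ok = check_dict_strategy(y, {1: 9458}, kind="over")
# Shrinking class 0 from 13634 to 11034 is not, which matches the ValueError.
over_bad = check_dict_strategy(y, {0: 11034}, kind="over")
# The same target would pass a check that treats it as under-sampling.
under_ok = check_dict_strategy(y, {0: 11034}, kind="under")

print(over_ok)
print(over_bad)
print(under_ok)
```

If this reading is right, the target {0: 11034} would have to go to a sampler that actually under-samples rather than to SMOTETomek, but I have not confirmed that.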
Thanks, everyone!