One-hot encoding error on the KDD99 dataset

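1. I want to replace the protocol field (the second position in each record, one of 3 protocols) with its one-hot encoding.
An example of a raw sample: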
0,tcp,http,SF,181,5450,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00,9,9,1.00,0.00,0.11,0.00,0.00,0.00,0.00,0.00,normal.
2. The code is as follows:

import numpy as np
from sklearn.preprocessing import LabelEncoder

def handleProtocol(record):
    # record must be one sample already split into fields, e.g. line.split(',')
    protocol_list = ['tcp', 'udp', 'icmp']
    if record[1] in protocol_list:
        a = protocol_list.index(record[1])  # index of the protocol within protocol_list
        values = np.array(protocol_list)
        print(values)
        # integer encode (LabelEncoder sorts classes alphabetically: icmp=0, tcp=1, udp=2)
        label_encoder = LabelEncoder()
        integer_encoded = label_encoder.fit_transform(values)
        print(integer_encoded)
        # binary encode
        n_sample = len(integer_encoded)
        n_class = max(integer_encoded) + 1
        onehot_labels = np.zeros((n_sample, n_class))  # matrix with n_sample rows and n_class columns
        onehot_labels[np.arange(n_sample), integer_encoded] = 1  # set the column matching each label to 1
        return onehot_labels[a]

Could someone point out what the problem is?

1 answer

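For one-hot encoding, I suggest calling the get_dummies function in pandas directly. For example, put the data into a DataFrame and then call
pd.get_dummies(df, columns=['proto']), where df is your DataFrame and 'proto' is the name of the protocol column.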
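
Here is a minimal sketch of that approach. It rests on a few assumptions: the column names (duration, proto, service, flag) are illustrative rather than the official KDD99 field names, only the first four fields of each record are kept for brevity, and the sample rows other than the tcp/http one are made up for the demonstration. Declaring the protocol column as a categorical with a fixed category list is an addition of mine; it keeps all three output columns present and in a stable order even if part of the data lacks one of the protocols.

```python
import pandas as pd

# Illustrative column names and shortened records (first four fields only).
rows = [
    [0, "tcp", "http", "SF"],
    [0, "udp", "domain_u", "SF"],
    [0, "icmp", "ecr_i", "SF"],
    [0, "tcp", "smtp", "SF"],
]
df = pd.DataFrame(rows, columns=["duration", "proto", "service", "flag"])

# Fix the category list so the encoded columns are always
# proto_tcp, proto_udp, proto_icmp, in that order.
df["proto"] = pd.Categorical(df["proto"], categories=["tcp", "udp", "icmp"])

# Replace the proto column with three 0/1 indicator columns;
# the remaining columns are left unchanged.
encoded = pd.get_dummies(df, columns=["proto"], dtype=int)
print(encoded)
```

The encoded columns are appended after the untouched ones, so if the original field order matters you would need to reorder the columns afterwards.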
