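I measured how fast pandas columns can be added together using three approaches: 1. vectorized addition, 2. iterating with iterrows, 3. converting to numpy arrays first and then adding.
In theory, the vectorized approach should be faster than iteration, so why is it so much slower in my actual measurements?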
import timeit
import pandas as pd

# A single-row DataFrame with four integer columns
df1 = pd.DataFrame({'aaa': {'0': 0},
                    'bbb': {'0': 8568},
                    'ccc': {'0': 1},
                    'ddd': {'0': 0}})
print(df1)
# 1. Vectorized addition of the pandas Series
t = timeit.timeit(stmt="df1['aaa']+df1['bbb']+df1['ccc']+df1['ddd']", setup="from __main__ import df1", number=10000)
print(t)
# 2. Row-by-row iteration with iterrows
t = timeit.timeit(stmt="for index,row in df1.iterrows():"
                       "row['aaa']+row['bbb']+row['ccc']+row['ddd']", setup="from __main__ import df1", number=10000)
print(t)
# 3. Convert each column to a numpy array first, then add
t = timeit.timeit(stmt="df1['aaa'].values+df1['bbb'].values+df1['ccc'].values+df1['ddd'].values", setup="from __main__ import df1", number=10000)
print(t)
Output (the DataFrame, followed by the total time in seconds for methods 1, 2 and 3):
   aaa   bbb  ccc  ddd
0    0  8568    1    0
2.639297500019893
0.6697011000069324
0.0938532000000123
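
A plausible explanation is that the test DataFrame has only one row, so what gets measured is mostly the fixed cost of each call (index alignment and building a new Series for every `+`) rather than the per-element work. One way to check this is to repeat the measurement with more rows. The sketch below is only an assumption about how such a check could look: the helper names (`make_frame`, `add_vectorized`, `add_iterrows`, `add_numpy`), the row counts and the random data are made up for illustration and are not part of the original test.

```python
import timeit

import numpy as np
import pandas as pd

COLS = ["aaa", "bbb", "ccc", "ddd"]


def make_frame(n_rows):
    """Hypothetical helper: integer DataFrame with the same four columns."""
    rng = np.random.default_rng(0)  # fixed seed so runs are repeatable
    data = rng.integers(0, 10000, size=(n_rows, len(COLS)))
    return pd.DataFrame(data, columns=COLS)


def add_vectorized(df):
    # Method 1: add the pandas Series directly
    return df["aaa"] + df["bbb"] + df["ccc"] + df["ddd"]


def add_iterrows(df):
    # Method 2: iterate row by row and add the four values of each row
    return [row["aaa"] + row["bbb"] + row["ccc"] + row["ddd"]
            for _, row in df.iterrows()]


def add_numpy(df):
    # Method 3: take the underlying numpy arrays, then add
    return (df["aaa"].values + df["bbb"].values
            + df["ccc"].values + df["ddd"].values)


if __name__ == "__main__":
    # Compare a single row (as in the original test) with a larger frame
    for n_rows in (1, 1000):
        df = make_frame(n_rows)
        for func in (add_vectorized, add_iterrows, add_numpy):
            seconds = timeit.timeit(lambda: func(df), number=10)
            print(f"rows={n_rows:>5}  {func.__name__:<15} {seconds:.4f}s")
```

If fixed overhead is the cause, the time for iterrows should grow roughly in proportion to the number of rows, while the two array-based methods should grow far more slowly. The exact numbers will depend on the machine and the pandas version.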