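I asked a similar question before; see: https://ask.csdn.net/questions/7409644
Now I have a more complex problem. I need to group the DataFrame by category. Within each category, I find the row where data2 is largest, keep only the rows whose data1 is less than or equal to that row's data1, take the 4 remaining rows with the largest data2, and then sort the result. How can this be optimized?
I switched from integers to floats for more precision. The code is: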
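The per-category threshold in this rule can be computed for all categories at once instead of one category at a time. This is a minimal sketch, assuming the same column names as in the question; `data1_threshold` and the small `demo` frame are hypothetical names used only for illustration. `groupby(...).transform('max')` marks the rows where data2 reaches its category maximum, and the largest data1 among those rows becomes the threshold, which mirrors the `.data1.max()` tie handling in the question's code.

```python
import pandas as pd

def data1_threshold(df: pd.DataFrame) -> pd.Series:
    """Per category: the largest data1 among the rows where data2 hits that category's max."""
    # True for rows whose data2 equals the maximum data2 of their own category
    is_top = df['data2'] == df.groupby('category')['data2'].transform('max')
    # Among those rows, take the max data1 per category (handles ties like the loop version)
    return df[is_top].groupby('category')['data1'].max()

# Hypothetical toy data to show the result
demo = pd.DataFrame({
    'category': ['A', 'A', 'A', 'B', 'B'],
    'data1':    [10.0, 30.0, 20.0, 5.0, 50.0],
    'data2':    [90.0, 95.0, 80.0, 99.0, 70.0],
})
print(data1_threshold(demo))  # A -> 30.0, B -> 5.0
```

The result is a Series indexed by category, so it can be broadcast back to the rows with `df['category'].map(...)`.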
import numpy as np
import pandas as pd
df = pd.DataFrame()
n = 200
# Random test data: a category label plus two float columns in [0, 100)
df['category'] = np.random.choice(('A', 'B'), n)
df['data1'] = np.random.rand(len(df))*100
df['data2'] = np.random.rand(len(df))*100
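# Category A: threshold = data1 of the row with the largest data2 (max data1 if tied)
a = df[df['category'] == 'A']
c = a[a['data2'] == a.data2.max()].data1.max()
# Keep rows with data1 <= threshold, then the 4 rows with the largest data2
a = a[a['data1'] <= c]
a = a.sort_values(by='data2', ascending=False).head(4)
# Category B: the same steps repeated
b = df[df['category'] == 'B']
c = b[b['data2'] == b.data2.max()].data1.max()
b = b[b['data1'] <= c]
b = b.sort_values(by='data2', ascending=False).head(4)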
# Combine both categories; sort by category ascending, then data1 descending
df = pd.concat([a, b]).sort_values(by=['category', 'data1'], ascending=[True, False]).reset_index(drop=True)
print(df)
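One way to remove the duplicated A/B blocks is to express every step with groupby, so the same code handles any number of categories. This is a sketch that assumes the intended semantics are exactly those of the code above (threshold filter, then the top 4 rows by data2, then the final sort); `filter_and_rank` and its `top_n` parameter are hypothetical names, and the seeded generator is only there to make runs repeatable.

```python
import numpy as np
import pandas as pd

def filter_and_rank(df: pd.DataFrame, top_n: int = 4) -> pd.DataFrame:
    # Threshold per category: largest data1 among rows where data2 equals the category max
    max_d2 = df.groupby('category')['data2'].transform('max')
    threshold = df[df['data2'] == max_d2].groupby('category')['data1'].max()
    # Broadcast each row's category threshold and keep rows with data1 <= threshold
    kept = df[df['data1'] <= df['category'].map(threshold)]
    # Sort globally by data2, then keep the first top_n rows of each category
    top = (kept.sort_values('data2', ascending=False)
               .groupby('category', sort=False)
               .head(top_n))
    # Final ordering as in the question: category ascending, data1 descending
    return (top.sort_values(['category', 'data1'], ascending=[True, False])
               .reset_index(drop=True))

rng = np.random.default_rng(0)  # seeded so the example is repeatable
n = 200
df = pd.DataFrame({
    'category': rng.choice(['A', 'B'], n),
    'data1': rng.random(n) * 100,
    'data2': rng.random(n) * 100,
})
print(filter_and_rank(df))
```

`GroupBy.head` keeps rows in the frame's existing order, so applying it after the global sort by data2 picks each category's rows with the largest data2, which matches `sort_values(...).head(4)` applied per category.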
The result is:
   category      data1      data2
0        A  77.453241  98.628388
1        A  54.786469  97.470081
2        A  19.618200  96.261181
3        A   9.031004  97.067451
4        B  50.751809  99.219009
5        B  47.546003  96.488705
6        B  32.735357  98.565826
7        B  14.092039  95.359450