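For each of the two letters in the 'category' column of a DataFrame, I want to take the 4 largest values from another column, 'data', and then sort the result by a specific rule (category ascending, data descending). My code is below. Is there a simpler way to do this?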
import numpy as np
import pandas as pd

# Build a sample frame: 200 rows, a random category (A or B) and a random integer value
df = pd.DataFrame()
n = 200
df['category'] = np.random.choice(('A', 'B'), n)
df['data'] = np.random.randint(1, 10000, len(df))

# Take the 4 largest 'data' values for each category separately
a = df[df['category'] == 'A'].sort_values(by='data', ascending=False).head(4)
b = df[df['category'] == 'B'].sort_values(by='data', ascending=False).head(4)

# Combine, then sort by category (ascending) and data (descending)
df = pd.concat([a, b]).sort_values(by=['category', 'data'], ascending=[True, False]).reset_index(drop=True)
print(df)
The result is as follows:
  category  data
0        A  9889
1        A  9879
2        A  9873
3        A  9822
4        B  9909
5        B  9855
6        B  9775
7        B  9689
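
One possibly simpler approach, offered as a sketch rather than the definitive answer: sort the whole frame once by category (ascending) and data (descending), then use `groupby('category').head(4)` to keep the first 4 rows of each group. Since `head` on a groupby keeps rows in the order of the already sorted frame, no `concat` or second sort should be needed. The seeded generator `rng` and the name `top4` are assumptions added here only to make the example reproducible; they are not part of the original code.

```python
import numpy as np
import pandas as pd

# Assumption: a seeded generator, used only so the example is reproducible
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    'category': rng.choice(['A', 'B'], n),
    'data': rng.integers(1, 10000, n),
})

# Sort once in the final order, then keep the first 4 rows of every category.
# groupby(...).head(4) preserves the row order of the sorted frame.
top4 = (
    df.sort_values(by=['category', 'data'], ascending=[True, False])
      .groupby('category')
      .head(4)
      .reset_index(drop=True)
)
print(top4)
```

This also works unchanged if 'category' contains more than two distinct letters, because no category is filtered by name.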