Phew, I finally ground this one out.
import pandas as pd

# Load the workbook and preview the first 10 rows.
df = pd.read_excel('./pandas_study.xlsx')
df1 = df.head(10)
print(df1)
# Keep only rows whose type is "B" or "S".
df = df.query('type in ["B","S"]')
# print(df)
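# Group by (level, gender) and count the distinct types in each group.
df2 = df.groupby(['level', 'gender'])['type'].nunique().reset_index()
df2.columns = ['level', 'gender', 'num']
print(df2)
print("********")
# Index the counts by (level, gender) so they can be aligned with df.
df2 = df2.set_index(['level', 'gender'])
print(df2)
print("********")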
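# Give df the same index, add an empty num column, and fill it from df2.
df = df.set_index(['level', 'gender'])
df['num'] = pd.NA
df.update(df2)
# Keep only groups that contain both types, then drop the helper column.
df = df.query('num >= 2 and type in ["B","S"]').drop('num', axis=1).reset_index()
print('\n', df)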
Here is my xlsx:
  level gender  math type
0     a    man   123    B
1     b  woman   188    B
2     a    man    11    S
3     b    man    23    B
4     b  woman    23    R
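
For comparison, the same filter can probably be written more compactly with `groupby(...).transform('nunique')`, which returns the per-group count aligned to the original rows, so no `set_index`/`update` step is needed. This is only a sketch of an alternative, not the approach used above; it builds the DataFrame inline from the five sample rows instead of reading the xlsx, and the names `num` and `result` are just illustrative.

```python
import pandas as pd

# Sample rows copied from the spreadsheet shown above (built inline
# here as an assumption, instead of calling pd.read_excel).
df = pd.DataFrame({
    "level": ["a", "b", "a", "b", "b"],
    "gender": ["man", "woman", "man", "man", "woman"],
    "math": [123, 188, 11, 23, 23],
    "type": ["B", "B", "S", "B", "R"],
})

# Keep only rows whose type is "B" or "S".
df = df[df["type"].isin(["B", "S"])]

# Count distinct types per (level, gender), broadcast back to every row.
num = df.groupby(["level", "gender"])["type"].transform("nunique")

# Keep rows from groups that contain both types.
result = df[num >= 2].reset_index(drop=True)
print(result)
```

On the sample data only the (a, man) group has both B and S, so only its two rows should survive.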