这是业务要求:
- Queries 表:
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| query_name | varchar |
| result | varchar |
| position | int |
| rating | int |
+-------------+---------+
此表可能有重复的行。
此表包含了一些从数据库中收集的查询信息。
“位置”(position)列的值为 1 到 500 。
“评分”(rating)列的值为 1 到 5 。评分小于 3 的查询被定义为质量很差的查询。
将查询结果的质量 quality 定义为:
各查询结果的评分与其位置之间比率的平均值。
将劣质查询百分比 poor_query_percentage 为:
评分小于 3 的查询结果占全部查询结果的百分比。
编写解决方案,找出每次的 query_name 、 quality 和 poor_query_percentage。
quality 和 poor_query_percentage 都应 四舍五入到小数点后两位 。
以 任意顺序 返回结果表。
结果格式如下所示:
示例 1:
输入:
```python
Queries table:
+------------+-------------------+----------+--------+
| query_name | result | position | rating |
+------------+-------------------+----------+--------+
| Dog | Golden Retriever | 1 | 5 |
| Dog | German Shepherd | 2 | 5 |
| Dog | Mule | 200 | 1 |
| Cat | Shirazi | 5 | 2 |
| Cat | Siamese | 3 | 3 |
| Cat | Sphynx | 7 | 4 |
+------------+-------------------+----------+--------+
输出:
+------------+---------+-----------------------+
| query_name | quality | poor_query_percentage |
+------------+---------+-----------------------+
| Dog | 2.50 | 33.33 |
| Cat | 0.66 | 33.33 |
+------------+---------+-----------------------+
解释:
Dog 查询结果的质量为 ((5 / 1) + (5 / 2) + (1 / 200)) / 3 = 2.50
Dog 查询结果的劣质查询百分比为 (1 / 3) * 100 = 33.33
Cat 查询结果的质量为 ((2 / 5) + (3 / 3) + (4 / 7)) / 3 = 0.66
Cat 查询结果的劣质查询百分比为 (1 / 3) * 100 = 33.33
这是我的代码:
```python
import pandas as pd
def queries_stats(queries: pd.DataFrame) -> pd.DataFrame:
# 为query_name创建类别类型,并按照出现顺序排序
query_name_categories = queries['query_name'].unique()
queries['query_name'] = queries['query_name'].astype(pd.api.types.CategoricalDtype(categories=query_name_categories, ordered=True))
# 计算每个查询的质量
queries['quality'] = queries['rating'] / queries['position']
# 计算坏指标
poor_queries = queries[queries['rating'] < 3]
queries_counts = queries['query_name'].value_counts()
poor_queries_counts = poor_queries['query_name'].value_counts()
poor_query_percentage = (poor_queries_counts / queries_counts).fillna(0) * 100
# 计算平均质量并汇总
result = queries.groupby('query_name').agg({'quality': 'mean'}).round(2)
result['poor_query_percentage'] = poor_query_percentage.round(2)
# 重置索引
result = result.reset_index()
return result
这是其中的测试用例BUG:
'''
Queries =
| query_name | result | position | rating |
| ------------------------------ | -------------------------------------------------- | -------- | ------ |
| pdxafib | jwqjgyrnhaptivvnqkj | 16 | 5 |
| mpzkxkzbompbpbavkb | srlfyprermbgldweweulwdfmbddgarvtdm | 10 | 1 |
| udrcvuaxzjhbnvcyo | hekmecxifwqthvcixhtnxnhjfzwuzbgirrljrdiz | 14 | 3 |
| xdyoviczumlpchsnsbnwqmljgh | kvrydrjdithfoytworvasqlyjulaolldallexqljk | 3 | 4 |
| xdyoviczumlpchsnsbnwqmljgh | qkebkugmaikexeuxnaruqrb | 19 | 2 |
| snhrdtzetlihfenkdjyatadzhskomr | ejxukcbau | 19 | 4 |
| pdxafib | vcsfpqonshcrzowpcxptnjt | 18 | 3 |
| dcuzwcamcq | qjvzikdbxfhgtywyneozezwycafarobz | 9 | 1 |
...
输出
| query_name | quality | poor_query_percentage |
| ------------------------------ | ------- | --------------------- |
| alprumqp | 0.13 | 100 |
| boxmkivtcmaqgucfayepvdlrzkwstd | 0.16 | 66.67 |
| brlenjuadhbjaevowuwtfuoicgfey | 0.48 | 16.67 |
| cfcwoydsmbcqds | 1.42 | 25 |
| cfzjkoi | 0.33 | 50 |
| cjtxatphhnxwxkfjcojmvh | 0.13 | 50 |
| dcuzwcamcq | 0.27 | 40 |
| dfyiagitstfpzycnojhfegbfcsmr | 0.7 | 0 |
| drjghcpuulxohcihqoqwevt | 0.24 | 28.57 |
| duvrrsjdlqlmytfjjzulkar | 0.72 | 50 |
| ebjfgw | 0.3 | 42.86 |
| ejzvfzcid | 1.63 | 0 |
| ekibmwgcor | 0.22 | 25 ...
预期结果
| query_name | quality | poor_query_percentage |
| ------------------------------ | ------- | --------------------- |
| pdxafib | 0.34 | 0 |
| mpzkxkzbompbpbavkb | 0.08 | 100 |
| udrcvuaxzjhbnvcyo | 0.41 | 33.33 |
| xdyoviczumlpchsnsbnwqmljgh | 0.46 | 25 |
| snhrdtzetlihfenkdjyatadzhskomr | 0.39 | 16.67 |
| dcuzwcamcq | 0.27 | 40 |
| wrtmvtnx | 0.31 | 50 |
| jgmsykrgxja | 0.22 | 20 |
| ebjfgw | 0.3 | 42.86 |
| tlbaycikeiljhfvpwjzhdo | 0.57 | 50 |
| yitrgpgc | 0.18 | 50 |
| drjghcpuulxohcihqoqwevt | 0.24 | 28.57 |
| ekibmwgcor | 0.22 | 25 ...
'''
我的代码好像进行了自动排序,不符合业务要求,谢谢各位帮忙