u010001866
2021-01-22 21:55

Filtering data by condition in Python

50
  • python
  • list
  • data mining

I have the following three datasets.

1. uuid: user information

uuid.head()
Out[29]: 
   Unnamed: 0                 user_id      name  review_count  \
0           7  f4_MRNHvN-yRn7EA8YWRxg  Jennifer           822   
1          35  dIIKEfOgo0KqUfGQvGikPg      Gabi          2061   
2          44  o-GAkbTTcHFY3KqVDCE8sA       Tom           145   
3          48  Jtq_pKd7GVbXvFY8YU8kmw   Marissa           396   
4          74  fgwI3rYHOv1ipfVfCSx7pg       Emi          1847   

         yelping_since  useful  funny   cool  \
0  2011-01-17 00:18:23    4127   2446   2878   
1  2007-08-10 19:01:51   20024   9684  16904   
2  2009-12-31 00:51:15     209     49     55   
3  2007-07-25 22:13:51    1255    340    419   
4  2009-09-17 09:11:14   42491  31057  37222   

                                               elite  \
0            2011,2012,2013,2014,2015,2016,2017,2018   
1  2007,2008,2009,2010,2011,2012,2013,2014,2015,2...   
2                           2012,2013,2014,2015,2016   
3  2007,2008,2009,2010,2011,2013,2014,2015,2016,2...   
4  2009,2010,2011,2012,2013,2014,2015,2016,2017,2018   

                                             friends  fans  average_stars  \
0  c-Dja5bexzEWBufNsHfRrQ, 02HJNyOzzYXvEKVApJb8GQ...   137           3.63   
1  6Y-l3x4LpUNhTBVMTFmTmA, HYNhRw_-8g660mpnwY2VJA...   971           4.07   
2  oGBR461l8FR30W9MIXLuxw, EckWjCrPWekulzJB0YwVaQ...     8           4.18   
3  W0uBNn91xWRIge_gJDM8hg, UsXqCXRZwSCSw0AT7y1uBg...    29           3.93   
4  GGwLH-Vp6nKLeJ9iOzOd-A, gjhzKWsqCIrpEd9pevbKZw...  2113           4.31   

   compliment_hot  compliment_more  compliment_profile  compliment_cute  \
0             483               81                  62               35   
1            1587               85                  94              231   
2               0                0                   1                0   
3              48                9                   1                4   
4            4092              209                 177              226   

   compliment_list  compliment_note  compliment_plain  compliment_cool  \
0               24              193               541              623   
1               96             1171              3272             2169   
2                0                2                 3                2   
3                8               36                59               68   
4               89             1705             10653             5956   

   compliment_funny  compliment_writer  compliment_photos  
0               623                293                172  
1              2169                463                281  
2                 2                  1                  1  
3                68                 31                  6  
4              5956               1295               1470  

2. rvdf: review information

rvdf.head()
Out[30]: 
   Unnamed: 0               review_id                 user_id  \
0           0  y_4F7sR-yTF30O-hK5odKQ  VfHVqfE3kWu1uhR6DYQY9A   
1           1  55f0M0QUyZWx8NeEJ4MSVg  4Ngla54QXt6oHJsKmdVoSQ   
2           2  K9aRAdMDmADeDRQEx3Xwvw  V3t6VJNcO7yXslIJHG7nyA   
3           3  Ziz7G4lNnji4hEWn5ENz3A  yyH5S9mMOADRehpTzqlO1g   
4           4  qM4Y36t52OLCN-MTBKfC6Q  nCuv2BqIYecLYyJDD7OkVQ   

              business_id  stars  useful  funny  cool  \
0  qSa3L7cy8Gju4ZLs50ycag    5.0       0      0     0   
1  YqPcKy78Xog3JMy970D9bQ    5.0       1      0     0   
2  WLX-1Tb0BK9u3374SWxR0w    5.0       0      0     0   
3  _OoQ31fIoy3dK96rP6vtFQ    2.0       2      0     0   
4  IbG2bgOTAGayL6T8VVePGQ    5.0       0      0     0   

                                                text                 date  
0  The food and service are excellent my wife and...  2019-04-14 20:41:56  
1  Sarah was so friendly and welcoming when I cam...  2019-02-10 22:39:29  
2  This place is one of the best Thai food restau...  2019-04-27 22:27:14  
3  We figured we would try this Moeno's since we'...  2018-12-12 16:37:03  
4  Oooooohhhhhhh myyyyy!!!!! Delicious and speedi...  2019-03-27 19:53:42 

3. buid: business information

buid.head()
Out[31]: 
   Unnamed: 0             business_id                           name  \
0       20941  Ejg7UJQ_ACPveIi1QVPn_A   Black Diamond Custom Tattoos   
1       20942  ah8FbjdWiB6LO4a5r9CPxQ  NTB - National Tire & Battery   
2       20943  N1cVo_4pw2iMyi-P5Ih48A                    Veer Towers   
3       20944  F4A2NCP7wq8zQTbwMaOB4w               A&T Tire & Wheel   
4       20945  x2Y_MD5pCTRyaf9dD_ZpGw               Teriyaki Madness   

                     address              city state postal_code   latitude  \
0   4444 W Craig Rd, Ste 114         Las Vegas    NV       89032  36.239442   
1           5930 Mayfield Rd  Mayfield Heights    OH       44124  41.519393   
2      3726 Las Vegas Blvd S         Las Vegas    NV       89158  36.107406   
3    54 Industrial Parkway S            Aurora    ON     L4G 3V6  43.998870   
4  4503 Paradise Rd, Ste 320         Las Vegas    NV       89169  36.107485   

    longitude  stars  review_count  is_open  \
0 -115.202142    4.5             9        1   
1  -81.465557    2.5            12        1   
2 -115.174673    4.0            26        1   
3  -79.458871    3.0             5        1   
4 -115.153106    3.0            56        1   

                                          attributes  \
0  {'BusinessAcceptsCreditCards': 'True', 'GoodFo...   
1  {'BusinessAcceptsCreditCards': 'True', 'BikePa...   
2  {'BusinessAcceptsCreditCards': 'False', 'ByApp...   
3                                                NaN   
4  {'Alcohol': "'none'", 'Caters': 'False', 'Rest...   

                                          categories  \
0  Art Galleries, Arts & Entertainment, Tattoo, B...   
1  Automotive, Auto Repair, Tires, Shopping, Oil ...   
2  Home Services, Real Estate Services, Apartment...   
3  Automotive, Tires, Local Services, Recycling C...   
4      Japanese, Hawaiian, Restaurants, Asian Fusion   

                                               hours  
0  {'Monday': '0:0-0:0', 'Thursday': '15:0-21:0',...  
1  {'Monday': '7:30-20:0', 'Tuesday': '7:30-20:0'...  
2                                                NaN  
3  {'Monday': '8:0-18:0', 'Tuesday': '8:0-18:0', ...  
4  {'Monday': '10:0-23:0', 'Tuesday': '10:0-23:0'... 

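The useful field in the user table is the user ID 'user_id'. In the business table, the useful fields are the business ID 'business_id' and the business location, 'state' and 'city'. In the review table, the useful fields are the user ID 'user_id' and the business ID 'business_id'.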

Each of the tables above has more than 500,000 rows.

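The problem is that the user table has no location: there is no 'state' or 'city' column. I would like to infer each user's active location from the locations of the businesses they reviewed. That means first adding the columns rvdf['state'] and rvdf['city'] to the review table, then mapping them back to the user table as uuid['state'] and uuid['city'].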

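I already know that roughly 10% of businesses have no address, so some users may end up with NaN. In addition, fewer than 10% of users have no reviews, so no location can be mapped for them and they will also come back as NaN. Overall, though, the coverage should still be quite good.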

How can this mapping be written in Python?
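
For reference, the mapping described above could be sketched roughly as follows. This is only one possible approach, not a definitive answer: it assumes the three tables are pandas DataFrames with the column names shown in the output above, and it assumes that a user's "active location" means the (state, city) pair that appears most often among the businesses they reviewed. The function name add_user_location is made up for illustration.

```python
import pandas as pd


def add_user_location(uuid, rvdf, buid):
    """Return a copy of uuid with 'state' and 'city' columns added.

    Assumption: a user's location is the (state, city) pair that occurs
    most often among the businesses the user reviewed.
    """
    # 1. Attach each business's location to its reviews.
    #    A left join keeps every review, even if the business is unknown.
    loc = buid[["business_id", "state", "city"]].drop_duplicates("business_id")
    rv = rvdf[["user_id", "business_id"]].merge(loc, on="business_id", how="left")

    # 2. Count reviews per (user, state, city).
    #    Reviews with a missing state or city are skipped.
    counts = (
        rv.dropna(subset=["state", "city"])
        .groupby(["user_id", "state", "city"])
        .size()
        .reset_index(name="n")
    )

    # 3. Keep the most frequent location per user.
    #    Ties are broken alphabetically by state, then city.
    top = counts.sort_values(
        ["user_id", "n", "state", "city"],
        ascending=[True, False, True, True],
    ).drop_duplicates("user_id")[["user_id", "state", "city"]]

    # 4. Map back onto the user table.
    #    Users with no located review get NaN in both new columns.
    return uuid.merge(top, on="user_id", how="left")
```

Calling `uuid = add_user_location(uuid, rvdf, buid)` would then give the user table its two new columns. Merges and a groupby avoid Python-level loops, which matters with tables of this size. If rvdf['state'] and rvdf['city'] are also wanted, the merge in step 1 can be applied to the full rvdf instead of the two-column slice.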


6 answers