I have the following three datasets:
1. uuid: user information
uuid.head()
Out[29]:
Unnamed: 0 user_id name review_count \
0 7 f4_MRNHvN-yRn7EA8YWRxg Jennifer 822
1 35 dIIKEfOgo0KqUfGQvGikPg Gabi 2061
2 44 o-GAkbTTcHFY3KqVDCE8sA Tom 145
3 48 Jtq_pKd7GVbXvFY8YU8kmw Marissa 396
4 74 fgwI3rYHOv1ipfVfCSx7pg Emi 1847
yelping_since useful funny cool \
0 2011-01-17 00:18:23 4127 2446 2878
1 2007-08-10 19:01:51 20024 9684 16904
2 2009-12-31 00:51:15 209 49 55
3 2007-07-25 22:13:51 1255 340 419
4 2009-09-17 09:11:14 42491 31057 37222
elite \
0 2011,2012,2013,2014,2015,2016,2017,2018
1 2007,2008,2009,2010,2011,2012,2013,2014,2015,2...
2 2012,2013,2014,2015,2016
3 2007,2008,2009,2010,2011,2013,2014,2015,2016,2...
4 2009,2010,2011,2012,2013,2014,2015,2016,2017,2018
friends fans average_stars \
0 c-Dja5bexzEWBufNsHfRrQ, 02HJNyOzzYXvEKVApJb8GQ... 137 3.63
1 6Y-l3x4LpUNhTBVMTFmTmA, HYNhRw_-8g660mpnwY2VJA... 971 4.07
2 oGBR461l8FR30W9MIXLuxw, EckWjCrPWekulzJB0YwVaQ... 8 4.18
3 W0uBNn91xWRIge_gJDM8hg, UsXqCXRZwSCSw0AT7y1uBg... 29 3.93
4 GGwLH-Vp6nKLeJ9iOzOd-A, gjhzKWsqCIrpEd9pevbKZw... 2113 4.31
compliment_hot compliment_more compliment_profile compliment_cute \
0 483 81 62 35
1 1587 85 94 231
2 0 0 1 0
3 48 9 1 4
4 4092 209 177 226
compliment_list compliment_note compliment_plain compliment_cool \
0 24 193 541 623
1 96 1171 3272 2169
2 0 2 3 2
3 8 36 59 68
4 89 1705 10653 5956
compliment_funny compliment_writer compliment_photos
0 623 293 172
1 2169 463 281
2 2 1 1
3 68 31 6
4 5956 1295 1470
2. rvdf: review information
rvdf.head()
Out[30]:
Unnamed: 0 review_id user_id \
0 0 y_4F7sR-yTF30O-hK5odKQ VfHVqfE3kWu1uhR6DYQY9A
1 1 55f0M0QUyZWx8NeEJ4MSVg 4Ngla54QXt6oHJsKmdVoSQ
2 2 K9aRAdMDmADeDRQEx3Xwvw V3t6VJNcO7yXslIJHG7nyA
3 3 Ziz7G4lNnji4hEWn5ENz3A yyH5S9mMOADRehpTzqlO1g
4 4 qM4Y36t52OLCN-MTBKfC6Q nCuv2BqIYecLYyJDD7OkVQ
business_id stars useful funny cool \
0 qSa3L7cy8Gju4ZLs50ycag 5.0 0 0 0
1 YqPcKy78Xog3JMy970D9bQ 5.0 1 0 0
2 WLX-1Tb0BK9u3374SWxR0w 5.0 0 0 0
3 _OoQ31fIoy3dK96rP6vtFQ 2.0 2 0 0
4 IbG2bgOTAGayL6T8VVePGQ 5.0 0 0 0
text date
0 The food and service are excellent my wife and... 2019-04-14 20:41:56
1 Sarah was so friendly and welcoming when I cam... 2019-02-10 22:39:29
2 This place is one of the best Thai food restau... 2019-04-27 22:27:14
3 We figured we would try this Moeno's since we'... 2018-12-12 16:37:03
4 Oooooohhhhhhh myyyyy!!!!! Delicious and speedi... 2019-03-27 19:53:42
3. buid: business information
buid.head()
Out[31]:
Unnamed: 0 business_id name \
0 20941 Ejg7UJQ_ACPveIi1QVPn_A Black Diamond Custom Tattoos
1 20942 ah8FbjdWiB6LO4a5r9CPxQ NTB - National Tire & Battery
2 20943 N1cVo_4pw2iMyi-P5Ih48A Veer Towers
3 20944 F4A2NCP7wq8zQTbwMaOB4w A&T Tire & Wheel
4 20945 x2Y_MD5pCTRyaf9dD_ZpGw Teriyaki Madness
address city state postal_code latitude \
0 4444 W Craig Rd, Ste 114 Las Vegas NV 89032 36.239442
1 5930 Mayfield Rd Mayfield Heights OH 44124 41.519393
2 3726 Las Vegas Blvd S Las Vegas NV 89158 36.107406
3 54 Industrial Parkway S Aurora ON L4G 3V6 43.998870
4 4503 Paradise Rd, Ste 320 Las Vegas NV 89169 36.107485
longitude stars review_count is_open \
0 -115.202142 4.5 9 1
1 -81.465557 2.5 12 1
2 -115.174673 4.0 26 1
3 -79.458871 3.0 5 1
4 -115.153106 3.0 56 1
attributes \
0 {'BusinessAcceptsCreditCards': 'True', 'GoodFo...
1 {'BusinessAcceptsCreditCards': 'True', 'BikePa...
2 {'BusinessAcceptsCreditCards': 'False', 'ByApp...
3 NaN
4 {'Alcohol': "'none'", 'Caters': 'False', 'Rest...
categories \
0 Art Galleries, Arts & Entertainment, Tattoo, B...
1 Automotive, Auto Repair, Tires, Shopping, Oil ...
2 Home Services, Real Estate Services, Apartment...
3 Automotive, Tires, Local Services, Recycling C...
4 Japanese, Hawaiian, Restaurants, Asian Fusion
hours
0 {'Monday': '0:0-0:0', 'Thursday': '15:0-21:0',...
1 {'Monday': '7:30-20:0', 'Tuesday': '7:30-20:0'...
2 NaN
3 {'Monday': '8:0-18:0', 'Tuesday': '8:0-18:0', ...
4 {'Monday': '10:0-23:0', 'Tuesday': '10:0-23:0'...
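The relevant column in the user data is the user ID 'user_id'. In the business data, the relevant columns are the business ID 'business_id' and the location columns 'state' and 'city'. In the review data, the relevant columns are the user ID 'user_id' and the business ID 'business_id'.
Each of these tables has more than 500,000 rows.
The problem is that the user data has no location, i.e. no 'state' or 'city' column. So I want to infer each user's active location from the locations of the businesses they reviewed: first create the columns rvdf['state'] and rvdf['city'], then map them back onto the user data as uuid['state'] and uuid['city'].
As far as I know, about 10% of the businesses have no address, so some users may end up with NaN. In addition, just under 10% of the users have no reviews, so no location can be mapped for them and they will also come out as NaN. Overall, though, the coverage should still be quite good.
How can I write this mapping in Python?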
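One possible approach, given only as a sketch: join the reviews to the businesses on 'business_id', count how often each user reviewed in each ('state', 'city') pair, keep the most frequent pair per user, and left-join that back onto the user table. This assumes that "active location" means the location the user reviewed most often, which is an interpretation and not something the data defines. The function name add_user_location and the toy frames below are made up for illustration; the real uuid, rvdf and buid would be passed in instead.

```python
import pandas as pd


def add_user_location(uuid, rvdf, buid):
    """Return (rvdf with state/city, uuid with state/city).

    Assumption: a user's location is the (state, city) pair in which
    they wrote the most reviews; ties are broken alphabetically.
    """
    # One row per business, only the columns needed for the lookup.
    loc = buid[["business_id", "state", "city"]].drop_duplicates("business_id")

    # Step 1: attach state/city to every review (NaN if the business has none).
    rvdf_loc = rvdf.merge(loc, on="business_id", how="left")

    # Step 2: count reviews per user and location, ignoring missing locations.
    counts = (
        rvdf_loc.dropna(subset=["state", "city"])
        .groupby(["user_id", "state", "city"])
        .size()
        .reset_index(name="n")
    )

    # Step 3: keep the most frequent location for each user.
    top = (
        counts.sort_values(
            ["user_id", "n", "state", "city"],
            ascending=[True, False, True, True],
        )
        .drop_duplicates("user_id")[["user_id", "state", "city"]]
    )

    # Step 4: map onto the user table; users without a usable review get NaN.
    uuid_loc = uuid.merge(top, on="user_id", how="left")
    return rvdf_loc, uuid_loc


# Toy data (hypothetical IDs) with the same column names as the real tables.
toy_uuid = pd.DataFrame({"user_id": ["u1", "u2", "u3"]})
toy_buid = pd.DataFrame(
    {
        "business_id": ["b1", "b2", "b3"],
        "state": ["NV", "OH", None],
        "city": ["Las Vegas", "Mayfield Heights", None],
    }
)
toy_rvdf = pd.DataFrame(
    {
        "user_id": ["u1", "u1", "u1", "u2"],
        "business_id": ["b1", "b1", "b2", "b3"],
    }
)

toy_rvdf_loc, toy_uuid_loc = add_user_location(toy_uuid, toy_rvdf, toy_buid)
print(toy_uuid_loc)
```

In the toy data, u1 reviewed twice in Las Vegas and once in Mayfield Heights, so Las Vegas is chosen; u2 only reviewed a business without an address and u3 has no reviews, so both stay NaN. Because everything is done with merge and groupby instead of a Python loop over rows, this should remain workable on tables of several hundred thousand rows, although that has not been measured here.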