I ran the following statements:
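import pandas as pd

# df is the DataFrame built earlier; file is the path of the output CSV
df.info()  # first output below
df.to_csv(file, index=False, sep=',', encoding='utf_8_sig')
df = pd.read_csv(file, encoding='utf_8_sig')  # read_csv already returns a DataFrame
df.info()  # second output below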
and got the following output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6723 entries, 0 to 6722
Data columns (total 24 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 cmt_id 6723 non-null int64
1 info_id 6723 non-null int64
2 cmt_user_id 6723 non-null int64
3 publish_time 6723 non-null object
4 cmt_time 6723 non-null object
5 diff_time 6723 non-null float64
6 platform 6723 non-null object
7 post 6723 non-null object
8 cmt_content 6723 non-null object
9 post_view 4098 non-null object
10 post_like 6723 non-null object
11 post_dislike 6723 non-null object
12 post_cmt 6559 non-null object
13 post_repost 1965 non-null object
14 cmt_like 5630 non-null object
15 cmt_dislike 5630 non-null object
16 cmt_reply 4336 non-null object
17 cmt_repost 4460 non-null object
18 user_gender 3091 non-null object
19 user_score 4572 non-null object
20 user_post_star 4048 non-null object
21 user_reply 993 non-null object
22 user_post 4572 non-null object
23 user_friend 993 non-null object
dtypes: float64(1), int64(3), object(20)
memory usage: 1.2+ MB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6723 entries, 0 to 6722
Data columns (total 24 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 cmt_id 6723 non-null int64
1 info_id 6723 non-null int64
2 cmt_user_id 6723 non-null int64
3 publish_time 6723 non-null object
4 cmt_time 6723 non-null object
5 diff_time 6723 non-null float64
6 platform 6723 non-null object
7 post 6419 non-null object
8 cmt_content 6723 non-null object
9 post_view 4098 non-null float64
10 post_like 2081 non-null float64
11 post_dislike 1767 non-null float64
12 post_cmt 6544 non-null float64
13 post_repost 1446 non-null float64
14 cmt_like 0 non-null float64
15 cmt_dislike 0 non-null float64
16 cmt_reply 0 non-null float64
17 cmt_repost 124 non-null float64
18 user_gender 2622 non-null object
19 user_score 4065 non-null float64
20 user_post_star 3579 non-null float64
21 user_reply 519 non-null float64
22 user_post 4103 non-null float64
23 user_friend 148 non-null float64
dtypes: float64(15), int64(3), object(6)
memory usage: 1.2+ MB
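As the two outputs show, many columns lost data after the write/read round trip: the non-null counts dropped (cmt_like, for example, went from 5630 to 0), and 14 columns changed from object to float64. I think the problem is in to_csv, because the values are already missing in the file written to that path. (Posts online blame the encoding, but it is definitely not an encoding problem.)
Why does this happen, and how can I fix it?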
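One way to narrow this down is a small reproduction. The sketch below assumes, without confirmation from the real data, that the object columns hold empty strings or NA-like text such as "NA". Such values count as non-null in df.info(), are written to the CSV as empty or literal fields, and are turned into NaN by read_csv under its default settings. The toy column values and the in-memory buffer are hypothetical stand-ins for the real DataFrame and file.

```python
import io

import pandas as pd

# Hypothetical toy data (an assumption, not the real dataset):
# object columns whose "missing" values are empty strings or the text "NA".
df = pd.DataFrame({
    "cmt_id": [1, 2, 3],
    "post_like": ["12", "", "7"],
    "post": ["hello", "NA", "world"],
})

# In memory, "" and "NA" are ordinary strings, so every cell counts as non-null.
before = df.notna().sum()

# Write to an in-memory buffer instead of a real file.
buf = io.StringIO()
df.to_csv(buf, index=False)

# Default read: empty fields and "NA" are parsed as NaN.
buf.seek(0)
after_default = pd.read_csv(buf).notna().sum()

# keep_default_na=False: no string is parsed as NaN, so nothing "disappears".
buf.seek(0)
after_kept = pd.read_csv(buf, keep_default_na=False).notna().sum()

# How many cells in the original frame would be read back as NaN by default?
hidden = df[["post_like", "post"]].isin(["", "NA"]).sum()

print(before, after_default, after_kept, hidden, sep="\n\n")
```

If the real data behaves like this toy frame, the counts from the default read drop (3 to 2 in both string columns here), while the read with keep_default_na=False keeps all 3. In that case nothing is lost by to_csv itself: the cells were already empty strings, and cleaning them or replacing them with an explicit marker before writing, or reading with keep_default_na=False, are possible ways to keep the two info() outputs consistent.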