Prandrou 2022-03-23 18:00 Acceptance rate: 100%
Views: 26
Closed

Time differences computed in a loop become NaT from the second value onward

Symptoms and background

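I compute the time difference between consecutive invoices for each ID in a loop, but starting from the second ID every result is NaT.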

Relevant code

import pandas as pd

data['InvoiceDate'] = pd.to_datetime(data['InvoiceDate'])
# Dense rank of each invoice date within its customer ID
data['time']=data['InvoiceDate'].groupby(data['ID']).rank(ascending=1, method='dense')
data=data.sort_values(by=['ID','time'],ascending=(1,1))
print(data)
abc = pd.DataFrame()
# originData is created once, before the loop, and reused for every ID
originData= pd.DataFrame()
CID = data['ID'].unique()
for i in CID:
    res=data[data['ID']==i]
    # Gap between consecutive invoices of the same ID
    originData['Time1'] = res['InvoiceDate'] - res['InvoiceDate'].fillna(0).shift(1)
    originData['ID'] = i
    originData['time2'] = res['time']
    abc = pd.concat([abc, originData], ignore_index=True)

print('Result:\n',abc.head(50))

Output
       Time1     ID  time2
0        NaT  12346    1.0
1     4 days  12346    2.0
2    17 days  12346    3.0
3    10 days  12346    4.0
4     8 days  12346    5.0
5    39 days  12346    6.0
6   118 days  12346    7.0
7        NaT  12347    NaN
8        NaT  12347    NaN
9        NaT  12347    NaN
10       NaT  12347    NaN
11       NaT  12347    NaN
12       NaT  12347    NaN
13       NaT  12347    NaN
14       NaT  12348    NaN
15       NaT  12348    NaN
16       NaT  12348    NaN
17       NaT  12348    NaN
18       NaT  12348    NaN
19       NaT  12348    NaN
20       NaT  12348    NaN
21       NaT  12349    NaN
22       NaT  12349    NaN
23       NaT  12349    NaN
24       NaT  12349    NaN
25       NaT  12349    NaN
26       NaT  12349    NaN
27       NaT  12349    NaN
28       NaT  12350    NaN
29       NaT  12350    NaN
30       NaT  12350    NaN
31       NaT  12350    NaN
32       NaT  12350    NaN
33       NaT  12350    NaN
34       NaT  12350    NaN
35       NaT  12351    NaN
36       NaT  12351    NaN
37       NaT  12351    NaN
38       NaT  12351    NaN
39       NaT  12351    NaN
40       NaT  12351    NaN
41       NaT  12351    NaN
42       NaT  12352    NaN
43       NaT  12352    NaN
44       NaT  12352    NaN
45       NaT  12352    NaN
46       NaT  12352    NaN
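The shape of this output is the clue: the later IDs appear in blocks of seven rows (the row count of ID 12346), and all of their values are NaT or NaN. That is what pandas index alignment produces when a single DataFrame is reused across loop iterations. The sketch below reproduces the effect on made-up data (the two IDs and the dates are hypothetical); it uses `diff()` in place of the question's `fillna(0).shift(1)` subtraction, which gives the same gaps when no dates are missing.

```python
import pandas as pd

# Hypothetical sample data: two customers with three invoices each.
data = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2],
    "InvoiceDate": pd.to_datetime([
        "2021-01-01", "2021-01-05", "2021-01-22",
        "2021-02-01", "2021-02-03", "2021-02-10",
    ]),
})

# Same pattern as the question: one DataFrame created once, outside the loop.
origin = pd.DataFrame()
pieces = []
for cid in data["ID"].unique():
    res = data[data["ID"] == cid]
    # First pass: origin is empty, so it adopts res's index (0, 1, 2).
    # Later passes: res has index (3, 4, 5); the assignment aligns it to the
    # stale index (0, 1, 2), finds no matching labels and fills in NaT.
    origin["Time1"] = res["InvoiceDate"].diff()
    # A scalar is broadcast to every row, so the ID column still looks right.
    origin["ID"] = cid
    pieces.append(origin.copy())

buggy = pd.concat(pieces, ignore_index=True)
print(buggy)  # the three rows for ID 2 are all NaT
```

Creating `origin` inside the loop (as the answer below suggests) gives each ID a frame that takes on that ID's own index, so the alignment succeeds.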


2 answers

  • CSDN Expert-HGJ 2022-03-23 18:26

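    First, preprocess the data that was read in (cast ID to int, parse InvoiceDate, and sort by ID and date). More importantly, `originData = pd.DataFrame()` must be moved inside the loop. Otherwise the single originData frame persists across iterations: on the first pass it takes on the row index of the first ID, and on every later pass the new ID's values are aligned against that stale index, match no labels, and come out as NaT/NaN. That is also why every later ID shows the same number of rows as the first one. Change it like this: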

    import pandas as pd
    
    # Read the raw data ('sjcl.csv' is the file used in this answer)
    data = pd.read_csv('sjcl.csv', index_col=[0], encoding='utf-8', low_memory=False).reset_index()
    data['ID'] = data['ID'].astype(int)
    data['InvoiceDate'] = pd.to_datetime(data['InvoiceDate'])
    # Sort by customer and date, then give the rows a clean 0..n-1 index
    date1 = data.sort_values(by=['ID', 'InvoiceDate'], ascending=(True, True)).reset_index(drop=True)
    #print(date1.head(10))
    abc = pd.DataFrame()
    CID = data['ID'].unique().tolist()
    for i in CID:
        # A fresh frame on every iteration, so it never keeps the index of a
        # previous ID (the cause of the NaT values)
        originData = pd.DataFrame()
        locData = date1[date1['ID'] == i]
        # Gap to the previous invoice of the same ID; diff() is x - x.shift(1)
        originData['Time'] = locData['InvoiceDate'].diff()
        originData['ID'] = locData['ID']
        abc = pd.concat([abc, originData], ignore_index=True)
    
    print('Result:\n', abc.head(50))
    This answer was accepted by the asker as the best answer.
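Not part of the original thread: the per-ID loop can also be replaced by pandas' grouped difference, `groupby(...).diff()`, which computes x - x.shift(1) within each group and keeps the original row index, so there is no intermediate frame to misalign. A minimal sketch, assuming columns named `ID` and `InvoiceDate` as in the question; the sample rows are hypothetical, and `Time1`/`time2` simply mirror the question's column names.

```python
import pandas as pd

# Hypothetical sample data with the question's column names.
data = pd.DataFrame({
    "ID": [12347, 12346, 12346, 12347, 12346],
    "InvoiceDate": pd.to_datetime([
        "2021-03-01", "2021-01-01", "2021-01-05",
        "2021-03-11", "2021-01-22",
    ]),
})

# Sort so that "previous invoice" means the previous date for the same ID.
data = data.sort_values(["ID", "InvoiceDate"]).reset_index(drop=True)

# Gap to the previous invoice within each ID; the first row of each ID is NaT.
data["Time1"] = data.groupby("ID")["InvoiceDate"].diff()

# Dense rank of each invoice date within its ID, like the question's 'time' column.
data["time2"] = data.groupby("ID")["InvoiceDate"].rank(method="dense")

print(data)
```

Because both new columns are computed on the whole frame at once, the result lines up row for row with the sorted data, and it avoids calling `pd.concat` once per ID.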


Question timeline

  • Closed by the system on April 1
  • Answer accepted on March 24
  • Question created on March 23
