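Checking that the string has the right format is not enough; we also need to check that the date itself is a real calendar date: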
import re
import pandas as pd
df = pd.DataFrame([
    {'date': '2021-11-03', 'time': '18:08:27', 'order_type': 'Dinner'},
    {'date': '2021-19-12', 'time': '10:11:49', 'order_type': 'Breakfast'},
    {'date': '2021-fri-17', 'time': '17:38:01', 'order_type': 'Dinner'},
    {'date': 'Fri Apr 2 00:00:00', 'time': '18:38:52', 'order_type': 'Dinner'},
    {'date': '2021-01-08', 'time': '08:10:08', 'order_type': 'Breakfast'}
])
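# Matches YYYYMMDD for the years 1900-2099 and rejects impossible days
# (e.g. Feb 30, Apr 31); Feb 29 is accepted only in leap years.
# Raw strings keep \d from being treated as a string escape.
reg = re.compile(r'^((((19|20)\d{2})(0[13-9]|1[012])(0[1-9]|[12]\d|30))|(((19|20)\d{2})(0[13578]|1[02])31)|'
                 r'(((19|20)\d{2})02(0[1-9]|1\d|2[0-8]))|((((19|20)([13579][26]|[2468][048]|0[48]))|'
                 r'(2000))0229))$')
# Strip the hyphens, then keep the rows whose date does not match the pattern
invalid_df = df[df['date'].apply(lambda x: reg.match(x.replace('-', '')) is None)]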
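Original data: the five rows of df built above.
Invalid rows filtered out: rows 1, 2 and 3 ('2021-19-12', '2021-fri-17' and 'Fri Apr 2 00:00:00').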
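
As a cross-check on the regex, the same filter can be sketched with pandas' own date parser. This is a different technique from the one above, not a rewrite of it: it assumes the dates are meant to be written as YYYY-MM-DD, and it does not limit the year to 1900-2099 the way the regex does. The names parsed and invalid_by_parser are made up for this sketch.

```python
import pandas as pd

df = pd.DataFrame([
    {'date': '2021-11-03', 'time': '18:08:27', 'order_type': 'Dinner'},
    {'date': '2021-19-12', 'time': '10:11:49', 'order_type': 'Breakfast'},
    {'date': '2021-fri-17', 'time': '17:38:01', 'order_type': 'Dinner'},
    {'date': 'Fri Apr 2 00:00:00', 'time': '18:38:52', 'order_type': 'Dinner'},
    {'date': '2021-01-08', 'time': '08:10:08', 'order_type': 'Breakfast'}
])

# Assumption: the intended format is YYYY-MM-DD.
# errors='coerce' makes unparseable or impossible dates come back as NaT
# instead of raising an exception.
parsed = pd.to_datetime(df['date'], format='%Y-%m-%d', errors='coerce')

# Rows whose date could not be parsed are the invalid ones
invalid_by_parser = df[parsed.isna()]
print(invalid_by_parser)
```

If the two approaches ever disagree on a row, that row is worth inspecting by hand, since it usually points to a case one of them handles more loosely (for example, the regex version ignores where the hyphens are placed).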