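Code raises an error once it is moved into a custom function:
Original code, whose purpose is to strip the dreaded '\u202a':
csvfilename = r'D:\R\R\data\taobao_data.csv'
csvfilename = csvfilename.strip('\u202a')  # remove the invisible character
dta = pd.read_csv(csvfilename, encoding = 'gbk')  # encoding cannot be omitted
I didn't want to wait for the error and then rerun these three lines every time, so I wrote: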
def easyread(a):
    import pandas as pd  # import pandas
    if '\u202a' in a:  # check for the dreaded character
        a = a.strip('\u202a')
        data = pd.read_csv(a, encoding = 'gbk')
        return data
    else:
        data= pd.read_csv(a)
        return data
data1=easyread(r'D:\R\R\data\taobao_data.csv')
Error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb7 in position 354: invalid start byte
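Why does the original code run without problems, while wrapping it in a custom function produces an encoding error? How can I improve my code?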
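One plausible reading of the traceback, offered as a sketch rather than a confirmed diagnosis: the path passed to easyread here is typed by hand and contains no '\u202a', so the else branch runs, and that branch calls pd.read_csv without an encoding, which means pandas falls back to UTF-8. The snippet below uses only the standard library to illustrate both halves. The sample string is a made-up stand-in for the file's contents, and decodes_as is a hypothetical helper, not part of pandas.

```python
# The path literal from the question, typed by hand: no hidden U+202A in it.
path = r'D:\R\R\data\taobao_data.csv'
has_mark = '\u202a' in path  # False, so easyread() takes the else branch

# Hypothetical sample text standing in for the CSV's contents (assumption).
sample = '淘宝数据'
raw = sample.encode('gbk')  # the bytes a GBK-encoded file would store


def decodes_as(data, encoding):
    """Return True if the bytes decode cleanly with the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


print(has_mark)                   # whether the if-branch would be taken
print(decodes_as(raw, 'utf-8'))   # what the else branch effectively attempts
print(decodes_as(raw, 'gbk'))     # what the original three lines attempt
```

If this reading is right, the function differs from the original three lines in one respect: the original always passes encoding = 'gbk', while the function only does so when the invisible character happens to be present.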
After updating the code as the answer suggested, I still get the same error:
def easyread(a):
    import pandas as pd
    if '\u202a' in a:
        a = a.strip('\u202a')
        try:
            data = pd.read_csv(a,encoding = 'gbk')
        except UnicodeDecodeError:
            data = pd.read_csv(a,encoding = 'utf-8')
        except UnicodeDecodeError:
            data = pd.read_csv(a,encoding = 'gb18030')
        except UnicodeDecodeError:
            data = pd.read_csv(a,encoding = 'ansi')
        return data
    else:
        data= pd.read_csv(a)
        return data
data1=easyread('D:\\R\\R\\data\\taobao_data.csv')
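A possible rework, given as a minimal sketch and not as the accepted answer. Two things in the updated version look relevant. First, the try/except sits inside the if branch, so a path without '\u202a' still reaches the else branch and reads with the default encoding. Second, repeating except UnicodeDecodeError on one try does not chain the attempts: only the first matching handler runs, and an error raised inside that handler is not caught by the handlers after it. The version below strips unconditionally and loops over candidate encodings. The encodings parameter and its default order are assumptions for illustration; 'ansi' is left out of the defaults because I am not sure it resolves to a codec on every platform.

```python
import pandas as pd


def easyread(path, encodings=('utf-8', 'gbk', 'gb18030')):
    """Read a CSV, stripping a stray U+202A and trying several encodings.

    The default encoding list is an assumption; adjust it to your data.
    """
    # str.strip is a no-op when the character is absent, so no if/else is needed.
    path = path.strip('\u202a')
    last_error = None
    for enc in encodings:
        try:
            # Return as soon as one encoding decodes the whole file.
            return pd.read_csv(path, encoding=enc)
        except UnicodeDecodeError as err:
            last_error = err  # remember the failure and try the next encoding
    if last_error is None:
        raise ValueError('encodings must contain at least one encoding name')
    raise last_error
```

UTF-8 is tried first in this sketch because it is strict: GBK-encoded Chinese text usually fails to decode as UTF-8, whereas UTF-8 bytes may decode under GBK without an error and silently produce garbled text. A call would look like data1 = easyread(r'D:\R\R\data\taobao_data.csv'), or easyread(path, encodings=('gbk',)) to pin a single encoding.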