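Environment: Python 3
I have a txt file of triples, one triple per line, with the entities and the attribute separated by tabs.
How can I extract the first entity from each line and write it to another txt file, one entity per line? The data set is fairly large, roughly 65 million lines.
This is what I wrote. Is the problem with my regular expression?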
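import datetime
import re
start_time = datetime.datetime.now()
print("start time:", start_time)
count = 0
# Iterate over the file object so lines are streamed instead of loaded all at once.
with open(r'D:\bishe_data\test.txt', encoding='utf-8', mode='r') as f:
    for line in f:
        # '\t' is the tab escape; '/t' would match a slash followed by the letter t.
        # maxsplit=1 stops after the first tab, so s[0] is the first entity.
        s = re.split(r'\t', line.rstrip('\n'), maxsplit=1)
        print(s[0])
        count += 1
end_time = datetime.datetime.now()
print("end_time:", end_time)
print("during:", end_time - start_time)
print(count)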
Any help would be much appreciated!
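For reference, here is a minimal sketch of the task described above (take the first tab-separated field of every line and write it to a second file, one per line). It uses `str.partition` rather than a regular expression, which is a swapped-in technique and not the approach in the post. The function name `extract_first_entities` and the file names in the demo are hypothetical; the demo runs on a small temporary file, not on the real 65-million-line data.

```python
import os
import tempfile


def extract_first_entities(src_path, dst_path, encoding="utf-8"):
    """Copy the first tab-separated field of each line in src_path to dst_path.

    Hypothetical helper for illustration; returns the number of lines read.
    """
    count = 0
    with open(src_path, encoding=encoding) as src, \
            open(dst_path, "w", encoding=encoding) as dst:
        for line in src:  # the file object yields one line at a time
            # partition splits at the first tab only; the rest of the line is left untouched
            entity = line.rstrip("\n").partition("\t")[0]
            dst.write(entity + "\n")
            count += 1
    return count


if __name__ == "__main__":
    # Demo on a throwaway file; replace the paths with the real input and output files.
    with tempfile.TemporaryDirectory() as tmp_dir:
        src = os.path.join(tmp_dir, "triples.txt")      # hypothetical input name
        dst = os.path.join(tmp_dir, "entities.txt")     # hypothetical output name
        with open(src, "w", encoding="utf-8") as f:
            f.write("Beijing\tcapital_of\tChina\nParis\tcapital_of\tFrance\n")
        print("lines processed:", extract_first_entities(src, dst))
        with open(dst, encoding="utf-8") as f:
            print(f.read(), end="")
```

Because both files are processed as streams, memory use should stay roughly constant regardless of the number of lines. Writing to a file instead of calling `print` for every line is also likely to matter for run time at this scale, though that depends on the machine and the console.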