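I have a matrix (a tsv file) whose row names are "ENSG" followed by a numeric ID.
The first five rows and five columns are shown in the figure.
I want to read the mapping from output.json, shown in the figure, and replace each "ENSG" + number ID with the gene symbol that follows it.
For example, the row name ENSG00000186092 should be replaced with OR4F5.
The columns in output.json are separated by \t.
The renamed matrix (tsv file) should be written to a specified path.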
Keeping the rows whose IDs are not present in the json:
import pandas as pd

# Load the Ensembl ID -> gene symbol mapping as a plain dict
json_data = pd.read_json("xxx.json", typ='series')
json_dict = json_data.to_dict()

# Read the matrix in chunks of 1000 rows to limit memory use
df_chunk = pd.read_csv("xxx.tsv", sep='\t', chunksize=1000)
df_chunk_list = []
for chunk in df_chunk:
    # Keep only the part of the ID before the first "."
    chunk['Ensembl_ID'] = chunk['Ensembl_ID'].apply(lambda x: x.split(".")[0])
    for index, row in chunk.iterrows():
        try:
            chunk.loc[index, 'Ensembl_ID'] = json_dict[row['Ensembl_ID']]
        except KeyError:
            # ID not in the mapping: leave the Ensembl ID in place
            pass
    df_chunk_list.append(chunk)
result_Df = pd.concat(df_chunk_list)
result_Df.to_csv('result.tsv', sep='\t', index=False)
Dropping the rows whose IDs are not present in the json:
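import pandas as pd

# Load the Ensembl ID -> gene symbol mapping as a plain dict
json_data = pd.read_json("xxx.json", typ='series')
json_dict = json_data.to_dict()

# Read the matrix in chunks of 1000 rows to limit memory use
df_chunk = pd.read_csv("xxx.tsv", sep='\t', chunksize=1000)
df_chunk_list = []
for chunk in df_chunk:
    # Keep only the part of the ID before the first "."
    chunk['Ensembl_ID'] = chunk['Ensembl_ID'].apply(lambda x: x.split(".")[0])
    missing = []  # row labels whose ID has no entry in the mapping
    for index, row in chunk.iterrows():
        try:
            chunk.loc[index, 'Ensembl_ID'] = json_dict[row['Ensembl_ID']]
        except KeyError:
            missing.append(index)
    # Drop the unmapped rows after the loop instead of while iterating
    df_chunk_list.append(chunk.drop(index=missing))
result_Df = pd.concat(df_chunk_list)
result_Df.to_csv('result.tsv', sep='\t', index=False)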
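Both variants above loop over rows with iterrows, which can be slow on a large matrix. As a rough sketch of an alternative (not the original answer's method), the same replacement can be done per column with Series.map. The helper name rename_ids, its keep_unmapped flag, and the demo data are all made up for illustration; the only things taken from the code above are the Ensembl_ID column name and the idea of cutting the ID at the first ".".

```python
import pandas as pd

def rename_ids(df, mapping, id_col="Ensembl_ID", keep_unmapped=True):
    """Hypothetical helper: replace IDs in `id_col` using the dict `mapping`."""
    out = df.copy()
    # Keep only the part of each ID before the first "."
    ids = out[id_col].astype(str).str.split(".").str[0]
    # Look up every ID at once; IDs without an entry become NaN
    symbols = ids.map(mapping)
    if keep_unmapped:
        # Fall back to the shortened Ensembl ID where no symbol was found
        out[id_col] = symbols.fillna(ids)
    else:
        # Keep only the rows that have a symbol
        found = symbols.notna()
        out = out[found].copy()
        out[id_col] = symbols[found]
    return out

# Made-up demo data; ENSG00000000000 is deliberately absent from the mapping
demo = pd.DataFrame({
    "Ensembl_ID": ["ENSG00000186092.5", "ENSG00000000000.1"],
    "sample_1": [0.0, 1.5],
})
demo_map = {"ENSG00000186092": "OR4F5"}

kept = rename_ids(demo, demo_map, keep_unmapped=True)
dropped = rename_ids(demo, demo_map, keep_unmapped=False)
print(kept["Ensembl_ID"].tolist())
print(dropped["Ensembl_ID"].tolist())
```

Inside the chunk loop, this would presumably replace the inner iterrows block with a single call such as rename_ids(chunk, json_dict), with keep_unmapped chosen to match the variant you need.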