Performance of reading a CSV file and inserting it into a database

My task is to read a CSV file line by line and insert the rows into a database.

The CSV file contains about 1.7 million lines.

I use Python with the SQLAlchemy ORM (the merge function) to do this, but it takes over five hours.

Is this caused by Python's slow performance, or by SQLAlchemy?

Or would using Golang give obviously better performance? (But I have no experience with Go. Besides, this job needs to be scheduled every month.)

Hope you guys can give any suggestions, thanks!

Update: the database is MySQL.


doufu6423: You need LOAD DATA [LOCAL] INFILE. I can't help you further with it, but if you hit other problems you can search the official MySQL docs or elsewhere online.
replied over 4 years ago

duanlei2150: OK, I have updated the tags and the content.
replied over 4 years ago

dongsi4547: Which database? Please tag your question.
replied over 4 years ago

2 Answers



For such a task you don't want to insert data line by line :) Basically, you have two options:

  1. Ensure that SQLAlchemy does not run queries one by one. Use a BATCH INSERT query (see "How to do a batch insert in MySQL") instead.
  2. Massage your data as needed, output it into a temporary CSV file, and then run LOAD DATA [LOCAL] INFILE as suggested above. If you don't need to preprocess the data, just feed the CSV to the database (I assume it's MySQL).
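A minimal sketch of option 1's batching pattern. It uses the stdlib sqlite3 driver so it runs standalone; with SQLAlchemy the equivalent call is `conn.execute(table.insert(), list_of_dicts)` against your MySQL engine, which likewise sends one multi-row statement per batch. The table name `t`, the chunk size, and the sample rows are all invented for illustration.

```python
import csv
import io
import sqlite3

CHUNK = 10_000  # rows per multi-row insert; tune for memory and max_allowed_packet

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")

# Tiny stand-in for the real 1.7M-line file.
csv_text = "id,name\n1,alice\n2,bob\n3,carol\n"
reader = csv.reader(io.StringIO(csv_text))
next(reader)  # skip the header row

batch = []
for row in reader:
    batch.append((int(row[0]), row[1]))
    if len(batch) >= CHUNK:
        conn.executemany("INSERT INTO t VALUES (?, ?)", batch)
        batch.clear()
if batch:  # flush the final partial chunk
    conn.executemany("INSERT INTO t VALUES (?, ?)", batch)
conn.commit()

row_count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(row_count)  # → 3
```

The point is that the driver round-trips once per chunk instead of once per row, which is where most of the five hours is likely going with per-row ORM `merge()` calls.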

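Option 2's LOAD DATA route might look like the statement below. The file path, table name, and delimiters are placeholders to adapt; LOCAL also requires `local_infile` to be enabled on both the client and the server.

```sql
LOAD DATA LOCAL INFILE '/path/to/data.csv'
INTO TABLE my_table
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;
```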

dps43378: OK, I will try it!
replied over 4 years ago

Follow the three steps below:

  1. Save the CSV file under the name of the table you want to load it into.
  2. Execute the Python script below to create the table dynamically (update the CSV filename and the DB parameters).
  3. Execute "mysqlimport --ignore-lines=1 --fields-terminated-by=, --local -u dbuser -p db_name dbtable_name.csv"

PYTHON CODE:

import numpy as np
import pandas as pd
from mysql.connector import connect

csv_file = 'dbtable_name.csv'
df = pd.read_csv(csv_file)
table_name = csv_file.split('.')

query = "CREATE TABLE " + table_name[0] + " (\n"
for count in np.arange(df.columns.values.size):
    query += df.columns.values[count]
    dtype = str(df.dtypes.iloc[count])  # positional lookup; df.dtypes is indexed by column name
    if dtype == 'int64':
        query += "\t\t int(11) NOT NULL"
    elif dtype == 'object':
        query += "\t\t varchar(64) NOT NULL"
    elif dtype == 'float64':
        query += "\t\t float(10,2) NOT NULL"


    if count == 0:
        query += " PRIMARY KEY"

    if count < df.columns.values.size - 1:
        query += ",\n"

query += " );"
#print(query)

database = connect(host='localhost',  # your host
                     user='username', # username
                     passwd='password',     # password
                     db='dbname') #dbname
curs = database.cursor(dictionary=True)
curs.execute(query)
# print(query)
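To sanity-check the DDL this script generates without touching a database, the same dtype-to-column mapping can be exercised on its own. The column names and dtype strings below are invented examples (the strings are what pandas reports for a typical CSV):

```python
# Pure-Python mirror of the dtype mapping in the script above; no DB connection needed.
columns = [("id", "int64"), ("name", "object"), ("price", "float64")]
type_map = {"int64": "int(11)", "object": "varchar(64)", "float64": "float(10,2)"}

parts = []
for i, (name, dtype) in enumerate(columns):
    col = f"{name}\t\t {type_map[dtype]} NOT NULL"
    if i == 0:
        col += " PRIMARY KEY"  # first CSV column becomes the primary key
    parts.append(col)
query = "CREATE TABLE dbtable_name (\n" + ",\n".join(parts) + " );"
print(query)
```

Printing the statement like this before running `curs.execute(query)` makes it easy to spot a wrong type mapping or a missing comma before the table is actually created.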