dongyi2425 2018-10-21 21:45
638 views

Fastest way to insert/update a million rows in Laravel 5.7

I'm using Laravel 5.7 to fetch large amounts of data (around 500k rows) from an API server and insert it into a table (call it Table A) quite frequently (at least every six hours, 24/7). However, on each subsequent run it's enough to insert only the rows that changed - but at least 60-70% of the items will have changed. So this table will quickly grow to tens of millions of rows.

I came up with the idea of a helper table (call it Table B) to store all the incoming data. Before inserting everything into Table A, I want to compare it (with Laravel/PHP) against the previous data in Table B, so that I only insert the records that need to be updated. Again, that will usually be around 60-70% of the records.

My first question: is the above approach the preferred way of doing it in this situation? (Obviously I want to make it happen as fast as possible.) I assume that searching for and updating the records directly in the table would take a lot more time and would keep the table busy/locked. Is there a better way to achieve the same thing (meaning, to update the records in the DB)?
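
For concreteness, the kind of alternative I have in mind would let MySQL do the comparison instead of PHP, via INSERT ... ON DUPLICATE KEY UPDATE. Laravel 5.7 has no built-in upsert helper, so this is a hand-rolled sketch; it assumes table_a has a UNIQUE index on test1 (my example column), and with 500k rows you would of course chunk it:

$placeholders = [];
$bindings = [];

foreach ($response as $value) {
    $placeholders[] = '(?, ?, ?, ?, ?)';
    array_push($bindings, $value['test1'], $value['test2'],
        $value['test3'], $value['test4'], $value['test5']);
}

// One statement: insert new rows, update existing ones in place.
DB::statement(
    'INSERT INTO table_a (test1, test2, test3, test4, test5) VALUES '
    . implode(', ', $placeholders)
    . ' ON DUPLICATE KEY UPDATE test2 = VALUES(test2), test3 = VALUES(test3),'
    . ' test4 = VALUES(test4), test5 = VALUES(test5)',
    $bindings
);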


The second issue I'm facing is slow insert times. Right now I'm using a local environment (16GB RAM, i7-6920HQ CPU) and MySQL is inserting the rows very slowly (about 30-40 records per second). The size of one row is around 50 bytes.

I know it can be made a lot faster by fiddling around with InnoDB's settings. However, I'd also like to think that I can do something on Laravel's side to improve performance.

Right now my Laravel code looks like this (only inserting 1 record at a time):

// One INSERT statement per row - this is the slow part.
foreach ($response as $value) {
    DB::table('table_a')->insert([
        'test1' => $value['test1'],
        'test2' => $value['test2'],
        'test3' => $value['test3'],
        'test4' => $value['test4'],
        'test5' => $value['test5'],
    ]);
}

$response is an array.

So my second question: is there any way to increase the insert rate of the records to something like 50k/second - both at the Laravel application layer (by doing batch inserts) and at the MySQL/InnoDB level (by changing the config)?
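
For reference, the fastest bulk path I'm aware of bypasses row-by-row SQL entirely with LOAD DATA LOCAL INFILE. A rough sketch of what I mean (assuming local_infile is enabled on the server and PDO::MYSQL_ATTR_LOCAL_INFILE => true is set in the options array of config/database.php - I haven't verified this setup end to end):

// Dump the response to a temporary CSV file first.
$path = storage_path('app/table_a.csv');
$fh = fopen($path, 'w');

foreach ($response as $value) {
    fputcsv($fh, [
        $value['test1'], $value['test2'], $value['test3'],
        $value['test4'], $value['test5'],
    ]);
}
fclose($fh);

// Let MySQL bulk-load the file in one pass.
DB::unprepared(
    "LOAD DATA LOCAL INFILE '" . addslashes($path) . "'" .
    " INTO TABLE table_a" .
    " FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'" .
    " LINES TERMINATED BY '\\n'" .
    " (test1, test2, test3, test4, test5)"
);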

Current InnoDB settings:

innodb_buffer_pool_size        = 256M
innodb_log_file_size           = 256M
innodb_thread_concurrency      = 16
innodb_flush_log_at_trx_commit = 2
innodb_flush_method            = normal
innodb_use_native_aio          = true
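
From what I've read, on a 16GB machine the usual suggestions for a write-heavy workload look roughly like this (illustrative guesses on my part, not tuned values):

innodb_buffer_pool_size        = 8G        # often ~50% of RAM on a dedicated DB box
innodb_log_file_size           = 1G        # larger redo log = fewer checkpoint stalls
innodb_flush_log_at_trx_commit = 2         # already set; trades durability for speed
innodb_flush_method            = O_DIRECT  # Linux value; 'normal' is Windows-specific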

MySQL version is 5.7.21.

If I forgot to mention anything, please let me know in a comment and I will add it quickly.

Edit 1: The server I'm planning to use will have an SSD in it - if that makes any difference. I assume MySQL inserts will still be I/O-bound.


1 Answer

  • douzou7012 2018-10-22 01:39

    Don't call insert() inside a foreach(), because it will execute n separate queries against the database when you have n rows of data.

    First build an array of rows whose keys match the database column names, then pass that array to the insert() function.

    This executes only one query against the database, regardless of how many rows you have.

    This is way, way faster.

    $data_to_insert = [];
    
    // Collect every row into one array first...
    foreach ($response as $value) {
        $data_to_insert[] = [
            'test1' => $value['test1'],
            'test2' => $value['test2'],
            'test3' => $value['test3'],
            'test4' => $value['test4'],
            'test5' => $value['test5'],
        ];
    }
    
    // ...then send it all in a single multi-row INSERT.
    DB::table('table_a')->insert($data_to_insert);
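    
    One caveat worth adding: with ~500k rows, a single multi-row INSERT can exceed MySQL's max_allowed_packet and use a lot of PHP memory, so in practice you would split the batch into chunks (5000 below is an arbitrary size to tune against your packet limit):
    
    foreach (array_chunk($data_to_insert, 5000) as $chunk) {
        // Each iteration issues one INSERT of ~5k rows.
        DB::table('table_a')->insert($chunk);
    }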
    
