dongyinzheng6572 2019-02-05 11:05
Views: 73
Answer accepted

Fastest way to process a CSV: bash vs PHP vs C/C++ processing speed [closed]

I have a CSV file with 5M rows. One option is to import them into a MySQL database and then loop over the table with PHP.

$db_class = new MysqlDb;
$db_class->ConnectDB();
$query="SELECT * FROM mails WHERE .....";
$result=mysqli_query(MysqlDb::$db, $query);
while($arr=mysqli_fetch_array($result))
{
    //db row here 
}

So I loop over all the mails in the table and process them. If they contain some bad string, I delete them, and so on.

This works, but importing 5M rows is very slow, and so is looping over them one by one and editing the rows (deleting the ones that contain a bad string).

I am thinking of a better solution that skips PHP/MySQL altogether: process the .csv file line by line and check whether the current row contains a specific bad string. I can do that in pure PHP, like:

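$file = fopen('file.csv', 'r'); // fgetcsv() needs a file handle, not the array that file() returns
while (($data = fgetcsv($file)) !== FALSE) {
    // process line; $data[0] is the first field of the current row
}
fclose($file);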

This is the bash script I use to loop over all lines of a file:

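# Read bac.csv line by line and keep only the lines without the bad string.
# (Running sed -i inside the loop would rescan the whole file once per input line.)
while IFS= read -r line; do
    [[ $line == *badstring* ]] || printf '%s\n' "$line"
done < bac.csv > clean.csv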

In Python I do:

with open("file.csv", "r") as ins:
    array = []
    for line in ins:
        # process line here
        pass
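One way the Python loop could be fleshed out, as a rough sketch only: stream from the input file to a new output file so the 5M rows are never held in memory at once. The function name `filter_file` and its parameters are hypothetical, and the plain substring test stands in for whatever the real criteria are.

```python
def filter_file(src_path, dst_path, bad_string):
    """Copy src_path to dst_path, skipping lines that contain bad_string.

    Returns a (kept, dropped) tuple of line counts.
    """
    kept = dropped = 0
    # Iterating over a file object yields one line at a time,
    # so memory use does not grow with the size of the file.
    with open(src_path, "r", encoding="utf-8", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            if bad_string in line:
                dropped += 1
            else:
                dst.write(line)  # line still carries its own line ending
                kept += 1
    return kept, dropped
```

It would be called as `filter_file("file.csv", "clean.csv", "badstring")`. Writing to a second file instead of editing in place means the original stays untouched if the run is interrupted.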

A bad line would look like:

name@baddomain.com
name@domain (without extension)

etc. I have several criteria for what counts as a bad line, which is why I didn't bother posting them all here.
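The two example criteria above could be written as a single check, sketched here in Python. This assumes each line holds exactly one address; `BAD_DOMAINS` and `is_bad_line` are hypothetical names, treating a line with no "@" as bad is an extra assumption, and the other unposted criteria would need their own tests.

```python
BAD_DOMAINS = {"baddomain.com"}  # hypothetical blocklist; extend as needed


def is_bad_line(line):
    """Return True if the line fails one of the two example checks."""
    address = line.strip().lower()
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return True  # assumption: no "@", or nothing on one side of it, is bad
    if domain in BAD_DOMAINS:
        return True  # e.g. name@baddomain.com
    # "Without extension": the domain has no dot, or starts/ends with one.
    if "." not in domain or domain.startswith(".") or domain.endswith("."):
        return True  # e.g. name@domain
    return False
```

Keeping the blocklist in a set makes each domain lookup constant time, which matters more than the choice of loop when there are millions of rows.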

However, for very big files I need a better solution. What do you recommend? Should I learn how to do it in C/C++ or in bash? I already know a little bash, so I could get it working sooner. Is C/C++ much faster than bash for this situation, or should I stick with bash?

Thank you


1 answer

  • duan6301 2019-02-05 11:11

    As for a PHP solution, you are looking for fgetcsv. The manual includes an example of iterating over a CSV file.

    Or, if you want to be fancy, you can go with the league/csv library.

    Selected by the asker as the best answer.
