I have a CSV with 5M rows. One option is to import them into a MySQL database and then loop over the table with PHP.
$db_class = new MysqlDb;
$db_class->ConnectDB();
$query="SELECT * FROM mails WHERE .....";
$result=mysqli_query(MysqlDb::$db, $query);
while($arr=mysqli_fetch_array($result))
{
//db row here
}
So I loop over all the mails from the table and process them. If they contain some bad string, I delete them, etc.
This works, but it is very slow to import 5M rows, and it is also very slow to loop over all of them one by one and edit the rows (delete them when they contain a bad string).
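For context, the per-row check inside that loop is roughly like this (just a sketch; the id and email column names and the bad string are placeholders for my real criteria):
if (strpos($arr['email'], 'baddomain.com') !== false) {
    // row matches a bad pattern, so remove it from the table
    mysqli_query(MysqlDb::$db, "DELETE FROM mails WHERE id = " . (int)$arr['id']);
}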
I am thinking of a better solution that skips MySQL entirely: process the .csv file line by line and check whether the current row contains a specific bad string. I can do that in pure PHP, like:
$file = fopen('file.csv', 'r');
while (($data = fgetcsv($file)) !== FALSE) {
    // process line
    $data[0];
}
fclose($file);
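The full filtering pass would look something like this (only a sketch; 'clean.csv' is a placeholder output file and the strpos() check stands in for my real criteria, assuming the address is in the first column):
$in  = fopen('file.csv', 'r');
$out = fopen('clean.csv', 'w');   // good rows get written here
while (($data = fgetcsv($in)) !== FALSE) {
    // skip rows whose first column matches a bad pattern
    if (strpos($data[0], 'baddomain.com') !== false) {
        continue;
    }
    fputcsv($out, $data);
}
fclose($in);
fclose($out);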
This is the bash script I use to loop over all lines of a file:
while IFS= read -r line; do
    # keep the line unless it contains the bad string
    case "$line" in *badstring*) ;; *) printf '%s\n' "$line" ;; esac
done < bac.csv > clean.csv
In Python I do:
with open("file.csv", "r") as ins:
    array = []
    for line in ins:
        pass  # process line here
A bad line would be something like:
name@baddomain.com
name@domain (without extension)
etc. I have a few criteria for what a bad line is; that's why I didn't bother posting them all here.
However, for very big files I must try to find a better solution. What do you recommend? Should I learn how to do it in C/C++ or bash? Bash I already know a little, so I could get something working faster. Is C/C++ much faster than bash for this situation, or should I stick with bash?
Thank you