duannuan0074 2014-11-07 23:05

What is the fastest way to read a text file from a hard drive into memory using Go?

I just started using Go after years of using Perl, and from my initial tests it seems that reading a text file from a hard drive into a hash is not as fast as in Perl.

In Perl I use the "File::Slurp" module, and it reads a file into memory (into a string variable, array, or hash) really fast - at the limit of the hard drive's read throughput.

I am not sure what the best way is in Go to read, for example, a 500MB CSV file with 10 columns into memory (into a hash), where the key of the hash is the 1st column and the value is the remaining 9 columns.

What is the fastest way to achieve this? The goal is to read the file and store it in some Go variable in memory as fast as the hard drive can deliver the data.

This is one line from the input file - there are around 20 million similar lines:

1341,2014-11-01 00:01:23.588,12000,AV7WN259SEH1,1133922,SingleOven/HCP/-PRODUCTION/-23C_30S,0xd8d2a106d44bea07,8665456.006,5456-02,3010-30 N- PHOTO,AV7WN259SEH1

The platform is Windows 7 with an Intel i7 processor and 16GB of RAM. I can install Go on Linux as well if there are benefits to doing so.

Edit:

So one use case is: load the whole file into memory, into a single variable, as fast as possible. Later I can scan that variable, split it, etc. (all in memory).

Another approach is to store each line as a key-value pair during load time (e.g. after every X bytes are read or after a \n character arrives).

To me, these two approaches could yield different performance results. But since I am very new to Go, it will probably take me days of trying different techniques to arrive at the best-performing algorithm.

I would like to learn all the possible ways to do the above in Go, and also the recommended ways. At this point I am not concerned about memory usage, since this process will be repeated 10,000 times, one file after another (each file is erased from memory as soon as its processing is done). Files range from 50MB to 500MB. Since there are several thousand files, any performance gain (even a 1-second gain per file) is a significant overall gain.

I do not want to add complexity to the question by going into what will be done with the data later; I just want to learn the fastest way to read a file from the drive and store it in a hash. I will post more detailed benchmarks of my findings as I learn more about the different ways to do this in Go and as I hear more recommendations. I am hoping someone has already done research on this topic.


1 answer

  • douhe8981 2014-11-08 06:13

    ioutil.ReadFile is probably a good place to start for reading a whole file into memory. That being said, this sounds like a poor use of memory resources. The question asserts that File::Slurp is fast, but that is not the general consensus for the particular task you're doing, namely line-by-line processing.
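    For reference, a minimal sketch of that whole-file approach - the file name here is a placeholder, not something from your post:

        package main

        import (
            "fmt"
            "io/ioutil"
            "log"
        )

        func main() {
            // Slurp the entire file into a single []byte in one call.
            data, err := ioutil.ReadFile("data.csv") // placeholder path
            if err != nil {
                log.Fatal(err)
            }
            fmt.Printf("read %d bytes\n", len(data))
        }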

    The claim is that Perl is somehow doing things "fast". We can look at the source code to Perl's File::Slurp. It's not doing any magic, as far as I can tell. As Slade mentions in comments, it's just using sysopen and sysread, both of which eventually bottom out to plain operating system calls. Frankly, once you touch disk I/O, you've lost: your only hope is to touch it as few times as possible.

    Given that your file is 500MB, and you have to read all the bytes of the file from disk anyway, and you have to make a line-oriented pass to process each line, I don't quite see why there's a requirement to do this in two passes. Why turn what is fundamentally a one-pass algorithm into a two-pass algorithm?

    Without you showing any other code, we can't really say whether what you've done is fast or slow. Without measurement, we can't say anything substantive. Did you try writing the straightforward code with bufio.Scanner first, and then measuring its performance?
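    As a starting point, here is a rough one-pass sketch with bufio.Scanner that builds the map as it reads. The file name is a placeholder, and it assumes the key is everything before the first comma, the value is the rest of the line, and keys are unique (later lines overwrite earlier ones):

        package main

        import (
            "bufio"
            "log"
            "os"
            "strings"
        )

        func main() {
            f, err := os.Open("data.csv") // placeholder path
            if err != nil {
                log.Fatal(err)
            }
            defer f.Close()

            // One pass over the file: split each line into key (1st column)
            // and value (remaining columns) as it is read.
            rows := make(map[string]string)
            scanner := bufio.NewScanner(f)
            for scanner.Scan() {
                line := scanner.Text()
                if i := strings.Index(line, ","); i >= 0 {
                    rows[line[:i]] = line[i+1:]
                }
            }
            if err := scanner.Err(); err != nil {
                log.Fatal(err)
            }
            log.Printf("loaded %d rows", len(rows))
        }

    Measure this against the read-everything-then-split version on your actual files; the scanner version touches the data once and avoids holding a second full copy of the file as a string.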

