duannuan0074 2014-11-07 23:05

What is the fastest way to read a text file from a hard drive into memory using Go?

I just started using Go after years of using Perl, and from my initial tests it seems that reading a text file from a hard drive into a hash is not as fast as in Perl.

In Perl I use the "File::Slurp" module, and it reads a file into memory (into a string variable, array, or hash) really fast - at the limit of the hard drive's read throughput.

I am not sure what the best way is in Go to read, for example, a 500MB CSV file with 10 columns into memory (into a hash), where the key of the hash is the 1st column and the value is the remaining 9 columns.

What is the fastest way to achieve this? The goal is to read the file and store it in some Go variable in memory as fast as the hard drive can deliver the data.

This is one line from the input file - there are around 20 million similar lines:

1341,2014-11-01 00:01:23.588,12000,AV7WN259SEH1,1133922,SingleOven/HCP/-PRODUCTION/-23C_30S,0xd8d2a106d44bea07,8665456.006,5456-02,3010-30 N- PHOTO,AV7WN259SEH1

The platform is Windows 7 with an Intel i7 processor and 16GB of RAM. I can install Go on Linux as well if there are benefits to doing so.

Edit:

So one use case is: load the whole file into memory, into a single variable, as fast as possible. Later I can scan that variable, split it, etc. (all in memory).

Another approach is to store each line as a key-value pair during load time (e.g. after every X bytes are read or after a \n character arrives).

To me, these two approaches could yield different performance results. But since I am very new to Go, it will probably take me days of trying different techniques to arrive at the best-performing algorithm.

I would like to learn all the possible ways to do the above in Go, and also the recommended ways. At this point I am not concerned about memory usage, since this process will be repeated 10,000 times, one file after another (each file is erased from memory as soon as its processing is done). Files range from 50MB to 500MB. Since there are several thousand files, any performance gain (even a 1-second gain per file) is a significant overall gain.

I do not want to add complexity to the question by going into what will be done with the data later; I just want to learn the fastest way to read a file from the drive and store it in a hash. I will post more detailed benchmarks of my findings as I learn more about the different ways to do this in Go and as I hear more recommendations. I am hoping someone has already done research on this topic.


1 answer

  • douhe8981 2014-11-08 06:13

    ioutil.ReadFile is probably a good place to start for reading a whole file into memory. That being said, this sounds like a poor use of memory resources. The question asserts that File::Slurp is fast, but that is not the general consensus for the particular task you're doing, namely line-by-line processing.
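    For reference, a minimal sketch of that whole-file approach - the file name here is a placeholder, not something from your post:

        package main

        import (
            "fmt"
            "io/ioutil"
            "log"
        )

        func main() {
            // Slurp the entire file into a single []byte in one call.
            data, err := ioutil.ReadFile("data.csv") // placeholder path
            if err != nil {
                log.Fatal(err)
            }
            fmt.Printf("read %d bytes\n", len(data))
        }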

    The claim is that Perl is somehow doing things "fast". We can look at the source code to Perl's File::Slurp. It's not doing any magic, as far as I can tell. As Slade mentions in comments, it's just using sysopen and sysread, both of which eventually bottom out to plain operating system calls. Frankly, once you touch disk I/O, you've lost: your only hope is to touch it as few times as possible.

    Given that your file is 500MB, and you have to read all the bytes of the file from disk anyway, and you have to make a line-oriented pass to process each line, I don't quite see why there's a requirement to do this in two passes. Why turn what is fundamentally a one-pass algorithm into a two-pass algorithm?

    Without you showing any other code, we can't really say whether what you've done is fast or slow. Without measurement, we can't say anything substantive. Did you try writing the straightforward code with bufio.Scanner first, and then measuring its performance?
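    As a starting point, here is a rough one-pass sketch with bufio.Scanner that builds the map as it reads. The file name is a placeholder, and it assumes the key is everything before the first comma, the value is the rest of the line, and keys are unique (later lines overwrite earlier ones):

        package main

        import (
            "bufio"
            "log"
            "os"
            "strings"
        )

        func main() {
            f, err := os.Open("data.csv") // placeholder path
            if err != nil {
                log.Fatal(err)
            }
            defer f.Close()

            // One pass over the file: split each line into key (1st column)
            // and value (remaining columns) as it is read.
            rows := make(map[string]string)
            scanner := bufio.NewScanner(f)
            for scanner.Scan() {
                line := scanner.Text()
                if i := strings.Index(line, ","); i >= 0 {
                    rows[line[:i]] = line[i+1:]
                }
            }
            if err := scanner.Err(); err != nil {
                log.Fatal(err)
            }
            log.Printf("loaded %d rows", len(rows))
        }

    Measure this against the read-everything-then-split version on your actual files; the scanner version touches the data once and avoids holding a second full copy of the file as a string.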

