dongpang4470 2017-04-06 16:53

Parsing records from a binary file concurrently in Go

I have a binary file that I want to parse. The file is broken up into records that are 1024 bytes each. The high level steps needed are:

  1. Read 1024 bytes at a time from the file.
  2. Parse each 1024-byte "record" (chunk) and place the parsed data into a map or struct.
  3. Return the parsed data to the user and any error(s).

I'm not looking for code, just design/approach help.

Due to I/O constraints, I don't think it makes sense to attempt concurrent reads from the file. However, I see no reason why the 1024-byte records can't be parsed using goroutines so that multiple 1024-byte records are being parsed concurrently. I'm new to Go, so I wanted to see if this makes sense or if there is a better (faster) way:

  1. A main function opens the file and reads 1024 bytes at a time into byte arrays (records).
  2. The records are passed to a function that parses the data into a map or struct. The parser function would be called as a goroutine on each record.
  3. The parsed maps/structs are appended to a slice via a channel. I would preallocate the slice's backing array with capacity equal to the file size in bytes divided by 1024, since that should be the exact number of records (assuming no errors).

I'd also have to make sure I don't run out of memory, as the file can be anywhere from a few hundred MB up to 256 TB (rare, but possible). Does this make sense, or am I thinking about the problem all wrong? Will this be slower than simply parsing the file linearly as I read it 1024 bytes at a time, or will parsing the records concurrently as byte arrays perform better?
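For concreteness, here's a minimal sketch of the reading step, assuming records are exactly 1024 bytes; io.ReadFull and the channel handoff are just one possible shape, not settled code. Giving the output channel a small buffer capacity would bound how many unparsed records sit in memory at once, which speaks to the memory concern above.

```go
package parser

import (
	"io"
	"os"
)

const recordSize = 1024

// readRecords reads fixed-size records from f and sends each one to out.
// A fresh buffer is allocated per record because the parser goroutines
// will still be using it after this loop has moved on.
func readRecords(f *os.File, out chan<- []byte) error {
	defer close(out)
	for {
		buf := make([]byte, recordSize)
		if _, err := io.ReadFull(f, buf); err != nil {
			if err == io.EOF {
				return nil // clean end of file
			}
			return err // io.ErrUnexpectedEOF means a truncated final record
		}
		out <- buf
	}
}
```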


Cross-posted on Software Engineering


1 answer

  • douquan9826 2017-04-06 17:43

    This is an instance of the producer-consumer problem: the producer is your main function, which generates the 1024-byte records, and the consumers parse those records and send the results to a channel so they can be appended to the final slice. There are a few existing questions tagged producer-consumer and Go that should get you started. As for what is fastest in your case, it depends on too many factors to answer in the abstract; the best solution may be anywhere from a completely sequential implementation to a cluster of servers in which the records are moved around by RabbitMQ or something similar.
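    As a minimal sketch of that layout in Go (Record, parseRecord, and the worker count are placeholders, not anything prescribed by your file format):

    ```go
    package parser

    import "sync"

    // Record is a placeholder for whatever the 1024-byte parser produces.
    type Record struct{}

    // parseRecord stands in for the real parsing logic.
    func parseRecord(b []byte) (Record, error) { return Record{}, nil }

    // parseAll fans raw records out to nWorkers goroutines and collects
    // the parsed results on a single output channel. Results arrive in
    // arbitrary order; carry an index with each record if order matters.
    func parseAll(raw <-chan []byte, nWorkers int) <-chan Record {
    	out := make(chan Record)
    	var wg sync.WaitGroup
    	wg.Add(nWorkers)
    	for i := 0; i < nWorkers; i++ {
    		go func() {
    			defer wg.Done()
    			for b := range raw {
    				rec, err := parseRecord(b)
    				if err != nil {
    					continue // a real implementation would report this
    				}
    				out <- rec
    			}
    		}()
    	}
    	go func() {
    		wg.Wait()
    		close(out) // close once every worker has drained raw
    	}()
    	return out
    }
    ```

    With this shape, a fully sequential version is just nWorkers = 1, which makes it easy to benchmark whether the concurrent parsing actually pays off for your workload.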

    Accepted by the asker as the best answer.
