douxie5930 2014-01-29 22:57
浏览 56

如果我使用golang编写NoSQL数据库但要在其上运行Hadoop mapreduce,我应该遇到什么困难?

I would like to build a distributed NoSQL database or key-value store using golang, to learn golang and practice distribute system knowledge I've learnt from school. The target use case I can think of is running MapReduce on top of it, and implement a HDFS-compatible "filesystem" to expose the data to Hadoop, similar to running Hadoop on Ceph and Amazon S3.

My question is, what difficulties should I expect to integrate such an NoSQl database with Hadoop? Or integrate with other languages (e.g., providing Ruby/Python/Node.js/C++ APIs?) if I use golang to build the system.

  • 写回答

1条回答 默认 最新

  • dpt1712 2014-01-30 22:38
    关注

    Ok, I'm not much of a Hadoop user so I'll give you some more general lessons learned about the issues you'll face:

    1. Protocol. If you're going with REST Go will be fine, but expect to find some gotchas in the default HTTP library's defaults (not expiring idle keepalive connections, not necessarily knowing when a reader has closed a stream). But if you want something more compact, know that: a. the Thrift implementation for Go, last I checked, was lacking and relatively slow. b. Go has great support for RPC but it might not play well with other languages. So you might want to check out protobuf, or work on top the redis protocol or something like that.

    2. GC. Go's GC is very simplistic (STW, not generational, etc). If you plan on heavy memory caching in the orders of multiple Gs, expect GC pauses all over the place. There are techniques to reduce GC pressure but the straight forward Go idioms aren't usually optimized for that.

    3. mmap'ing in Go is not straightforward, so it will be a bit of a struggle if you want to leverage that.

    4. Besides slices, lists and maps, you won't have a lot of built in data structures to work with, like a Set type. There are tons of good implementations of them out there, but you'll have to do some digging up.

    5. Take the time to learn concurrency patterns and interface patterns in Go. It's a bit different than other languages, and as a rule of thumb, if you find yourself struggling with a pattern from other languages, you're probably doing it wrong. A good talk about Go concurrency is this one IMHO http://www.youtube.com/watch?v=QDDwwePbDtw

    A few projects you might want to have a look at:

    1. Groupcache - a distributed key/value cache written in Go by Brad Fitzpatrick for Google's own use. It's a great implementation of a simple yet super robust distributed system in Go. https://github.com/golang/groupcache and check out Brad's presentation about it: http://talks.golang.org/2013/oscon-dl.slide

    2. InfluxDB which includes a Go based version of the great Raft algorithm: https://github.com/influxdb/influxdb

    3. My own humble (pretty dead) project, a redis compliant database that's based on a plugin architecture. My Go has improved since, but it has some nice parts, and it includes a pretty fast server for the redis protocol. https://bitbucket.org/dvirsky/boilerdb

    评论

报告相同问题?

悬赏问题

  • ¥60 版本过低apk如何修改可以兼容新的安卓系统
  • ¥25 由IPR导致的DRIVER_POWER_STATE_FAILURE蓝屏
  • ¥50 有数据,怎么建立模型求影响全要素生产率的因素
  • ¥50 有数据,怎么用matlab求全要素生产率
  • ¥15 TI的insta-spin例程
  • ¥15 完成下列问题完成下列问题
  • ¥15 C#算法问题, 不知道怎么处理这个数据的转换
  • ¥15 YoloV5 第三方库的版本对照问题
  • ¥15 请完成下列相关问题!
  • ¥15 drone 推送镜像时候 purge: true 推送完毕后没有删除对应的镜像,手动拷贝到服务器执行结果正确在样才能让指令自动执行成功删除对应镜像,如何解决?