duanshai4484 2010-05-07 05:04

NearMap architecture

Looking at http://www.nearmap.com/,

Just wondering if you can approximate how much storage is needed to store the images? (NearMap’s monthly city PhotoMaps are captured at 3cm, 5cm, 7.5cm, or 10cm resolution)

And what kind of system/architecture is suitable for delivering that data and those images? (Say you are not Google and want to implement this from scratch: what would you do?)

I.e., would you store the images in Hadoop and use Apache/PHP/memcached to deliver them?

1 answer

  • dsjgk330337 2010-05-14 03:48

    It's pretty hard to estimate how much space is required without being able to determine the compression ratio. Simply put, if aerial photographs of houses compress well, then it can significantly change how much data needs to be stored.

    But, in the interests of math we can try to figure out what is required.

    So, if each pixel measures 3cm by 3cm, it covers 9cm^2. A quick Wikipedia search tells us that London is about 1700km^2, which, at 10 billion cm^2 per km^2, is 17,000,000,000,000 cm^2. This means that we need 1,888,888,888,888 pixels to cover London at a resolution of 3cm. Putting this into bytes, at 4 bytes per pixel, gives about 7000 GiB. If you get 50% compression, that drops to 3500 GiB for London. Repeat this for every city you want to cover to get an idea of how much data storage you will need.
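
The arithmetic above can be rerun for the other resolutions mentioned in the question with a short Python sketch. The function name `storage_gib` and its parameters are made up for illustration; the 1700 km^2 area, 4 bytes per pixel, and 50% compression figures are the rough assumptions used in this answer, not measured values.

```python
def storage_gib(area_km2, cm_per_pixel, bytes_per_pixel=4, compression=0.5):
    """Return (raw GiB, compressed GiB) for imagery covering area_km2.

    compression is the fraction of the raw size left after compressing,
    so 0.5 means "50% compression" as assumed in the answer.
    """
    cm2_per_km2 = 100_000 ** 2                  # 1 km = 100,000 cm
    pixels = area_km2 * cm2_per_km2 / cm_per_pixel ** 2
    raw_bytes = pixels * bytes_per_pixel
    gib = 2 ** 30
    return raw_bytes / gib, raw_bytes * compression / gib


if __name__ == "__main__":
    # Resolutions listed in the question; the area is the rough London figure.
    for res_cm in (3, 5, 7.5, 10):
        raw, packed = storage_gib(1700, res_cm)
        print(f"{res_cm} cm: {raw:,.0f} GiB raw, {packed:,.0f} GiB compressed")
```

Because the pixel count scales with the inverse square of the resolution, going from 3cm to 10cm cuts the estimate by roughly a factor of eleven.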

    Delivering the content is simple compared to gathering it. Since this is an embarrassingly parallel problem, a shared-nothing cluster with an appropriate front-end to route traffic to the right nodes would probably be the easiest way to implement it. This works because the nodes don't have to maintain state or communicate with each other. The ideal method depends on how much data you are pushing through; if you push enough, it might be worthwhile to implement your own web server that just responds to HTTP GETs.
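
As a rough illustration of the front-end routing idea, here is a minimal Python sketch that maps each image request to one node by hashing its key, so no node needs to know about any other. Everything specific here is an assumption for the example: the node hostnames are hypothetical, and addressing images as `city/zoom/x/y` tiles is one common scheme, not something the question or NearMap specifies.

```python
import hashlib

# Hypothetical storage nodes; each one holds a disjoint subset of the tiles.
NODES = [
    "tiles-0.example.internal",
    "tiles-1.example.internal",
    "tiles-2.example.internal",
    "tiles-3.example.internal",
]


def node_for_tile(city, zoom, x, y, nodes=NODES):
    """Pick the node responsible for a tile, using only the tile's key.

    The same key always maps to the same node, so the front-end keeps no
    per-request state and the nodes never have to talk to each other.
    """
    key = f"{city}/{zoom}/{x}/{y}".encode("utf-8")
    digest = hashlib.sha1(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


if __name__ == "__main__":
    print(node_for_tile("london", 18, 131000, 87000))
```

Plain modulo hashing like this remaps most keys whenever the number of nodes changes; a consistent-hashing scheme would reduce that churn at the cost of a little more code.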

    I'm not sure a distributed FS would be the best way to distribute the images, since a node would often have to spend a significant amount of time pulling data from somewhere else in the cluster.

    This answer was selected by the asker as the best answer.
