dougu1985 2013-03-22 05:03

Generating a population heatmap: MapReduce?

I have a MySQL table containing

  • (100 million) Lat/Lng coordinates of locations in America
  • Number of people living within a square mile radius of that location

Question: After the heatmap is generated and overlaid on Google Maps or OpenStreetMap, the number of people living within a square mile radius has to be determined at whatever point on the map the mouse cursor is positioned over. (Simple averaging using the neighboring data points can be used.)

How do you generate such a heatmap? Is it recommended to use MapReduce?

[Screenshot: heatmap overlaid on a map, with a hover popup showing the population within a square mile of the cursor position]

Initial Thoughts

Heatmap has to be pre-rendered serverside

Downloading all the necessary points to the browser and then generating the heatmap clientside can be a problem: a large number of coordinates have to be retrieved from the database (heavy database load) AND transferred to the browser (large dataset), and furthermore the browser has to process that large number of points to generate the heatmap. This will be far too slow, so I suppose we have to pre-render the heatmap serverside and retrieve the heatmap tiles to overlay on the map.

Better Alternative: Process serverside, render clientside

Instead of fully rendering the heatmap serverside and serving the image tiles, we can simplify the data by clustering points that are close together into a single point with a weight/bias, then send this smaller dataset of simplified point data (via JSON) to the browser for clientside rendering of the heatmap (using heatmap.js). Sending lat/lng points instead of image tiles will make the application/website more responsive.
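
As a rough sketch of the kind of serverside simplification I have in mind -- binning points into a coarse grid and summing the population per cell -- something like the following Python could run as the periodic batch job. The 0.01-degree cell size, the input tuple format, and the heatmap.js-style output fields are all assumptions for illustration only:

    from collections import defaultdict

    def cluster_points(points, cell_size=0.01):
        # Collapse nearby points into one weighted point per grid cell.
        # `points` is assumed to be an iterable of (lat, lng, population)
        # tuples; 0.01 degrees is an arbitrary cell size for illustration.
        cells = defaultdict(int)
        for lat, lng, population in points:
            # Snap each coordinate to the corner of its grid cell.
            key = (round(lat - lat % cell_size, 2),
                   round(lng - lng % cell_size, 2))
            cells[key] += population
        # The reduced dataset can be serialized as JSON and handed to a
        # clientside renderer such as heatmap.js.
        return [{"lat": lat, "lng": lng, "weight": weight}
                for (lat, lng), weight in cells.items()]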

This will also allow us to read the heatmap intensity values directly from JavaScript and implement the hover popup box (see image above) in JavaScript/jQuery. I am not sure how to do this if we had instead sent the heatmap tiles to the browser.

Map/reduce?

We probably need to split up the job (processing 100 million data points) into smaller chunks and generate the heatmap across several nodes. This will be done once a month. Having several nodes generate the heatmap makes me think of MapReduce and Hadoop, although I have not used them before.

Existing solutions

gheat generates the heatmap on demand, so it will be too slow for our purpose. However, we still need a tile server for the heatmap tiles that we pre-render; maybe we can use an OSM tile server.


1 answer

  • douqilai4263 2013-03-22 14:57

    To answer this we must first consider the sorts of problems map/reduce is well suited for. The best problems for map/reduce are those that can be broken down into small sub-problems that can be solved separately. A good analogy to think about this class of problem might be to consider the SQL GROUP BY construct, which effectively breaks a resultset into multiple chunks and computes an aggregate function on each: if you can imagine solving a problem via a GROUP BY (dataset size notwithstanding) then it is probably a good fit for map/reduce.

    Your specific problem requires partitioning the data into geospatial regions and then computing some sort of aggregate for each of these regions. You'll then render these regions as two-dimensional tile images that can be overlaid on a Google Map.

    A natural way to approach this, then, would be to start with a map function that accepts a stream of rows from your data source, each consisting of a geospatial point (lat/long) and a count. The contract for a map function is to emit tuples of the form (key, value), so in this case your mapper would need to "simplify" the point to create a key -- that is, to reduce its accuracy so that several neighboring points will share the same key -- and emit that key along with the population count at the current point. Here is some pseudocode for this:

    function map(row):
        key = simplify_point(row.point) # implementation of this function TBD by you
        emit(key, row.population_count)
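
    One concrete way to express this mapper -- for example with Hadoop streaming, which runs plain stdin/stdout scripts -- is sketched below. The tab-separated input format and the two-decimal rounding in simplify_point are assumptions of mine, not requirements; the rounding just happens to match the keys in the table below.

    #!/usr/bin/env python
    # Hypothetical streaming mapper: reads "lat<TAB>lng<TAB>population"
    # rows from stdin and emits "simplified_key<TAB>population" pairs.
    import sys

    def simplify_point(lat, lng):
        # One possible simplification: round to two decimal places,
        # i.e. roughly a 1 km grid; pick whatever resolution you need.
        return "%.2f,%.2f" % (lat, lng)

    for line in sys.stdin:
        lat, lng, population = line.strip().split("\t")
        # Emit one key/value pair per input row for the shuffle phase.
        print("%s\t%s" % (simplify_point(float(lat), float(lng)), population))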
    

    This will produce an intermediate dataset with items like the following:

    | key           | value |
    | 37.78,-122.43 | 2303  |
    | 37.78,-122.43 | 2009  |
    | 37.78,-122.43 | 3001  |
    | 37.78,-122.43 | 1238  |
    | 37.79,-122.43 | 1343  |
    | 37.79,-122.43 | 3005  |
    | 37.79,-122.43 | 2145  |
    | 37.79,-122.43 | 1536  |
    

    Notice that each distinct key now has multiple values associated with it. The task of the reduce function is to take a set of values with the same key and produce a single value that represents that whole group of the data. Without knowing the details of your problem at hand, I'm going to assume that it's sufficient to determine the total population in each group, which we can achieve by simply adding together all of the values. A reduce function receives a key and a list of all of the values that had that key in the mapper's output, so our reducer could look as simple as this (again, in pseudocode):

    function reduce(key, population_counts):
        sum = 0
        for value in population_counts:
            sum = sum + value
        emit(key, sum)
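
    A matching stdin/stdout version of the reducer is sketched below; it relies on the streaming framework delivering the mapper output sorted by key, so rows with equal keys arrive adjacent to one another. Again, this is just an illustration using the tab-separated format assumed above.

    #!/usr/bin/env python
    # Hypothetical streaming reducer: input is sorted by key, so the
    # population counts for each key can be totalled in a single pass.
    import sys

    current_key, total = None, 0
    for line in sys.stdin:
        key, count = line.strip().split("\t")
        if key != current_key:
            if current_key is not None:
                print("%s\t%d" % (current_key, total))
            current_key, total = key, 0
        total += int(count)
    if current_key is not None:
        print("%s\t%d" % (current_key, total))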
    

    For the example resultset above, this would result in the following final result:

    | key           | value |
    | 37.78,-122.43 | 8551  |
    | 37.79,-122.43 | 8029  |
    

    You could then take this smaller set of points and values and render them as areas of different colors on a map, thus creating a visual heatmap.

    Although I've used simple integer counts here for simplicity, in practice any type can be used as the value, so you can use instances of a particular class, or arrays, or any other value you can produce given a single row of data at a time. In your screenshot you show a hovertip that gives the number of records that were merged to produce the given datapoint, which you could do by having your reducer not only sum but also simultaneously count the rows, and return both together in some sort of object or data structure.
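
    In pseudocode, keeping the emit() convention from above, that variant of the reducer could look like the following; the dictionary-style output is just one possible shape for that object or data structure:

    function reduce(key, population_counts):
        # Track both the population total and the number of raw rows
        # merged into this point, so the hovertip can display both.
        sum = 0
        count = 0
        for value in population_counts:
            sum = sum + value
            count = count + 1
        emit(key, {"population": sum, "merged_points": count})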

    The above outlines the logical workflow for a map/reduce operation and describes one way to use map/reduce to create a heatmap. I'm sure I didn't solve your problem exactly, but if you can frame your problem within the workflow I described above then it could be a good fit for a map/reduce solution. I've also focused on the theory of map/reduce rather than the specific implementation in Hadoop, but hopefully you can easily map the concepts I've described onto the constructs provided by Hadoop.

    This answer was accepted by the asker.
