I have a MySQL table containing
- (100 million) Lat/Lng coordinates of locations in America
- Number of people living within a square mile radius of that location
Question: After generating the heatmap and overlaying it on Google Maps or OpenStreetMap, the number of people living within a square mile radius has to be determined for any point on the map, wherever the mouse cursor is positioned. (Simple averaging of the neighboring data points can be used.)
How do you generate such a heatmap? Is it recommended to use MapReduce?
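To make the "simple averaging" concrete, here is a minimal sketch of what I have in mind. The function names and the `(lat, lng, population)` tuple layout are assumptions for illustration only, and the linear scan would have to be replaced by a spatial index or a pre-aggregated grid for 100 million rows.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(lat1, lng1, lat2, lng2):
    """Great-circle distance between two lat/lng points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def average_population(cursor_lat, cursor_lng, points, radius_miles=1.0):
    """Average the population values of all points within radius_miles
    of the cursor position. `points` is an iterable of
    (lat, lng, population) tuples (hypothetical layout).
    Returns None when no data point is close enough."""
    nearby = [
        pop for lat, lng, pop in points
        if haversine_miles(cursor_lat, cursor_lng, lat, lng) <= radius_miles
    ]
    if not nearby:
        return None
    return sum(nearby) / len(nearby)
```

For example, with two points a few hundred feet apart holding 100 and 200, hovering over either one would report 150.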
Initial Thoughts
Heatmap has to be pre-rendered serverside
Downloading all the necessary points to the browser and then generating the heatmap clientside can be a problem: a large number of coordinates has to be retrieved from the database (heavy database load) AND transferred to the browser (large dataset), and the browser then has to process all of those points to generate the heatmap. This will be far too slow, so I suppose we have to pre-render the heatmap serverside and retrieve the heatmap tiles to overlay on the map.
Better Alternative: Process serverside, render clientside
Instead of fully rendering the heatmap serverside and serving the image tiles, we can simplify the data by clustering points that are close together into a single point with a weight/bias, then send this smaller dataset of simplified point data (via JSON) to the browser for clientside rendering of the heatmap (using heatmap.js). Sending lat/lng points instead of image tiles should make the application/website more responsive.
This will also allow us to read the heatmap intensity values directly from JavaScript and implement the hover popup box (see image above) in JavaScript/jQuery. I am not sure how to do this if we instead sent the heatmap tiles to the browser.
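A rough sketch of the clustering step, assuming a plain fixed-size grid (other clustering methods would work too). The cell size of 0.01 degrees, the function name, and the output field names are my own placeholders, not anything heatmap.js requires.

```python
import json
import math
from collections import defaultdict

def cluster_to_grid(points, cell_deg=0.01):
    """Bin (lat, lng, population) tuples into square cells of
    cell_deg degrees and return one simplified point per cell.
    Each output point sits at the cell centre and carries the
    mean population of the points that fell into that cell."""
    cells = defaultdict(lambda: [0.0, 0])  # cell key -> [population sum, point count]
    for lat, lng, pop in points:
        key = (math.floor(lat / cell_deg), math.floor(lng / cell_deg))
        cells[key][0] += pop
        cells[key][1] += 1

    simplified = []
    for (row, col), (total, count) in cells.items():
        simplified.append({
            "lat": (row + 0.5) * cell_deg,   # cell centre
            "lng": (col + 0.5) * cell_deg,
            "value": total / count,          # weight for the heatmap
            "count": count,
        })
    return simplified

# The simplified list can be serialised and sent to the browser.
sample = [(40.001, -75.001, 100), (40.004, -75.004, 300), (41.005, -75.001, 50)]
payload = json.dumps(cluster_to_grid(sample))
```

A coarser `cell_deg` could be used for low zoom levels and a finer one for high zoom levels, so that each viewport only ever receives a bounded number of points.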
Map/reduce?
We probably need to split the job (processing 100 million data points) into smaller chunks and generate the heatmap across several nodes. This will be done once a month. Having several nodes generate the heatmap makes me think of MapReduce and Hadoop, although I have not used either before.
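To illustrate how I imagine the job splitting, here is a single-process sketch of the map and reduce steps. It is not Hadoop code; the function names and the grid-cell key are assumptions, and on a real cluster each chunk would be handled by a different node.

```python
import math
from functools import reduce

CELL_DEG = 0.01  # assumed grid cell size in degrees

def map_chunk(chunk):
    """Map step: for one chunk of (lat, lng, population) rows, build
    partial per-cell aggregates as {cell: (population sum, count)}."""
    partial = {}
    for lat, lng, pop in chunk:
        key = (math.floor(lat / CELL_DEG), math.floor(lng / CELL_DEG))
        total, count = partial.get(key, (0.0, 0))
        partial[key] = (total + pop, count + 1)
    return partial

def merge_partials(a, b):
    """Reduce step: combine two partial results by adding sums and counts.
    Sums and counts (rather than averages) are kept so that the merge
    gives the same answer regardless of how the data was chunked."""
    merged = dict(a)
    for key, (total, count) in b.items():
        prev_total, prev_count = merged.get(key, (0.0, 0))
        merged[key] = (prev_total + total, prev_count + count)
    return merged

def run_job(chunks):
    """Run map over every chunk, reduce the partials, then turn each
    cell's (sum, count) into a mean population."""
    partials = map(map_chunk, chunks)
    totals = reduce(merge_partials, partials, {})
    return {key: total / count for key, (total, count) in totals.items()}
```

Because the partial results are just sums and counts, the same two functions could in principle serve as the mapper and reducer logic of a distributed framework, though I have not tried that.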
Existing solutions
gheat generates the heatmap on demand, so it will be too slow for our purpose. However, we still need a tile server for the heatmap tiles that we pre-render; maybe we can use an OSM tile server.
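If we go the pre-rendered route, each tile has to be addressable by zoom level and x/y index. Below is a small sketch of the usual Web Mercator "slippy map" tile numbering used by OSM-style tile servers. The `tiles` base directory and the `.png` extension are placeholders of mine.

```python
import math

def latlng_to_tile(lat, lng, zoom):
    """Return the (x, y) index of the slippy-map tile containing the
    given lat/lng at the given zoom level (Web Mercator projection)."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom
    x = int((lng + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that lng = 180 or extreme latitudes stay inside the grid.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return x, y

def tile_path(lat, lng, zoom, base_dir="tiles"):
    """Build a zoom/x/y file path for a pre-rendered tile.
    base_dir is a hypothetical storage location."""
    x, y = latlng_to_tile(lat, lng, zoom)
    return f"{base_dir}/{zoom}/{x}/{y}.png"
```

With tiles stored under such a zoom/x/y layout, a plain static file server might be enough to serve them, since nothing has to be rendered at request time.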