I am using a PHP backend running on an Apache web server. I process the data with Python, and I have to display the data stored in MySQL with plotly.js. When the data gets large, beating the data for the heatmaps into shape becomes problematic and results in a SIGKILL on the server.
I suspect this comes from the loops (and sometimes queries) I run while beating the data into shape, because I have a three-dimensional matrix where the first dimension is, for example, the day and the other two dimensions are the x and y coordinates.
In the sample code below, I grab all the data, loop through the rows, and assign each value to its respective coordinates in the matrix.
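One way to test the loop suspicion is to replace the per-row cell assignment in `parse_data` (further down) with a single NumPy advanced-indexing assignment. This is a hedged sketch: it assumes the rows arrive as tuples in the column order of the sample further down (value, time, x, y, day/round) with a 1-based day in column `day_col`. `fill_matrix` and `day_col` are hypothetical names, and the real query (the original code reads the day from `dat[5]`) may need a different index.

```python
import numpy as np


def fill_matrix(rows, n_days, x_dim, y_dim, day_col=4):
    """Scatter (value, time, x, y, day/round) rows into a (day, x, y) float32 array.

    Hypothetical helper: column 0 is the value, columns 2 and 3 are x and y,
    and `day_col` holds the 1-based day/round number.
    """
    matrix = np.zeros((n_days, x_dim, y_dim), dtype=np.float32)
    if not rows:
        return matrix
    # Transpose the list of row tuples into one tuple per column.
    cols = list(zip(*rows))
    values = np.asarray(cols[0], dtype=np.float32)
    xs = np.asarray(cols[2], dtype=np.intp)
    ys = np.asarray(cols[3], dtype=np.intp)
    days = np.asarray(cols[day_col], dtype=np.intp) - 1  # 1-based -> 0-based
    # One advanced-indexing assignment replaces the Python-level loop
    # that wrote one cell per row.
    matrix[days, xs, ys] = values
    return matrix
```

This only moves the per-cell indexing out of Python; `zip(*rows)` still walks the rows once, so if memory rather than CPU is what gets the process killed, it will not help on its own.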
I could be wrong about my initial suspicion, and maybe it is the size of the data downloaded from MySQL that is causing my problems. At last check, one of the tables had about 200k rows, and I have about 14 observables to be processed into heatmaps this way.
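To check whether the download size is the problem, the rows can be pulled in fixed-size batches with the DB-API `cursor.fetchmany()` method and written straight into the preallocated matrix, so the full result set never has to sit in Python as one big list of tuples. This is a sketch, not a drop-in replacement: `fill_matrix_streaming`, `demo_with_sqlite` and the four-column `SELECT value, x, y, day ...` query are hypothetical, and in-memory SQLite only stands in for MySQL. Whether rows really stream from the server depends on the driver; for example, MySQLdb and PyMySQL buffer the whole result client-side unless a server-side cursor such as `SSCursor` is used.

```python
import sqlite3

import numpy as np


def fill_matrix_streaming(cursor, query, n_days, x_dim, y_dim, params=None, batch_size=10_000):
    """Fill a (day, x, y) float32 matrix from a query, one batch of rows at a time.

    Hypothetical helper: `query` must return (value, x, y, day) with a 1-based day.
    Works with any DB-API 2.0 cursor that supports fetchmany().
    """
    matrix = np.zeros((n_days, x_dim, y_dim), dtype=np.float32)
    if params is None:
        cursor.execute(query)
    else:
        cursor.execute(query, params)
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        values, xs, ys, days = zip(*batch)  # one tuple per column
        matrix[np.asarray(days, dtype=np.intp) - 1,
               np.asarray(xs, dtype=np.intp),
               np.asarray(ys, dtype=np.intp)] = np.asarray(values, dtype=np.float32)
    return matrix


def demo_with_sqlite():
    """Run the helper against an in-memory SQLite table (stand-in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE obs (value REAL, x INTEGER, y INTEGER, day INTEGER)")
    cur.executemany("INSERT INTO obs VALUES (?, ?, ?, ?)",
                    [(30.1, 8, 1, 1), (2.0, 0, 0, 2), (3.0, 1, 2, 2)])
    # batch_size=2 forces more than one fetchmany() round trip.
    matrix = fill_matrix_streaming(cur, "SELECT value, x, y, day FROM obs", 2, 10, 3, batch_size=2)
    conn.close()
    return matrix
```

Selecting only the columns the matrix needs (instead of also pulling `time` for every row) also shrinks what has to cross the wire.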
I have thought about using data frames, but plotly.js needs heatmap data in matrix form, so data frames will not work.
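A NumPy array already maps onto the matrix form plotly.js wants: `ndarray.tolist()` turns one day's slice into nested lists that `json.dumps` can serialize as the heatmap's `z` array. One detail worth checking: plotly.js reads `z[i][j]` as row `i` on the y axis and column `j` on the x axis, so a slice indexed `[x, y]` like `observable_matrix[day]` needs a transpose to put x on the horizontal axis. `heatmap_trace` and `heatmap_json` are hypothetical helpers, and sending one day per response (rather than all days at once) is only a suggestion for keeping the payload small.

```python
import json

import numpy as np


def heatmap_trace(matrix, day_index):
    """Build a plotly.js heatmap trace dict for one day of a (day, x, y) array.

    Hypothetical helper; plotly.js treats z[i][j] as (y index i, x index j),
    so the (x, y) slice is transposed to (y, x) first.
    """
    z = matrix[day_index].T
    # tolist() converts float32 values to plain Python floats for json.dumps.
    return {"type": "heatmap", "z": z.tolist()}


def heatmap_json(matrix, day_index):
    """Serialize one day's trace to a JSON string for the PHP/JS side."""
    return json.dumps(heatmap_trace(matrix, day_index))
```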
The data retrieved from MySQL looks like this:

```
value, time,                x, y, day/round
30.1,  2019-07-02 18:49:53, 8, 1, 1
```
```python
from datetime import datetime, timedelta

import numpy as np


# Obtain the last field so I can initialize the dimension of my matrix; return None if empty
def get_last_field(self):
    ...  # do x (query omitted)


# Grab the data from MySQL
def download_data(self, time=''):
    ...  # download data (query omitted)


# Parse the data into a matrix form
def parse_data(self):
    first_dim = self.get_last_field()
    if first_dim is not None:
        observable_matrix = np.zeros([first_dim, self.x_dim, self.y_dim], dtype=np.float32)
        timestamp = np.chararray(first_dim, itemsize=20)
        day_of_production = np.chararray(first_dim, itemsize=7)
        day_of_production[:] = 'default'
        timestamp[:] = 'default'
        data = self.download_data()
        if data is None or len(data) == 0:
            return None, None
        # One Python-level iteration per row: assign the cell, then re-derive the day's labels
        for dat in data:
            observable_matrix[dat[5] - 1, dat[2], dat[3]] = dat[0]
            timestamp[dat[5] - 1] = datetime.strptime(str(dat[1]), "%Y-%m-%d %H:%M:%S").strftime("%Y-%m-%d")
            day_of_production[dat[5] - 1] = str(self.get_day_of_production(dat[1]))
        # Pair each day's timestamp with its data (a list of tuples, not a dict)
        data = self.dict_with_timestamp(observable_matrix, timestamp, first_dim)
        return data, day_of_production
    else:
        return None, None


def get_first_day_of_production(self):
    query = "SELECT `from` FROM round WHERE id=%(round_id)s"
    self.cursor.execute(query, self.params)
    day = self.cursor.fetchone()
    return day[0]


def get_day_of_production(self, current_time):
    diff = current_time.date() - self.first_day_of_production
    return int(diff.total_seconds() / (60 * 60 * 24))


def next_day(self, current_day):
    return current_day + timedelta(days=1)


# List of (timestamp/round, 2-D data slice) tuples, one per day
def dict_with_timestamp(self, data, timestamp, dim):
    updated_data = []
    for i in range(dim):
        updated_data.append((timestamp[i], data[i]))
    return updated_data
```
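Inside the loop above, every row re-parses its timestamp and recomputes the day of production even though each day slot keeps only one label. A lighter variant records the last timestamp per day (matching the original's overwrite-in-a-loop behavior) and formats each day once. It assumes `time` comes back as a `datetime`, which `get_day_of_production` already relies on when it calls `current_time.date()`; the helper names `day_labels` and `pair_with_timestamps` and the `time_col`/`day_col` defaults are hypothetical. It also returns plain Python `str` lists, since `np.chararray` defaults to byte strings, which `json.dumps` will not serialize.

```python
from datetime import date, datetime


def day_labels(rows, n_days: int, first_day: date, time_col=1, day_col=4):
    """Return (timestamps, days_of_production), one string per day/round slot.

    Hypothetical helper; rows carry a datetime in `time_col` and a 1-based
    day/round number in `day_col`.
    """
    last_time: dict[int, datetime] = {}
    for row in rows:
        # Later rows overwrite earlier ones, as in the original loop.
        last_time[row[day_col] - 1] = row[time_col]
    timestamps = ["default"] * n_days
    production_days = ["default"] * n_days
    for slot, t in last_time.items():
        timestamps[slot] = t.strftime("%Y-%m-%d")
        # For a difference of two dates, .days equals total_seconds() / 86400.
        production_days[slot] = str((t.date() - first_day).days)
    return timestamps, production_days


def pair_with_timestamps(matrix, timestamps):
    """Same structure as dict_with_timestamp: a list of (timestamp, 2-D slice) tuples."""
    return list(zip(timestamps, matrix))
```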
Beating this data into heatmaps causes a SIGKILL on the server, and the response data sent to the frontend is often truncated.