doupafu6980 2019-07-31 08:08

Optimizing loops (over many datasets) for data display

I am using a PHP backend running on an Apache web server. I process the data with Python and have to display the data stored in MySQL with plotly.js. When the data gets large, beating it into shape for the heatmaps becomes problematic and the process gets a SIGKILL on the server.

I suspect this comes from running a bunch of loops (and sometimes queries) while beating the data into shape. The data goes into a 3-dimensional matrix where the first dimension is, for example, the day, and the other two dimensions are the x and y coordinates.

In the sample code below, I grab all the data, loop over the rows, and assign each value to its coordinates in the matrix.

I could be wrong about my initial suspicion; maybe it is the size of the data downloaded from MySQL that is causing my problems. At last check, one of the tables had about 200k rows, and I have about 14 observables to process into heatmaps this way.
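
Before blaming either the loops or the download, a back-of-envelope estimate of the matrix size may help narrow things down. The sketch below is only that: `dense_matrix_bytes` is a hypothetical helper, and the dimensions are made-up illustration values, since the real `x_dim`, `y_dim`, and number of days are not given here. Only the count of 14 observables comes from the text above.

```python
import numpy as np


def dense_matrix_bytes(first_dim, x_dim, y_dim, dtype=np.float32):
    """Bytes needed by one dense (first_dim, x_dim, y_dim) array."""
    return first_dim * x_dim * y_dim * np.dtype(dtype).itemsize


# Hypothetical dimensions, for illustration only.
first_dim, x_dim, y_dim = 30, 100, 100
n_observables = 14  # the number of observables mentioned in the question

per_matrix = dense_matrix_bytes(first_dim, x_dim, y_dim)
total = per_matrix * n_observables
print(f"one matrix: {per_matrix / 1e6:.1f} MB, "
      f"all {n_observables} observables: {total / 1e6:.1f} MB")
# one matrix: 1.2 MB, all 14 observables: 16.8 MB
```

If the result is small next to the server's memory limit, the dense matrices are probably not what triggers the kill. The fetched rows would then be a likelier suspect, because 200k Python tuples take far more memory than 4 bytes per value.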

I have thought about using data frames, but plotly.js needs the data for heatmaps in matrix form, and data frames will not work.
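
For what it is worth, a data frame can usually be turned back into a plain matrix, so the two are not mutually exclusive. The sketch below assumes pandas is available and uses made-up rows in the same layout as the MySQL result; the column names and the `matrices` dictionary are illustrative, not part of the original code. It also assumes the heatmap expects each inner list to be one row along y, which is how plotly.js reads `z`.

```python
import pandas as pd

# Made-up rows in the layout value, time, x, y, day (illustration only).
rows = [
    (30.1, "2019-07-02 18:49:53", 0, 0, 1),
    (31.5, "2019-07-02 18:49:53", 1, 0, 1),
    (29.8, "2019-07-02 18:49:53", 0, 1, 1),
    (28.4, "2019-07-02 18:49:53", 1, 1, 1),
]
df = pd.DataFrame(rows, columns=["value", "time", "x", "y", "day"])

# One 2D matrix per day: rows indexed by y, columns by x,
# converted to nested lists so it can be serialized as JSON.
matrices = {
    int(day): group.pivot(index="y", columns="x", values="value")
    .to_numpy()
    .tolist()
    for day, group in df.groupby("day")
}
print(matrices[1])
```

Two caveats: `pivot` raises an error when the same (x, y) pair appears twice within a day, and cells with no row come out as NaN rather than 0.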

The data retrieved from MySQL looks like this:

    value, time, x, y, day/round
    30.1, 2019-07-02 18:49:53, 8, 1, 1

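    import numpy as np
    from datetime import datetime, timedelta

    # Obtain the last field so I can initialize the first dimension of my
    # matrix; returns None if the table is empty
    def get_last_field(self):
        ...  # body omitted in the original post

    # Grab the data from MySQL
    def download_data(self, time=''):
        ...  # body omitted in the original post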

    # Parse the data into a matrix form
    def parse_data(self):
        first_dim = self.get_last_field()
        if first_dim is None:
            return None, None

        data = self.download_data()
        if data is None or len(data) == 0:
            return None, None

        observable_matrix = np.zeros([first_dim, self.x_dim, self.y_dim], dtype=np.float32)
        timestamp = np.chararray(first_dim, itemsize=20)
        day_of_production = np.chararray(first_dim, itemsize=7)
        day_of_production[:] = 'default'
        timestamp[:] = 'default'

        for dat in data:
            # dat[5] is used as the 1-based day/round; the sample row above
            # shows only five fields, so check this index against the query.
            day = dat[5] - 1
            observable_matrix[day, dat[2], dat[3]] = dat[0]
            # dat[1] is already a datetime (get_day_of_production calls .date() on it)
            timestamp[day] = dat[1].strftime("%Y-%m-%d")
            day_of_production[day] = str(self.get_day_of_production(dat[1]))

        # Pair each timestamp with its day's data
        data = self.dict_with_timestamp(observable_matrix, timestamp, first_dim)
        return data, day_of_production

    def get_first_day_of_production(self):
        query = "SELECT `from` FROM round WHERE id=%(round_id)s"
        self.cursor.execute(query, self.params)
        day = self.cursor.fetchone()
        return day[0]

    def get_day_of_production(self, current_time):
        # Whole days elapsed since the first day of production
        return (current_time.date() - self.first_day_of_production).days

    def next_day(self, current_day):
        return current_day + timedelta(days=1)

    # List of (timestamp/round, data) pairs, one per day
    def dict_with_timestamp(self, data, timestamp, dim):
        return [(timestamp[i], data[i]) for i in range(dim)]
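
The per-row loop in `parse_data` could probably be replaced by a single NumPy indexed assignment. This is a minimal sketch under stated assumptions, not a drop-in replacement: `fill_matrix` is a hypothetical helper, and it takes simplified rows of the form `(value, x, y, day)` with a 1-based day, which is not the exact layout returned by the query.

```python
import numpy as np


def fill_matrix(rows, first_dim, x_dim, y_dim):
    """Scatter (value, x, y, day) rows into a (first_dim, x_dim, y_dim) array."""
    values = np.array([r[0] for r in rows], dtype=np.float32)
    xs = np.array([r[1] for r in rows], dtype=np.intp)
    ys = np.array([r[2] for r in rows], dtype=np.intp)
    days = np.array([r[3] for r in rows], dtype=np.intp)

    matrix = np.zeros((first_dim, x_dim, y_dim), dtype=np.float32)
    # One indexed assignment replaces the per-row Python loop.
    # Days are 1-based in the data, so shift them to 0-based indices.
    matrix[days - 1, xs, ys] = values
    return matrix


# Made-up rows, for illustration only.
rows = [(30.1, 8, 1, 1), (12.5, 0, 0, 2)]
m = fill_matrix(rows, first_dim=2, x_dim=10, y_dim=3)
print(m[0, 8, 1], m[1, 0, 0])
```

The timestamp and day-of-production strings only change once per day, so they could be computed per day rather than per row. That part is left out of the sketch.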

Beating this data into heatmap form ends with a SIGKILL on the server, and the response data for the frontend is often truncated.
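
If the truncation comes from building one very large response, writing the result one day at a time might help. How the Python output reaches PHP is not stated here, so the sketch below simply assumes a text stream; `write_days` and the `timestamp`/`z` keys are hypothetical names, and the frontend would have to parse the response line by line.

```python
import io
import json

import numpy as np


def write_days(stream, matrix, timestamps):
    """Write one compact JSON object per line, one line per day."""
    for stamp, day_matrix in zip(timestamps, matrix):
        # tolist() converts float32 values to plain Python floats,
        # which json.dumps can serialize.
        record = {"timestamp": stamp, "z": day_matrix.tolist()}
        stream.write(json.dumps(record, separators=(",", ":")) + "\n")


# Made-up data: 2 days of 2x2 values, for illustration only.
matrix = np.arange(8, dtype=np.float32).reshape(2, 2, 2)
buffer = io.StringIO()  # stands in for the real response stream
write_days(buffer, matrix, ["2019-07-02", "2019-07-03"])
lines = buffer.getvalue().splitlines()
print(lines[0])
```

Only one day's nested list exists in memory at a time, and the compact separators drop the spaces that `json.dumps` inserts by default.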
