dsgd4654674 2014-10-30 20:52 采纳率: 0%
浏览 186
已采纳

Postgres - UPDATE随着时间的推移变慢

I have a table like this (more columns but these will do):

events
+----------+----------------+--------------------+------------------+------------------+---------+  
| event_id | user_ipaddress |   network_userid   |  domain_userid   | user_fingerprint | user_id |  
+----------+----------------+--------------------+------------------+------------------+---------+  
|        1 | 127.0.0.1      | 000d7d9e-f3cb-4a08 | 26dc9870c3572519 |       2199066221 |         |  
|        2 | 127.0.0.1      | 000d7d9e-f3cb-4a08 | 26dc9870c3572519 |       2199066221 |         |  
|        3 | 127.0.0.1      | 000d7d9e-f3cb-4a08 | 26dc9870c3572519 |       2199066221 |         |  
|        4 | 127.0.0.1      | 000d7d9e-f3cb-4a08 | 26dc9870c3572519 |       2199066221 |         |  
+----------+----------------+--------------------+------------------+------------------+---------+

The table contains around 1M records. I'm trying to update all records to set the user_id.
I'm using a very simple PHP script for that.

I'm looping over each record with user_id = NULL and SELECT from the entire table to find existing user_id based on user_ipaddress, network_userid, domain_userid and/or user_fingerprint.

If nothing was found I will generate a unique user_id and UPDATE the record.
If a match was found I will UPDATE the record with the correspondent user_id.

The query looks like this:

UPDATE events SET user_id = 'abc' WHERE event_id = '1'

The SELECT part is super fast (~5ms).
The UPDATE part starts fast (~10ms) but becomes slower (~800ms) after a few hundred updates.

If I wait for around 10-20 minutes it's becomes fast again.

I'm running a PostgreSQL 9.3.3 on AWS RDS (db.m1.medium) with General Purpose SSD storage. I have indexes on all columns combined and individually.

I have played with FILLFACTOR and currently it's set to 70. I have tried to run VACUUM FULL events, but I never know if it finished (waited more than 1h). Also I've tried REINDEX TABLE events.

I'm the only one using this server.

Here's an EXPLAIN ANALYZE of the UPDATE query:

Update on events  (cost=0.43..8.45 rows=1 width=7479) (actual time=0.118..0.118 rows=0 loops=1)
  ->  Index Scan using events_event_id_idx on events  (cost=0.43..8.45 rows=1 width=7479) (actual time=0.062..0.065 rows=1 loops=1)
        Index Cond: (event_id = '1'::bpchar)
Total runtime: 0.224 ms

Any good ideas on how I can keep the query fast?

  • 写回答

3条回答 默认 最新

  • drsh30452 2014-11-01 11:51
    关注

    I found out that the problem was caused by the filesystem chosen for my RDS instance. I was running with General Purpose Storage (SSD). It apparently has some I/O limits. So the solution was to switch storage. Now I'm running the Provisioned IOPS Storage and the performance improved instantly.

    Also a solution could be to stick to the General Purpose Storage (SSD) and increase storage size, since that would increase I/O limits as well.

    Read more: http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Storage.html#Concepts.Storage.GeneralSSD

    Thanks for all replies. And thanks to @Dan and @ArtemGr for pointing me in that direction.

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论
查看更多回答(2条)

报告相同问题?

悬赏问题

  • ¥15 在不同的执行界面调用同一个页面
  • ¥20 基于51单片机的数字频率计
  • ¥50 M3T长焦相机如何标定以及正射影像拼接问题
  • ¥15 keepalived的虚拟VIP地址 ping -s 发包测试,只能通过1472字节以下的数据包(相关搜索:静态路由)
  • ¥20 关于#stm32#的问题:STM32串口发送问题,偶校验(even),发送5A 41 FB 20.烧录程序后发现串口助手读到的是5A 41 7B A0
  • ¥15 C++map释放不掉
  • ¥15 Mabatis查询数据
  • ¥15 想知道lingo目标函数中求和公式上标是变量情况如何求解
  • ¥15 关于E22-400T22S的LORA模块的通信问题
  • ¥15 求用二阶有源低通滤波将3khz方波转为正弦波的电路