dsgd4654674 2014-10-30 20:52 采纳率: 0%
浏览 185
已采纳

Postgres - UPDATE随着时间的推移变慢

I have a table like this (more columns but these will do):

events
+----------+----------------+--------------------+------------------+------------------+---------+  
| event_id | user_ipaddress |   network_userid   |  domain_userid   | user_fingerprint | user_id |  
+----------+----------------+--------------------+------------------+------------------+---------+  
|        1 | 127.0.0.1      | 000d7d9e-f3cb-4a08 | 26dc9870c3572519 |       2199066221 |         |  
|        2 | 127.0.0.1      | 000d7d9e-f3cb-4a08 | 26dc9870c3572519 |       2199066221 |         |  
|        3 | 127.0.0.1      | 000d7d9e-f3cb-4a08 | 26dc9870c3572519 |       2199066221 |         |  
|        4 | 127.0.0.1      | 000d7d9e-f3cb-4a08 | 26dc9870c3572519 |       2199066221 |         |  
+----------+----------------+--------------------+------------------+------------------+---------+

The table contains around 1M records. I'm trying to update all records to set the user_id.
I'm using a very simple PHP script for that.

I'm looping over each record with user_id = NULL and SELECT from the entire table to find existing user_id based on user_ipaddress, network_userid, domain_userid and/or user_fingerprint.

If nothing was found I will generate a unique user_id and UPDATE the record.
If a match was found I will UPDATE the record with the correspondent user_id.

The query looks like this:

UPDATE events SET user_id = 'abc' WHERE event_id = '1'

The SELECT part is super fast (~5ms).
The UPDATE part starts fast (~10ms) but becomes slower (~800ms) after a few hundred updates.

If I wait for around 10-20 minutes it's becomes fast again.

I'm running a PostgreSQL 9.3.3 on AWS RDS (db.m1.medium) with General Purpose SSD storage. I have indexes on all columns combined and individually.

I have played with FILLFACTOR and currently it's set to 70. I have tried to run VACUUM FULL events, but I never know if it finished (waited more than 1h). Also I've tried REINDEX TABLE events.

I'm the only one using this server.

Here's an EXPLAIN ANALYZE of the UPDATE query:

Update on events  (cost=0.43..8.45 rows=1 width=7479) (actual time=0.118..0.118 rows=0 loops=1)
  ->  Index Scan using events_event_id_idx on events  (cost=0.43..8.45 rows=1 width=7479) (actual time=0.062..0.065 rows=1 loops=1)
        Index Cond: (event_id = '1'::bpchar)
Total runtime: 0.224 ms

Any good ideas on how I can keep the query fast?

  • 写回答

3条回答 默认 最新

  • drsh30452 2014-11-01 11:51
    关注

    I found out that the problem was caused by the filesystem chosen for my RDS instance. I was running with General Purpose Storage (SSD). It apparently has some I/O limits. So the solution was to switch storage. Now I'm running the Provisioned IOPS Storage and the performance improved instantly.

    Also a solution could be to stick to the General Purpose Storage (SSD) and increase storage size, since that would increase I/O limits as well.

    Read more: http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Storage.html#Concepts.Storage.GeneralSSD

    Thanks for all replies. And thanks to @Dan and @ArtemGr for pointing me in that direction.

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论
查看更多回答(2条)

报告相同问题?

悬赏问题

  • ¥100 求数学坐标画圆以及直线的算法
  • ¥100 c语言,请帮蒟蒻写一个题的范例作参考
  • ¥15 名为“Product”的列已属于此 DataTable
  • ¥15 安卓adb backup备份应用数据失败
  • ¥15 eclipse运行项目时遇到的问题
  • ¥15 关于#c##的问题:最近需要用CAT工具Trados进行一些开发
  • ¥15 南大pa1 小游戏没有界面,并且报了如下错误,尝试过换显卡驱动,但是好像不行
  • ¥15 自己瞎改改,结果现在又运行不了了
  • ¥15 链式存储应该如何解决
  • ¥15 没有证书,nginx怎么反向代理到只能接受https的公网网站