I have a table like this (more columns but these will do):
events
+----------+----------------+--------------------+------------------+------------------+---------+
| event_id | user_ipaddress | network_userid | domain_userid | user_fingerprint | user_id |
+----------+----------------+--------------------+------------------+------------------+---------+
| 1 | 127.0.0.1 | 000d7d9e-f3cb-4a08 | 26dc9870c3572519 | 2199066221 | |
| 2 | 127.0.0.1 | 000d7d9e-f3cb-4a08 | 26dc9870c3572519 | 2199066221 | |
| 3 | 127.0.0.1 | 000d7d9e-f3cb-4a08 | 26dc9870c3572519 | 2199066221 | |
| 4 | 127.0.0.1 | 000d7d9e-f3cb-4a08 | 26dc9870c3572519 | 2199066221 | |
+----------+----------------+--------------------+------------------+------------------+---------+
The table contains around 1M records. I'm trying to update all records to set the user_id
.
I'm using a very simple PHP script for that.
I'm looping over each record with user_id = NULL
and SELECT
from the entire table to find existing user_id
based on user_ipaddress
, network_userid
, domain_userid
and/or user_fingerprint
.
If nothing was found I will generate a unique user_id
and UPDATE
the record.
If a match was found I will UPDATE
the record with the correspondent user_id
.
The query looks like this:
UPDATE events SET user_id = 'abc' WHERE event_id = '1'
The SELECT
part is super fast (~5ms).
The UPDATE
part starts fast (~10ms) but becomes slower (~800ms) after a few hundred updates.
If I wait for around 10-20 minutes it's becomes fast again.
I'm running a PostgreSQL 9.3.3 on AWS RDS (db.m1.medium) with General Purpose SSD storage. I have indexes on all columns combined and individually.
I have played with FILLFACTOR
and currently it's set to 70
. I have tried to run VACUUM FULL events
, but I never know if it finished (waited more than 1h). Also I've tried REINDEX TABLE events
.
I'm the only one using this server.
Here's an EXPLAIN ANALYZE
of the UPDATE
query:
Update on events (cost=0.43..8.45 rows=1 width=7479) (actual time=0.118..0.118 rows=0 loops=1)
-> Index Scan using events_event_id_idx on events (cost=0.43..8.45 rows=1 width=7479) (actual time=0.062..0.065 rows=1 loops=1)
Index Cond: (event_id = '1'::bpchar)
Total runtime: 0.224 ms
Any good ideas on how I can keep the query fast?