I am writing an app that will work in read-only mode on client's DB, so I can't change anything in the DB structure, but I also can't be blamed for the structure :) And I'm not sure whether this should be solved on the DB level, or higher...
I have a PostgreSQL table events
that can be simplified to something like this:
id UUID PRIMARY KEY
owner_id INT
previous_event_id UUID FOREIGN KEY REFERENCES (id) ON events
created_at TIMESTAMP
As you can see, events are connected by previous_event
field. It's not redundant, because the same user can have multiple event "threads" simultaneously, so it's not just a matter of ordering by creation time.
Now I must extract from it the separate chains of events.
Chain of events in our terminology is a set of events that have the same owner_id
, they are connected one-to-another by previous_event_id
and the difference between their created_at
timestamps is less than X minutes
.
I'm pretty sure checking one by one for successor/ancestor and time difference would take ages and thousands of DB queries per second. What would be the best way to do it? The scenarios I see are:
Without splitting to chains yet, browse the whole events thread for specific user, not looking at timestamp differences (practically: for specified
user_id
get all events, connected with each other byprevious_event_id
).Get the number of event chains for
user_id == A
(practically: for specificuser_id
, get the number of events which haveprevious_event_id
NULL (which is trivial), OR which havecreated_at
timestamp bigger than (previous event's timestamp+X), where previous event is specified byprevious_event_id
).Get the event chain started by the event with
id == B
(practically: get all events that have the sameuser_id
as specified event, are connected byprevious_event_id
and havecreated_at
difference smaller than X).
I'm not sure about the target amount of data, so I can't assume anything. The only thing that came to my mind was fetching data in batches and then try to process them in the memory, finding the correct chain end (and fetching more data, if not found), but that seems ugly to me...