I've got a really big collection of data in a Postgres database where I'd like to nuke rows past a particular age... but I don't want it nuking the latest scan of data for any given location & site combination.
Basically, I've got a really big table with a location (bigint), a site (bigint), a readdate (bigint), and a little accompanying data. Note that there will be multiple entries for a given site, location, and readdate, but anything sharing a readdate is considered part of the same scan and needs to be kept together for a given location & site.
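For concreteness, the table is shaped roughly like this (votwdata is the real table name, as in my query below; I've elided the accompanying data into a single hypothetical value column):

CREATE TABLE votwdata (
    location bigint,  -- reading location
    site     bigint,  -- reading site
    readdate bigint,  -- one readdate = one scan; several rows can share it
    value    text     -- hypothetical stand-in for the accompanying data columns
);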
Currently, I've just got it set to get rid of all old records, but the possibility exists that a particular site & location combination will stop giving out data for a while, and I'd like to preserve its final state if that happens. I'm doing the SQL queries from PHP, so I'm pretty sure I could hack together some highly ugly code that finds the latest readdate for each site & location combination, then deletes everything older than either that readdate or the calendar limit, whichever gives the lesser date. But I'd prefer to put the decision-making workload in the SQL query itself, rather than first fetching a list of all location, site, and max(readdate) entries and then iterating over them in PHP making individual delete queries. (I've put a rough sketch of the shape I'm imagining at the end of the question.)
My current query, which doesn't do what I want because it deletes everything before $limit, is:
$query="DELETE FROM votwdata WHERE readdate < '".$limit."';";
Any ideas for a good revision?
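For illustration, the rough shape I've been imagining is a single DELETE with a correlated subquery, something like this (untested sketch; $limit is still the value interpolated from PHP, as in my current code):

-- Delete rows older than the cutoff, except rows belonging to the newest
-- scan for their (location, site) combination, which survive regardless of age.
DELETE FROM votwdata d
 WHERE d.readdate < $limit                      -- $limit interpolated from PHP
   AND d.readdate < (SELECT max(x.readdate)     -- newest scan for this combo
                       FROM votwdata x
                      WHERE x.location = d.location
                        AND x.site = d.site);

The two conditions together should mean a row only goes if it's older than both the cutoff and its combination's newest readdate, which I think matches the "whichever gives the lesser date" logic above. I'd guess an index on (location, site, readdate) would be needed for the subquery to be tolerable on a table this size.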