I am working with an XML dump of featured Wikipedia articles (including revisions and extracted citations).
My current query joins three tables to return every citation in every revision of each featured article, sorted by page title, author, year, and timestamp. The output looks something like this:
   TIMESTAMP     PAGE_TITLE   AUTHOR   YEAR   TITLE
1  20110801...   AARDVARK     BLAND    2010   MAJESTIC AARDVARKS
2  20110910...   AARDVARK     BLAND    2010   MAJESTIC AARDVARKS
3  20120101...   AARDVARK     BLAND    2012   AARDVARK BEHAVIOUR
4  20070601...   AARDVARK     SMITH    2005   BREEDING HABITS OF
5  20090602...   AARDVARK     SMITH    2005   BREEDING HABITS OF
Ideally, my query would return only the earliest instance, i.e. min(timestamp), of each unique citation. In other words, I would like a query that returns just rows 1, 3, and 4. The result will still contain repeated values of page_title, author, and year, since each page has multiple citations and the same author may account for several of them.
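To make the goal concrete, here is a minimal sketch of the result I am after, run against an in-memory SQLite database from Python. Everything in it is an assumption for illustration: the single flattened table named citations stands in for my real three-table join, the timestamps are shortened to the date prefix shown in the sample, and I am guessing that a citation is identified by (page_title, author, year, title). Grouping on those columns and taking MIN(timestamp) seems to give the rows I want, but I am not sure it is the right approach for the real schema.

```python
import sqlite3

# Hypothetical flattened data standing in for the output of the three-table join.
# Timestamps are cut down to the date prefix visible in the sample rows.
SAMPLE_ROWS = [
    ("20110801", "AARDVARK", "BLAND", 2010, "MAJESTIC AARDVARKS"),
    ("20110910", "AARDVARK", "BLAND", 2010, "MAJESTIC AARDVARKS"),
    ("20120101", "AARDVARK", "BLAND", 2012, "AARDVARK BEHAVIOUR"),
    ("20070601", "AARDVARK", "SMITH", 2005, "BREEDING HABITS OF"),
    ("20090602", "AARDVARK", "SMITH", 2005, "BREEDING HABITS OF"),
]

# Assumption: (page_title, author, year, title) identifies one citation.
EARLIEST_QUERY = """
    SELECT MIN(timestamp) AS first_seen, page_title, author, year, title
    FROM citations
    GROUP BY page_title, author, year, title
    ORDER BY page_title, author, year, first_seen
"""


def earliest_citations(rows):
    """Return one row per citation, carrying its earliest timestamp."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE citations ("
            "timestamp TEXT, page_title TEXT, author TEXT, year INTEGER, title TEXT)"
        )
        conn.executemany("INSERT INTO citations VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(EARLIEST_QUERY).fetchall()
    finally:
        conn.close()


earliest = earliest_citations(SAMPLE_ROWS)
for row in earliest:
    print(row)
```

On the sample data this keeps rows 1, 3, and 4 and drops rows 2 and 5. In the real query the FROM clause would presumably be the existing join rather than a single table.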
Thanks very much in advance for your help!