i'm working on a webpage indexing robot and using php and mysql. one of the tables will be to store the list of webpages. i want that table to have an integer based id
column (so that i can join it to other tables more quickly, and so that other tables can reference it without storing the complete url in a column), but i also want the url
column to remain unique.
the reason this question has come up is that if my script reads a web page that has 250 links i'm not sure the best way to add them to the webpages table without creating duplicates. i could loop the 250 links that i found and query the table for each of them to make sure they don't exist, and then insert the ones that do - which would require at least 250 queries.
however, if i could have mysql enforce the url
to be a unique column i could simply do one insert ignore
, statement with all of the links. this would keep the existing records from being duplicated while adding new ones. if i had this type of table with an integer id column that is unique / primary and auto-incremented, could i also specify that the url column be unique?
id (auto inc) url added
----------------------------------------------------------------------
1 http://site.com/page-a/ 2010-01-01 01:23:34
2 http://site.com/page-b/ 2010-01-01 01:23:34
3 http://site.com/page-c/ 2010-01-01 01:23:34
another option that has crossed my mind is setting the url to be a unique primary key, and then manually incrementing the id
column with php / mysql statements:
url (unique, primary) id added
-------------------------------------------------------------
http://site.com/page-a/ 1 2010-01-01 01:23:34
http://site.com/page-b/ 2 2010-01-01 01:23:34
http://site.com/page-c/ 3 2010-01-01 01:23:34
it seems like that would have the benefit of allowing mysql to enforce a unique url column (and thus allowing my insert ignore
method), and also allowing me to join on an integer. the only weird thing is having to rely on my script to add and increment the id column, but i may be willing to do that if there is not a better option.
what are my options?