I'm building two cron jobs that regularly search for new tweets and Instagram photos (and potentially other services) based on a tag.
The content is saved to a database and later output to a webpage. This makes the page load faster and, more importantly, lets me remove certain tweets so they are not displayed.
I want to make sure no post is saved twice in the database, and I'm not sure which approach is best. Here are the options I'm considering:
- I use Laravel, which lets me mark the post ID column as unique, so the database rejects any attempt to save a post that already exists. This might cause unnecessary SQL queries, though.
- I could check the database for the latest saved post ID and stop the loop once I get to that post.
- At least with Twitter, I can pass the `since_id` parameter, which returns only posts newer than the given ID. However, I haven't found an equivalent parameter for Instagram, and it wouldn't work for HTML scraping either.
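A rough sketch combining the first two options follows. It uses Python and the standard-library `sqlite3` module only for illustration. The table layout, the `service`/`post_id`/`hidden` columns, and the helper names are all hypothetical. The unique constraint covers the pair (service, post_id), so the same ID coming from two different services does not collide, and SQLite's `INSERT OR IGNORE` skips duplicates instead of raising an error. The fetch loop stops at the first post that is already stored. That shortcut assumes each service returns posts newest first. In Laravel the rough equivalents are a unique index in the migration and, in recent versions, the query builder's `insertOrIgnore`; check what your version provides.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open a database and create a hypothetical posts table with a composite unique key."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS posts (
               service TEXT NOT NULL,             -- e.g. 'twitter', 'instagram'
               post_id TEXT NOT NULL,             -- the ID assigned by that service
               body    TEXT,
               hidden  INTEGER NOT NULL DEFAULT 0, -- 1 = removed from the page, row kept
               UNIQUE (service, post_id)
           )"""
    )
    return conn


def save_posts(conn, service, posts):
    """Option 1: let the unique constraint reject duplicates.

    Rows whose (service, post_id) already exists are silently skipped by
    INSERT OR IGNORE. Returns how many rows were actually inserted.
    """
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO posts (service, post_id, body) VALUES (?, ?, ?)",
        [(service, p["id"], p["text"]) for p in posts],
    )
    conn.commit()
    return conn.total_changes - before


def new_until_known(conn, service, posts_newest_first):
    """Option 2: walk the fetched posts and stop at the first one already stored.

    Assumes the service returns posts newest first; if the order is not
    guaranteed, rely on save_posts alone.
    """
    fresh = []
    for p in posts_newest_first:
        known = conn.execute(
            "SELECT 1 FROM posts WHERE service = ? AND post_id = ?",
            (service, p["id"]),
        ).fetchone()
        if known is not None:
            break  # everything after this point was saved on an earlier run
        fresh.append(p)
    return fresh
```

Using both together is cheap insurance. The early stop avoids most redundant queries, and the constraint catches anything that slips through, such as two overlapping cron runs. One caveat: if removing a post means deleting its row, the constraint no longer remembers that post, and the next run will save it again. Setting a flag such as the hypothetical `hidden` column keeps the row and prevents that.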