I'm looking into the best, most current way to scrape a URL. I want to retrieve one image from a page: first from a link tag such as `<link rel="image_src" href="http://stackoverflow.com/images/logo.gif" />`, then from an og tag, and, if I still have nothing, the first img that is big enough. In other words, a light version of Facebook's thumbnail retrieval.
While reading around, I thought I had found what I needed, but the solution turned out to be quite old (5-6 years old: http://www.lightspeedretail.com/cloud/blog/2007/08/scraping-links-with-php/). It basically uses cURL, DOMDocument, and XPath. After that I would just have to work on the image URL I got, for instance by storing a few versions of it in different sizes, but I'm fine with that part.
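For reference, this is roughly what the cURL + DOMDocument + XPath approach looks like as I understand it. It is only a minimal sketch under my own assumptions: `findImageUrl` and `fetchHtml` are names I made up, the og tag is assumed to be `<meta property="og:image">`, and the "first big enough img" fallback is left out.

```php
<?php
// Hypothetical helper: returns the first candidate image URL found in $html,
// or null when neither tag is present.
function findImageUrl(string $html): ?string
{
    if (trim($html) === '') {
        return null; // DOMDocument::loadHTML rejects empty input
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Tried in order: the link tag first, then the og tag.
    $queries = [
        '//link[@rel="image_src"]/@href',
        '//meta[@property="og:image"]/@content',
    ];
    foreach ($queries as $query) {
        $nodes = $xpath->query($query);
        if ($nodes !== false && $nodes->length > 0) {
            $value = trim($nodes->item(0)->nodeValue);
            if ($value !== '') {
                return $value;
            }
        }
    }
    return null;
}

// Hypothetical helper: downloads a page with cURL, or returns null on failure.
function fetchHtml(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => 10,   // seconds; an arbitrary choice
    ]);
    $html = curl_exec($ch);
    return is_string($html) ? $html : null;
}
```

Usage would be something like `$image = findImageUrl(fetchHtml($url) ?? '');`. The value can be a relative URL, so it may still need to be resolved against the page URL.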
Is there something better than this solution? Ideally, an example for the link tag would be fantastic.