I am using curl to fetch the first image listed on a page when no og (Open Graph) data exists. I then save the image locally to a cache folder so that future requests are faster.
A problem arises when the URL in question is a rewrite.
For example, in the general case http://someurl.com/dir will return the first image found on the page. Let's say that in the source it is src="img/image.jpg". I then save a copy of http://someurl.com/dir/img/image.jpg to my local directory.
The problem comes in when dir is actually a rewrite. Say http://someurl.com/dir is actually http://someurl.com/?username=dir. In such an instance http://someurl.com/dir/img/image.jpg will return a 404 error because the file is actually located at http://someurl.com/img/image.jpg.
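To make the failing case concrete, here is a minimal sketch in Python (for illustration only, not my actual curl code). `naive_join` is a hypothetical helper standing in for the concatenation I described, and it is compared with standard relative-URL resolution via `urllib.parse.urljoin`, which as far as I understand is what a browser does when the page has no `<base>` tag.

```python
from urllib.parse import urljoin

def naive_join(page_url: str, src: str) -> str:
    """Hypothetical helper: glue the src onto the page URL as if it were a directory."""
    return page_url.rstrip("/") + "/" + src.lstrip("/")

page_url = "http://someurl.com/dir"
src = "img/image.jpg"

# What I currently build: treats "dir" as a real directory.
naive = naive_join(page_url, src)

# Standard resolution: the last path segment ("dir") is dropped before joining.
resolved = urljoin(page_url, src)

print(naive)     # http://someurl.com/dir/img/image.jpg
print(resolved)  # http://someurl.com/img/image.jpg
```

With a trailing slash on the page URL (http://someurl.com/dir/), standard resolution would also give the /dir/img/image.jpg path, so this sketch alone may not cover every rewrite.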
So I am wondering how I can check whether a URL is a rewrite, and how I should handle it if it is.
Keep in mind this is a simple example. It's possible the URL is http://someurl.com/dir/user/post/type/ and that dir, user, post, and type are all just one giant rewrite, while src="img/image.jpg" is still actually relative to the base path /.
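For the deeper case, my unverified guess is that the page would need either a `<base href>` tag or a root-relative src for the image to load in a browser at all. The sketch below (Python, illustration only) shows that assumption; `resolve_src` and its `base_href` parameter are hypothetical names, and the `<base href="/">` value is assumed rather than taken from any real page.

```python
from typing import Optional
from urllib.parse import urljoin

def resolve_src(page_url: str, src: str, base_href: Optional[str] = None) -> str:
    """Resolve an image src against the page URL, honoring an optional <base href>."""
    # A <base href> may itself be relative, so resolve it against the page URL first.
    base = urljoin(page_url, base_href) if base_href else page_url
    return urljoin(base, src)

page_url = "http://someurl.com/dir/user/post/type/"

# No <base> tag: the relative src stays under the rewritten path.
no_base = resolve_src(page_url, "img/image.jpg")

# Assumed <base href="/">: the same src now resolves from the site root.
with_base = resolve_src(page_url, "img/image.jpg", base_href="/")

# Root-relative src: resolves from the site root regardless of the page path.
root_relative = resolve_src(page_url, "/img/image.jpg")

print(no_base)        # http://someurl.com/dir/user/post/type/img/image.jpg
print(with_base)      # http://someurl.com/img/image.jpg
print(root_relative)  # http://someurl.com/img/image.jpg
```

If this is right, the scraper would have to read the `<base>` tag out of the fetched HTML before resolving any src, rather than trying to detect the rewrite itself.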
Sites like Facebook seem to be able to handle such rewrites fairly easily and still get the image, and I am totally baffled as to how it's done.