duanlang1196 2019-01-24 22:47

How to detect a rewritten URL when using curl

I am using curl to get the first image listed on a page when no Open Graph (og) data exists. I then save the image locally to a cache folder so that future requests are faster.

A problem arises when the URL in question is a rewrite.

For example, in the general case http://someurl.com/dir will return the first image found on the page. Let's say that in the source this is src="img/image.jpg". I then save a copy of http://someurl.com/dir/img/image.jpg to my local directory.

The problem comes in when dir is actually a rewrite. Say http://someurl.com/dir is actually http://someurl.com/?username=dir. In such an instance http://someurl.com/dir/img/image.jpg will return a 404 error because the file is actually located at http://someurl.com/img/image.jpg.

So I am wondering how I could actually check if the url is a rewrite and how I would handle this.
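One thing worth checking first: a server-side rewrite is not visible to the client, and a browser does not need to detect it. It resolves a relative src against the URL it requested, following RFC 3986, and does not simply append the src to that URL. Below is a minimal sketch in Python (my choice for illustration, not necessarily what the original script uses), assuming the page URL has no trailing slash as in the example above. The helper name resolve_image_url is hypothetical.

```python
from urllib.parse import urljoin


def resolve_image_url(page_url: str, src: str) -> str:
    """Resolve src against page_url the way a browser would (RFC 3986).

    Hypothetical helper for illustration only.
    """
    return urljoin(page_url, src)


# Without a trailing slash, the last path segment ("dir") is dropped
# before the relative path is appended.
print(resolve_image_url("http://someurl.com/dir", "img/image.jpg"))
# → http://someurl.com/img/image.jpg

# With a trailing slash, "dir/" is treated as a directory and kept.
print(resolve_image_url("http://someurl.com/dir/", "img/image.jpg"))
# → http://someurl.com/dir/img/image.jpg
```

If that holds for the page in question, the 404 would come from string concatenation of the page URL and the src, not from the rewrite itself.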

Keep in mind this is a simple example. It's possible the URL is http://someurl.com/dir/user/post/type/ and that dir, user, post, and type are all just one giant rewrite, so src="img/image.jpg" is still actually relative to the base path /.

Sites like Facebook seem to be able to handle such rewrites fairly easily and still get the image, and I am totally baffled how it's done.
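For deep rewritten paths such as /dir/user/post/type/, a page whose relative src values are meant to resolve from / would normally have to say so itself, typically with a base element in its head or by using root-relative paths like /img/image.jpg. Otherwise browsers would fail to load the image too. A crawler can presumably do what a browser does: read the base href if there is one, then resolve the src against it. The sketch below assumes the HTML has already been fetched, and that effective_url is the final URL after any HTTP redirects (libcurl reports this as CURLINFO_EFFECTIVE_URL). The names FirstImageParser and first_image_url are hypothetical, and I don't know how Facebook actually implements this.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FirstImageParser(HTMLParser):
    """Records the first <base href> and the first <img src> in a page."""

    def __init__(self):
        super().__init__()
        self.base_href = None
        self.first_img_src = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "base" and self.base_href is None and attrs.get("href"):
            self.base_href = attrs["href"]
        elif tag == "img" and self.first_img_src is None and attrs.get("src"):
            self.first_img_src = attrs["src"]


def first_image_url(effective_url, html):
    """Return the absolute URL of the first image, or None if there is none."""
    parser = FirstImageParser()
    parser.feed(html)
    if parser.first_img_src is None:
        return None
    # A <base href> may itself be relative, so resolve it against the page URL.
    base = urljoin(effective_url, parser.base_href) if parser.base_href else effective_url
    return urljoin(base, parser.first_img_src)


page = '<html><head><base href="/"></head><body><img src="img/image.jpg"></body></html>'
print(first_image_url("http://someurl.com/dir/user/post/type/", page))
# → http://someurl.com/img/image.jpg
```

Without the base element, the same call would return http://someurl.com/dir/user/post/type/img/image.jpg, which matches what a browser would request for that page.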
