doushi2902 2014-10-01 14:45

Checking and extracting absolute URLs of CSS, JS, and image resources

I need to extract absolute URLs from a page's source code. I am extracting URLs from the following:

>img tag SRC
>Script tag SRC (JS)
>CSS links

I'm using a separate function for each of the three. The problem is that I sometimes get relative URLs, which are of no use because I have to process them further. Kindly review the following three functions and suggest improvements and corrections, in particular how I can convert the URLs to absolute ones (after checking that they are not already absolute, of course).

Thank you!

Function for extracting Image SRC.

function get_images(){
    $images=array();
    $regex='/[^(<!--)]<img [^>]*src=["|\']([^"|\']+(jpg|png|gif|jpeg))/i';
    preg_match_all($regex, $this->source_code, $matches);
    foreach ($matches[1] as $key=>$value) {
        $images[$key]=$value;
    }
    return $images;
}

Function for extracting JS links

function get_scripts(){
    $script_links=array();
    $regex='/<script [^>]*src=["|\']([^"|\']+(\.js))/i';
    preg_match_all($regex, $this->source_code, $matches);
    foreach ($matches[1] as $key=>$value) {
        $script_links[$key]=$value;
    }
    return $script_links;
}

Function for extracting CSS stylesheet links

function get_css(){
    $css_links=array();
    $regex='/<link [^>]*href=["|\']([^"|\']+(\.css))/i';
    preg_match_all($regex, $this->source_code, $matches);
    foreach ($matches[1] as $key=>$value) {
        $css_links[$key]=$value;
    }
    return $css_links;
}
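
For comparison, here is a rough sketch of the same three extractions done with PHP's DOMDocument instead of regular expressions. It is only one possible approach, not a drop-in replacement: the function name extract_resource_urls and its array keys are made up for illustration, it takes the HTML as a parameter rather than reading $this->source_code, it assumes PHP 7 or later, and it does not filter by file extension the way the regexes above do. Because an HTML parser treats comments as comment nodes, tags inside <!-- ... --> are skipped.

```php
<?php
// Hypothetical helper (not from the original class): collects resource URLs
// from an HTML string using the DOM extension instead of regexes.
function extract_resource_urls(string $html): array
{
    $doc = new DOMDocument();
    // Real-world markup is rarely well-formed; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = ['images' => [], 'scripts' => [], 'css' => []];

    foreach ($doc->getElementsByTagName('img') as $img) {
        if ($img->hasAttribute('src')) {
            $urls['images'][] = $img->getAttribute('src');
        }
    }
    foreach ($doc->getElementsByTagName('script') as $script) {
        // Inline scripts have no src attribute and are ignored.
        if ($script->hasAttribute('src')) {
            $urls['scripts'][] = $script->getAttribute('src');
        }
    }
    foreach ($doc->getElementsByTagName('link') as $link) {
        // Keep only stylesheet links (rel="stylesheet"), not icons etc.
        $isStylesheet = stripos($link->getAttribute('rel'), 'stylesheet') !== false;
        if ($isStylesheet && $link->hasAttribute('href')) {
            $urls['css'][] = $link->getAttribute('href');
        }
    }
    return $urls;
}
```

The URLs come back exactly as written in the markup, so relative ones still need to be resolved against the page URL afterwards.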

Output I get when I run it on Google.com's source:

Array ( [0] => /images/icons/product/chrome-48.png [1] => http://www.google.com/images/hpp/pyramids-35.png ) 

The first link starts with /images/.... and is not reusable on its own. This is the problem I'm trying to fix for all three types of resources.
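
To make the goal concrete, this is a minimal sketch of the kind of conversion I have in mind, assuming the URL of the fetched page is known. The function name to_absolute_url and the $base parameter are hypothetical, PHP 7 or later is assumed, and the path handling is a simplification rather than a full RFC 3986 resolver (for example, it collapses empty path segments).

```php
<?php
// Hypothetical helper: resolves $url against the page URL $base.
function to_absolute_url(string $url, string $base): string
{
    // Already absolute (starts with a scheme such as http: or data:): keep as is.
    if (preg_match('#^[a-z][a-z0-9+.\-]*:#i', $url)) {
        return $url;
    }

    $parts  = parse_url($base);
    $scheme = $parts['scheme'] ?? 'http';
    $host   = $parts['host'] ?? '';
    $port   = isset($parts['port']) ? ':' . $parts['port'] : '';
    $root   = $scheme . '://' . $host . $port;

    // Protocol-relative URL (//cdn.example.com/x.js): reuse the base scheme.
    if (strpos($url, '//') === 0) {
        return $scheme . ':' . $url;
    }

    if (strpos($url, '/') === 0) {
        // Root-relative URL (/images/x.png).
        $path = $url;
    } else {
        // Document-relative URL: resolve against the directory of the base path.
        $basePath = $parts['path'] ?? '/';
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $url;
    }

    // Split off any query string or fragment so only the path is normalised.
    $suffix = '';
    $cut = strcspn($path, '?#');
    if ($cut < strlen($path)) {
        $suffix = substr($path, $cut);
        $path   = substr($path, 0, $cut);
    }

    // Remove "." and ".." segments.
    $out = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '..') {
            array_pop($out);
        } elseif ($segment !== '.' && $segment !== '') {
            $out[] = $segment;
        }
    }
    $trailing = ($out && substr($path, -1) === '/') ? '/' : '';

    return $root . '/' . implode('/', $out) . $trailing . $suffix;
}
```

With http://www.google.com/ as the base, the first link in the output above would become http://www.google.com/images/icons/product/chrome-48.png, while the second, already absolute link would be returned unchanged.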
