doushi2902 2014-10-01 14:45

Checking and extracting absolute URLs for CSS, JS and image resources

I need to extract absolute URLs from a page's HTML source. Specifically, I am extracting the URLs of:

- img tag src
- script tag src (JS)
- CSS stylesheet links (link tag href)

I'm using a separate function for each. The trouble is that I sometimes get relative URLs, which are of no use to me because I have to process them further. Please review the three functions below and suggest improvements and corrections, in particular how I can convert URLs to absolute ones (after first checking that they are not already absolute, of course).
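A minimal sketch of the "is it already absolute?" check, assuming that any URL carrying a scheme (http:, https:, data:, ...) can be used as-is. is_absolute_url() is a hypothetical helper name; it relies only on PHP's built-in parse_url(). A protocol-relative URL such as //www.google.com/x.png has no scheme, so it is reported as not absolute and still needs the page's scheme.

```php
<?php
// Hypothetical helper: true when $url already has a scheme and can be used as-is.
function is_absolute_url(string $url): bool
{
    // parse_url() returns false for seriously malformed URLs.
    $parts = parse_url(trim($url));
    // "/images/a.png" and "//host/a.png" have no scheme, so both count as relative.
    return $parts !== false && isset($parts['scheme']);
}
```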

Thank you!

Function for extracting image src URLs:

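// Returns the src of every <img> whose URL ends in a common image extension.
function get_images(){
    // Strip HTML comments first: a character class such as [^(<!--)] matches
    // exactly one character and cannot exclude commented-out tags.
    $html = preg_replace('/<!--.*?-->/s', '', $this->source_code);
    // Inside a character class "|" is a literal, so ["\'] matches either quote.
    // The \s before src= keeps attributes such as data-src from matching.
    $regex = '/<img\s(?:[^>]*\s)?src=["\']([^"\']+\.(?:jpe?g|png|gif))/i';
    preg_match_all($regex, $html, $matches);
    // $matches[1] already holds the captured URLs; no need to copy them.
    return $matches[1];
}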
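A quick check of the [^(<!--)] part of the original image pattern, run against two made-up snippets. Because it is a character class rather than a comment filter, it still matches a commented-out image and misses an <img> at the very start of the string.

```php
<?php
// The original get_images() pattern, unchanged.
$original = '/[^(<!--)]<img [^>]*src=["|\']([^"|\']+(jpg|png|gif|jpeg))/i';

// The space before "<img" is not one of ( < ! - ), so the commented-out
// image is matched anyway: $m1[1] is ['old.png'].
preg_match_all($original, '<!-- <img src="old.png"> -->', $m1);

// The class must consume one character before "<img", so an image at
// offset 0 is never matched: $m2[1] is [].
preg_match_all($original, '<img src="first.png">', $m2);
```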

Function for extracting JS script src URLs:

// Returns the src of every <script> whose URL ends in ".js".
function get_scripts(){
    // Inside a character class "|" is a literal, so ["\'] matches either quote.
    $regex = '/<script\s(?:[^>]*\s)?src=["\']([^"\']+\.js)/i';
    preg_match_all($regex, $this->source_code, $matches);
    // $matches[1] already holds the captured URLs.
    return $matches[1];
}

Function for extracting CSS stylesheet links:

// Returns the href of every <link> whose URL ends in ".css".
function get_css(){
    // Inside a character class "|" is a literal, so ["\'] matches either quote.
    $regex = '/<link\s(?:[^>]*\s)?href=["\']([^"\']+\.css)/i';
    preg_match_all($regex, $this->source_code, $matches);
    // $matches[1] already holds the captured URLs.
    return $matches[1];
}
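As an alternative to three regexes, here is a sketch that uses PHP's DOM extension (DOMDocument and DOMXPath) to collect all three resource types in one pass. get_resources() is a hypothetical name. The queries assume that a stylesheet is a <link> whose rel attribute contains the token "stylesheet" (not an href ending in .css, as in get_css()). Because the HTML is actually parsed, commented-out tags become comment nodes and are skipped.

```php
<?php
// Hypothetical helper: returns ['images' => [...], 'scripts' => [...], 'css' => [...]].
function get_resources(string $html): array
{
    $result = ['images' => [], 'scripts' => [], 'css' => []];
    if (trim($html) === '') {
        return $result; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; keep libxml's warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $queries = [
        'images'  => '//img/@src',
        'scripts' => '//script/@src',
        // Token match (case-sensitive) so rel="alternate stylesheet" also counts.
        'css'     => '//link[contains(concat(" ", normalize-space(@rel), " "), " stylesheet ")]/@href',
    ];
    foreach ($queries as $type => $query) {
        foreach ($xpath->query($query) as $attr) {
            $value = trim($attr->value);
            if ($value !== '') {
                $result[$type][] = $value;
            }
        }
    }
    return $result;
}
```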

Output I get when I run get_images() on Google.com's source:

Array ( [0] => /images/icons/product/chrome-48.png [1] => http://www.google.com/images/hpp/pyramids-35.png ) 

The first link starts with /images/... and cannot be reused as-is, because it is relative to the site's root rather than a full URL. This is the problem I'm trying to fix for all three resource types.
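One way to fix this is to resolve every extracted URL against the address of the page it came from, following the reference-resolution rules of RFC 3986 (section 5.2). The sketch below assumes you know the page's absolute URL; make_absolute_url() and remove_dot_segments() are hypothetical helper names, not PHP built-ins, and user info in the base URL is ignored.

```php
<?php
// Removes "." and ".." segments from an absolute path such as "/a/b/../c".
function remove_dot_segments(string $path): string
{
    $segments = explode('/', substr($path, 1));
    $output = [];
    foreach ($segments as $segment) {
        if ($segment === '.') {
            continue;
        }
        if ($segment === '..') {
            array_pop($output); // popping an empty array is a no-op: stay at the root
            continue;
        }
        $output[] = $segment;
    }
    // A trailing "." or ".." names a directory, so keep a trailing slash.
    $last = $segments[count($segments) - 1];
    if ($last === '.' || $last === '..') {
        $output[] = '';
    }
    return '/' . implode('/', $output);
}

// Resolves $ref (a src/href value) against $base (the page's absolute URL).
function make_absolute_url(string $base, string $ref): string
{
    $ref = trim($ref);
    $r = parse_url($ref);
    if ($r === false || isset($r['scheme'])) {
        return $ref; // unparseable, or already absolute: leave unchanged
    }
    $b = parse_url($base);
    if ($b === false || !isset($b['scheme'], $b['host'])) {
        throw new InvalidArgumentException("Base URL must be absolute: $base");
    }
    if (isset($r['host'])) {
        return $b['scheme'] . ':' . $ref; // "//host/path": borrow only the scheme
    }
    $authority = $b['host'] . (isset($b['port']) ? ':' . $b['port'] : '');
    $basePath = $b['path'] ?? '';
    $refPath = $r['path'] ?? '';

    if ($refPath === '') {
        // Only "?query" and/or "#fragment": keep the base path (and its query if none given).
        $path = $basePath;
        $query = $r['query'] ?? ($b['query'] ?? null);
    } else {
        if ($refPath[0] !== '/') {
            // Relative path: append it to the directory part of the base path.
            $dir = $basePath === '' ? '/' : substr($basePath, 0, strrpos($basePath, '/') + 1);
            $refPath = $dir . $refPath;
        }
        $path = remove_dot_segments($refPath);
        $query = $r['query'] ?? null;
    }

    $url = $b['scheme'] . '://' . $authority . $path;
    if ($query !== null) {
        $url .= '?' . $query;
    }
    if (isset($r['fragment'])) {
        $url .= '#' . $r['fragment'];
    }
    return $url;
}
```

Applied to the output above with an assumed page URL of http://www.google.com/, for example via array_map(function ($u) { return make_absolute_url('http://www.google.com/', $u); }, $images), the first entry becomes http://www.google.com/images/icons/product/chrome-48.png and the already absolute second entry is returned unchanged. If the page contains a <base href="..."> element, relative URLs should be resolved against that URL instead of the page's own address.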
