doushan2224 2011-12-20 08:48

Getting the absolute path of images on an external web page

I am working on a bookmarklet that fetches all the photos from an external page using an HTML DOM parser (as suggested in an earlier SO answer). The photos are fetched correctly and displayed in my bookmarklet popup, but I am having a problem with photos that use relative paths.

For example, suppose the external page is http://www.example.com/dir/index.php:

  1. Photo source 1: img source='hostname/photos/photo.jpg' - fetched correctly, because it is absolute.

  2. Photo source 2: img source='/photos/photo.jpg' - not fetched, because it is not absolute.

I tried working from the current URL, using dirname() or pathinfo() to get its directory, but the results are inconsistent: host/dir/ gives host as the parent directory, while host/dir/index.php gives host/dir, which is the correct one.
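
To make the mismatch concrete, here is a minimal sketch. The helper name base_directory is made up for illustration, and it assumes it is given only the path part of the URL (for example from parse_url()), starting with a slash. It treats a path ending in "/" as already being a directory and only calls dirname() otherwise:

```php
<?php
// Hypothetical helper, for illustration only: returns the directory of a URL
// path without a trailing slash. Assumes $urlPath is an absolute path such as
// "/dir/" or "/dir/index.php".
function base_directory(string $urlPath): string
{
    if (substr($urlPath, -1) === '/') {
        // The path already names a directory; dirname() would step up one level too far.
        return rtrim($urlPath, '/');
    }
    // The path names a file: drop the last segment.
    // dirname() may return a backslash on Windows, so normalise it.
    return rtrim(str_replace('\\', '/', dirname($urlPath)), '/');
}

// The inconsistency described above, applied to whole URLs:
//   dirname('http://www.example.com/dir/')          gives http://www.example.com
//   dirname('http://www.example.com/dir/index.php') gives http://www.example.com/dir
```

With this sketch, both "/dir/" and "/dir/index.php" yield "/dir".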

Please help. How can I fetch these relative photos?


2 answers

  • dsbx40787736 2011-12-20 09:23

    FIXED (added support for query-string only image paths)

    function make_absolute_path ($baseUrl, $relativePath) {
    
        // Parse URLs, return FALSE on failure
        if ((!$baseParts = parse_url($baseUrl)) || (!$pathParts = parse_url($relativePath))) {
            return FALSE;
        }
    
        // Work-around for pre- 5.4.7 bug in parse_url() for relative protocols
        if (empty($baseParts['host']) && !empty($baseParts['path']) && substr($baseParts['path'], 0, 2) === '//') {
            $parts = explode('/', ltrim($baseParts['path'], '/'));
            $baseParts['host'] = array_shift($parts);
            $baseParts['path'] = '/'.implode('/', $parts);
        }
        if (empty($pathParts['host']) && !empty($pathParts['path']) && substr($pathParts['path'], 0, 2) === '//') {
            $parts = explode('/', ltrim($pathParts['path'], '/'));
            $pathParts['host'] = array_shift($parts);
            $pathParts['path'] = '/'.implode('/', $parts);
        }
    
        // Relative path has a host component, just return it
        if (!empty($pathParts['host'])) {
            return $relativePath;
        }
    
        // Normalise base URL (fill in missing info)
        // If base URL doesn't have a host component return error
        if (empty($baseParts['host'])) {
            return FALSE;
        }
        if (empty($baseParts['path'])) {
            $baseParts['path'] = '/';
        }
        if (empty($baseParts['scheme'])) {
            $baseParts['scheme'] = 'http';
        }
    
        // Start constructing return value
        $result = $baseParts['scheme'].'://';
    
        // Add username/password if any
        if (!empty($baseParts['user'])) {
            $result .= $baseParts['user'];
            if (!empty($baseParts['pass'])) {
                $result .= ":{$baseParts['pass']}";
            }
            $result .= '@';
        }
    
        // Add host/port
        $result .= !empty($baseParts['port']) ? "{$baseParts['host']}:{$baseParts['port']}" : $baseParts['host'];
    
        // Inspect the relative path (substr() avoids an offset warning on an empty string)
        if (substr($relativePath, 0, 1) === '/') {
    
            // Leading / means from root
            $result .= $relativePath;
    
        } else if (substr($relativePath, 0, 1) === '?') {
    
            // Leading ? means query the existing URL
            $result .= $baseParts['path'].$relativePath;
    
        } else {
    
            // Get the current working directory
            $resultPath = rtrim(substr($baseParts['path'], -1) === '/' ? trim($baseParts['path']) : str_replace('\\', '/', dirname(trim($baseParts['path']))), '/');
    
            // Split the image path into components and loop them
            foreach (explode('/', $relativePath) as $pathComponent) {
                switch ($pathComponent) {
                    case '': case '.':
                        // a single dot means "this directory" and can be skipped
                        // an empty component (e.g. from a doubled slash) is a mistake on somebody's part, and can also be skipped
                        break;
                    case '..':
                        // a double dot means "up a directory"
                        $resultPath = rtrim(str_replace('\\', '/', dirname($resultPath)), '/');
                        break;
                    default:
                        // anything else can be added to the path
                        $resultPath .= "/$pathComponent";
                        break;
                }
            }
    
            // Add path to result
            $result .= $resultPath;
    
        }
    
        return $result;
    
    }
    

    Tests:

    echo make_absolute_path('http://www.example.com/dir/index.php','/photos/photo.jpg')."\n";
    // Outputs: http://www.example.com/photos/photo.jpg
    echo make_absolute_path('http://www.example.com/dir/index.php','photos/photo.jpg')."\n";
    // Outputs: http://www.example.com/dir/photos/photo.jpg
    echo make_absolute_path('http://www.example.com/dir/index.php','./photos/photo.jpg')."\n";
    // Outputs: http://www.example.com/dir/photos/photo.jpg
    echo make_absolute_path('http://www.example.com/dir/index.php','../photos/photo.jpg')."\n";
    // Outputs: http://www.example.com/photos/photo.jpg
    echo make_absolute_path('http://www.example.com/dir/index.php','http://www.yyy.com/photos/photo.jpg')."\n";
    // Outputs: http://www.yyy.com/photos/photo.jpg
    echo make_absolute_path('http://www.example.com/dir/index.php','?query=something')."\n";
    // Outputs: http://www.example.com/dir/index.php?query=something
    

    I think that should correctly handle just about everything you're likely to encounter, and it should roughly match the logic used by a browser. It should also correct any oddities you might get on Windows with stray backslashes from using dirname().
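
    For comparison, browsers follow the reference resolution rules of RFC 3986, which include a "remove dot segments" step. The sketch below is a simplified take on that step, not the function above and not a full RFC implementation. The name remove_dot_segments is chosen for illustration, and it assumes the input is an absolute path starting with a slash:

```php
<?php
// Simplified sketch of RFC 3986 dot-segment removal (section 5.2.4).
// Assumption: $path is absolute, e.g. "/dir/../photos/photo.jpg".
function remove_dot_segments(string $path): string
{
    $segments = explode('/', $path);
    $last = end($segments);
    $stack = [];

    foreach ($segments as $segment) {
        if ($segment === '.') {
            // "this directory": contributes nothing
            continue;
        }
        if ($segment === '..') {
            // "up a directory": drop the previous segment, but never the
            // leading empty segment that stands for the root
            if (count($stack) > 1) {
                array_pop($stack);
            }
            continue;
        }
        $stack[] = $segment;
    }

    // A path ending in "." or ".." refers to a directory, so keep a trailing slash
    if ($last === '.' || $last === '..') {
        $stack[] = '';
    }

    return implode('/', $stack);
}
```

    One difference from the function above: this sketch keeps the trailing slash for directory-like results such as "/a/b/..", which becomes "/a/".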

    The first argument is the full URL of the page where you found the <img> (or <a> or whatever), and the second argument is the contents of the src/href etc. attribute.
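
    As a rough usage sketch, the following collects the src of every <img> in a page and passes it through a resolver. It uses PHP's built-in DOMDocument, which may not be the DOM parser the question uses. The name collect_image_urls and its $resolve parameter are made up for illustration; $resolve can be any callable that takes the page URL and the src value, such as 'make_absolute_path' from above:

```php
<?php
// Hypothetical helper: returns the resolved URL of every <img> in $html.
// $resolve is a callable (string $pageUrl, string $src) returning a URL or FALSE.
function collect_image_urls(string $html, string $pageUrl, callable $resolve): array
{
    if ($html === '') {
        return []; // loadHTML() rejects an empty document
    }

    // Real-world HTML is rarely valid; collect parser warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return [];
    }

    $urls = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = trim($img->getAttribute('src'));
        if ($src === '') {
            continue; // <img> without a usable src
        }
        $absolute = $resolve($pageUrl, $src);
        if ($absolute !== false) {
            $urls[] = $absolute;
        }
    }

    return $urls;
}
```

    Passing the resolver in as a callable keeps the sketch independent of any one implementation, so it can be tried with a stub first.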

    If anyone finds something that doesn't work (cos I know you'll all be trying to break it :-D), let me know and I'll try and fix it.

    This answer was selected as the best answer by the asker.