dongtang5057 2011-02-11 20:17

PHP: regular expression to search a file for a pattern and extract it

I am really confused by regular expressions in PHP.

Anyway, I can't read a whole tutorial right now because I have a bunch of HTML files in which I have to find links ASAP. I came up with the idea of automating it with PHP, which is the language I know.

So I think I can use this script:

$address = "file.txt"; 
$input = @file_get_contents($address) or die("Could not access file: $address");
$regexp = "??????????"; 
if(preg_match_all("/$regexp/siU", $input, $matches)) { 
    // $matches[2] = array of link addresses
    // $matches[3] = array of link text - including HTML code
} 

My problem is with $regexp.

The pattern I need to match looks like this:

href="/content/r807215r37l86637/fulltext.pdf" title="Download PDF

I want to search for and extract /content/r807215r37l86637/fulltext.pdf from lines like the one above; the files contain many of them.

Any help?

==================

Edit:

The title attributes are important to me: every link I want has this title:

title="Download PDF"

5 answers

  • doubu1964 2011-02-11 20:25

    Once again: regular expressions are a poor tool for parsing HTML.

    Save your sanity and use the built-in DOM libraries.

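    // Assumes $html holds the page markup, e.g. $html = file_get_contents($address);
    $dom = new DOMDocument();
    @$dom->loadHTML($html); // @ suppresses warnings about malformed HTML
    $x = new DOMXPath($dom);
    $data = array();
    // Select every <a> element whose title is exactly "Download PDF"
    foreach ($x->query("//a[@title='Download PDF']") as $node)
    {
        $data[] = $node->getAttribute("href");
    }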
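
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same DOM/XPath approach. The function name `extractPdfLinks` and the sample markup are made up for illustration and are not part of the original answer; it also swaps the `@` operator for `libxml_use_internal_errors()`, which is just one way to keep parser warnings quiet.

```php
<?php
// Hypothetical helper (name is an assumption): returns the href of every
// <a> element whose title attribute is exactly "Download PDF".
function extractPdfLinks(string $html): array
{
    // Collect libxml parse warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $links = array();
    foreach ($xpath->query("//a[@title='Download PDF']") as $node) {
        $links[] = $node->getAttribute("href");
    }
    return $links;
}

// Made-up sample markup, modelled on the pattern in the question.
$sample = '<html><body>'
    . '<a href="/content/r807215r37l86637/fulltext.pdf" title="Download PDF">PDF</a>'
    . '<a href="/content/r807215r37l86637/" title="Abstract">Abstract</a>'
    . '</body></html>';

print_r(extractPdfLinks($sample));
```

To cover a whole directory of files, the helper could be called on the result of `file_get_contents()` for each path returned by `glob()`.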
    

    Edit: Updated code based on ircmaxell's comment.
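
If a regular expression is still wanted for a quick one-off job, something like the sketch below might fill in `$regexp` from the question. It rests on assumptions not confirmed by the original post: that `href` always comes directly before `title`, that both attributes use double quotes, and that only whitespace separates them. Any markup that deviates from that will be missed, which is the brittleness the DOM approach avoids. The sample input is made up.

```php
<?php
// Assumption: href immediately precedes title, both double-quoted,
// separated only by whitespace. Group 1 captures the href value.
$pattern = '/href="([^"]+)"\s+title="Download PDF/i';

// Made-up sample input, modelled on the pattern in the question.
$input = '<a href="/content/r807215r37l86637/fulltext.pdf" title="Download PDF">PDF</a>' . "\n"
       . '<a href="/content/r807215r37l86637/" title="Abstract">Abstract</a>';

$links = array();
if (preg_match_all($pattern, $input, $matches)) {
    // $matches[1] = array of captured href values
    $links = $matches[1];
}

print_r($links);
```

Note that the capture ends up in `$matches[1]` here, not `$matches[2]` as in the question's comments, because this pattern has a single capturing group.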

    Selected by the asker as the best answer.