dqpfkzu360216 2012-09-07 15:14

Need to parse an HTML document for links - use a library like html5lib or something similar?

I'm very new to building web pages and am currently working on a website that needs to change link colors according to the destination page. The links will be sorted into different classes (e.g. good, bad, neutral) by certain user input criteria: for example, links to content the user would find interesting are colored blue, while links to content the user (presumably) doesn't want to see are colored like normal text.

I reckon I need a way to parse the webpage for links to the content (stored in a MySQL database) and change the colors of all the links on the page (so I need to be able to change the link classes in the HTML as well) before outputting the adapted page to the user. I read that regex is not a good way to find those links, so should I use a library, and if so, is html5lib good for what I'm doing?

1 answer

  • duancai7568 2012-09-07 15:31

    There's no need to burden yourself with PHP HTML parsers, which will mangle and forcefully "repair" your input HTML.

    Here's how you can combine PHP with JavaScript. This is a complete, working, tested solution:

    <?php
    $arrBadLinks=array(
        "http://localhost/something.png",
        "https://www.apple.com/something.png",
    );
    $arrNeutralLinks=array(
        "http://www.microsoft.com/index.aspx",
        "ftp://samewebsiteasyours.com",
        "ftp://samewebsiteasyours.net/file.txt",
    );
    ?>
    <html>
        <head>
            <script>
            function colorizeLinks()
            {
                var arrBadLinks=<?php echo json_encode($arrBadLinks);?>;
                var arrNeutralLinks=<?php echo json_encode($arrNeutralLinks);?>;
    
                // Fetch only the <a> elements instead of scanning every node.
                var nodeList=document.getElementsByTagName("a");
                for(var n=nodeList.length-1; n>=0; n--)
                {
                    var el=nodeList[n];

                    // el.href is the absolute URL as resolved by the browser.
                    if(arrBadLinks.indexOf(el.href)>-1)
                        el.style.color="red";
                    else if(arrNeutralLinks.indexOf(el.href)>-1)
                        el.style.color="green";
                    else
                        el.style.color="blue";
                }
            }
    
            if(window.addEventListener)
                window.addEventListener("load", colorizeLinks, false);
            else if (window.attachEvent)
                window.attachEvent("onload", colorizeLinks);
            </script>
        </head>
        <body>
            <p>
                <a href="http://www.microsoft.com/index.aspx">Neutral www.microsoft.com/index.aspx</a>
            </p>
            <p>
                <a href="http://localhost/something.png">Bad http://localhost/something.png</a>
            </p>
        </body>
    </html>
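
The code above sets inline colors, while the question asks about changing link classes. A minimal sketch of that variant follows, assuming the colors live in a stylesheet. The helper names `classForLink` and `classifyLinks` and the class names `link-bad`, `link-neutral` and `link-good` are hypothetical and not part of the original answer.

```javascript
// Hypothetical helper: maps an absolute URL to a CSS class name.
// The class names are assumptions; define matching rules in your stylesheet.
function classForLink(href, badLinks, neutralLinks) {
    if (badLinks.indexOf(href) > -1) return "link-bad";
    if (neutralLinks.indexOf(href) > -1) return "link-neutral";
    return "link-good";
}

// Appends the matching class to every <a> element found under `root`.
function classifyLinks(root, badLinks, neutralLinks) {
    var anchors = root.getElementsByTagName("a");
    for (var n = 0; n < anchors.length; n++) {
        var cls = classForLink(anchors[n].href, badLinks, neutralLinks);
        // className is used here so existing classes on the link are kept.
        anchors[n].className += " " + cls;
    }
}
```

In the page above, this would be called as `classifyLinks(document, arrBadLinks, arrNeutralLinks)` from the load handler, with CSS rules such as `a.link-bad { color: red; }` supplying the actual colors.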
    

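    This does not work for relative URLs in the PHP arrays. The script compares against el.href, which the browser reports as an absolute URL, so make sure the array entries are absolute or the comparison will fail (or update the code to prepend http://current-domain.xxx to any relative URL).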
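
One possible way to handle the "make them absolute" step on the client is the standard `URL` constructor, which resolves a relative URL against a base. The sketch below is an illustration under that assumption; `toAbsoluteUrl` and `normalizeList` are hypothetical helper names, and the `URL` constructor is not available in the old Internet Explorer versions that the `attachEvent` fallback targets.

```javascript
// Hypothetical helper: resolves `href` against `base` and returns the
// normalized absolute URL, or null when the pair cannot be parsed.
function toAbsoluteUrl(href, base) {
    try {
        return new URL(href, base).href;
    } catch (e) {
        return null;
    }
}

// Hypothetical helper: normalizes a whole list, dropping unparsable entries.
function normalizeList(urls, base) {
    var out = [];
    for (var i = 0; i < urls.length; i++) {
        var abs = toAbsoluteUrl(urls[i], base);
        if (abs !== null) out.push(abs);
    }
    return out;
}

// Relative entries are resolved against the base; absolute ones pass through.
var example = normalizeList(
    ["/something.png", "http://www.microsoft.com/index.aspx"],
    "http://localhost/page.html"
);
// example is ["http://localhost/something.png", "http://www.microsoft.com/index.aspx"]
```

In the page, the lists could be normalized once with `normalizeList(arrBadLinks, document.baseURI)` before the loop runs. Normalization also matters for absolute entries: a bare host such as `http://example.com` becomes `http://example.com/`, which is the form `el.href` reports.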
