dqpfkzu360216 2012-09-07 15:14

Need to parse an HTML document for links: use a library like html5lib or something similar?

I'm a complete beginner at building web pages. I'm currently working on a website that needs to change link colors according to the destination page. The links will be sorted into different classes (e.g. good, bad, neutral) by certain user-supplied criteria. For example, links to content the user would find interesting are colored blue, links to things the user (presumably) doesn't want to see are colored like normal text, and so on.

I reckon I need a way to parse the page for links to that content (which is stored in a MySQL database) and change the class of every such link in the HTML, so that its color changes, before outputting the adapted page to the user. I've read that regex is not a good way to find those links, so should I use a library instead? If so, is html5lib a good fit for what I'm doing?
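A rough sketch of the classification step, in JavaScript, assuming the user's criteria have already been turned into a lookup from absolute URL to category (for example, loaded from the MySQL database). The class names link-good, link-bad and link-neutral, and the helper names linkClass and applyLinkClasses, are hypothetical; a stylesheet would then give each class its color.

```javascript
// Hypothetical mapping from a link category to a CSS class name.
const CLASS_BY_CATEGORY = {
  good: "link-good",
  bad: "link-bad",
  neutral: "link-neutral",
};

// Return the CSS class for one link. `categoryByUrl` is assumed to be a Map
// from absolute URL to "good" | "bad" | "neutral"; URLs that are not in the
// map (or have an unknown category) fall back to "neutral".
function linkClass(href, categoryByUrl) {
  const category = categoryByUrl.get(href) || "neutral";
  return CLASS_BY_CATEGORY[category] || CLASS_BY_CATEGORY.neutral;
}

// Add the matching class to every anchor without removing existing classes.
// In a browser, pass document.getElementsByTagName("a"); any array-like of
// objects with an `href` string and a `classList.add` method works.
function applyLinkClasses(anchors, categoryByUrl) {
  for (let i = 0; i < anchors.length; i++) {
    const a = anchors[i];
    a.classList.add(linkClass(a.href, categoryByUrl));
  }
}
```

A rule such as .link-bad { color: inherit; } would then make unwanted links look like the surrounding text, while .link-good { color: blue; } highlights the interesting ones.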


1 answer

  • duancai7568 2012-09-07 15:31

    There's no need to complicate things with PHP HTML parsers, which will mangle and forcibly "repair" your input HTML.

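    Here's how you can combine PHP with JavaScript instead: PHP writes the link lists into the page as JSON, and a script colors each link once the page has loaded. This is a complete, working and tested solution: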

    <?php
    $arrBadLinks=array(
        "http://localhost/something.png",
        "https://www.apple.com/something.png",
    );
    $arrNeutralLinks=array(
        "http://www.microsoft.com/index.aspx",
        // The browser reports a bare host with a trailing slash, so list it that way.
        "ftp://samewebsiteasyours.com/",
        "ftp://samewebsiteasyours.net/file.txt",
    );
    ?>
    <html>
        <head>
            <script>
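            // Colors every link according to the list its URL appears in.
            function colorizeLinks()
            {
                // The PHP arrays are written into the script as JSON arrays.
                var arrBadLinks=<?php echo json_encode($arrBadLinks);?>;
                var arrNeutralLinks=<?php echo json_encode($arrNeutralLinks);?>;
    
                // Only anchor elements matter, so fetch them directly.
                var links=document.getElementsByTagName("a");
                for(var n=0; n<links.length; n++)
                {
                    var el=links[n];
    
                    // el.href is the resolved, absolute URL, not the raw attribute text.
                    if(arrBadLinks.indexOf(el.href)>-1)
                        el.style.color="red";
                    else if(arrNeutralLinks.indexOf(el.href)>-1)
                        el.style.color="green";
                    else
                        el.style.color="blue";
                }
            }
    
            // attachEvent is the fallback for IE 8 and earlier, which also lack
            // Array.prototype.indexOf, so those browsers need a polyfill for it.
            if(window.addEventListener)
                window.addEventListener("load", colorizeLinks, false);
            else if (window.attachEvent)
                window.attachEvent("onload", colorizeLinks);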
            </script>
        </head>
        <body>
            <p>
                <a href="http://www.microsoft.com/index.aspx">Neutral www.microsoft.com/index.aspx</a>
            </p>
            <p>
                <a href="http://localhost/something.png">Bad http://localhost/something.png</a>
            </p>
        </body>
    </html>
    

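    The comparison is an exact string match against el.href, which the browser reports as an absolute, normalized URL (a bare host such as ftp://samewebsiteasyours.com is reported with a trailing slash). Relative URLs in the PHP arrays therefore will not match: make sure you make them absolute, or update the code to fill in the http://current-domain.xxx part for relative entries before comparing.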
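    One way to do that update, sketched in JavaScript and assuming a modern browser (URL and Set are not available in the old IE versions that the attachEvent branch targets): resolve each list entry against the page address with new URL(entry, base), which serializes URLs the same way as an anchor's href property, and look links up in a Set. The helper names normalizeUrls and colorFor are hypothetical.

```javascript
// Hypothetical helper: turn list entries (absolute or relative) into a Set of
// absolute URLs serialized the same way as an anchor's href property.
// `base` stands in for the current page address.
function normalizeUrls(entries, base) {
  const normalized = new Set();
  for (const entry of entries) {
    try {
      // Resolves relative entries against base and normalizes bare hosts,
      // e.g. "ftp://samewebsiteasyours.com" -> "ftp://samewebsiteasyours.com/".
      normalized.add(new URL(entry, base).href);
    } catch (e) {
      // Skip entries that cannot be parsed as URLs.
    }
  }
  return normalized;
}

// Hypothetical helper: pick a color for one link, mirroring colorizeLinks().
function colorFor(href, badSet, neutralSet) {
  if (badSet.has(href)) return "red";
  if (neutralSet.has(href)) return "green";
  return "blue";
}
```

    In the page, base would be document.baseURI (or window.location.href), the two Sets would be built once from the JSON arrays, and the loop in colorizeLinks() would set el.style.color=colorFor(el.href, bad, neutral) for each anchor.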
