PHP DOMDocument: extracting links with their anchor text or image alt

I want to extract all the links on a page, each with either its anchor text or the alt attribute of an image inside the link, whichever comes first.

$html = '<a href="lien.fr">Anchor</a>';

Must return "lien.fr;Anchor"

$html = '<a href="lien.fr"><img alt="Alt Anchor">Anchor</a>';

Must return "lien.fr;Alt Anchor"

$html = '<a href="lien.fr">Anchor<img alt="Alt Anchor"></a>';

Must return "lien.fr;Anchor"

I did:

$doc = new DOMDocument();
$doc->loadHTML($html);

$out = []; // must be an array: entries are added below as $out[$n][...]
$n = 0;
$links = $doc->getElementsByTagName('a');

foreach ($links as $element) {
    $href = $img_alt = $anchor = "";
    $href = $element->getAttribute('href');
    $n++;
    // Skip cart links; compare with === false so a match at position 0 still counts
    if (strpos($href, "panier?") === false) {

        if ($element->firstChild->nodeName == "img") {

            $imgs = $element->getElementsByTagName('img');

            foreach ($imgs as $img) {
                if ($anchor = $img->getAttribute('alt')) {
                    break;
                }
            }
        }

        if (($anchor == "") && ($element->nodeValue)) {
            $anchor = $element->nodeValue;
        }

        $out[$n]['link'] = $href;
        $out[$n]['anchor'] = $anchor;
    }
}

This seems to work, but it fails when the markup contains whitespace or indentation, for example:

$html = '<a href="link.fr">
                    <img src="ceinture-gris" alt="alt anchor"/>
                </a>';

Here $element->firstChild is the whitespace text node, so its nodeName is "#text" rather than "img".
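
A quick way to confirm this is to list the node names of the link's children. This is only a diagnostic sketch; the variable names `$link` and `$names` are illustrative, and the exact list printed can vary with the libxml version PHP is built against, since versions differ in which whitespace-only text nodes they keep.

```php
<?php
$html = '<a href="link.fr">
            <img src="ceinture-gris" alt="alt anchor"/>
        </a>';

$doc = new DOMDocument();
$doc->loadHTML($html);

// Take the first <a> and collect the nodeName of each direct child.
$link = $doc->getElementsByTagName('a')->item(0);
$names = [];
foreach ($link->childNodes as $child) {
    // Text nodes report "#text"; elements report their tag name.
    $names[] = $child->nodeName;
}

echo implode(', ', $names), "\n";
```

Any "#text" entry that appears before "img" is the indentation, which is why testing only firstChild misses the image.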


1 answer

dongluanjie8678 2016-11-20 15:13


Something like this:

$doc = new DOMDocument();
$doc->loadHTML($html);

// Output texts that will later be joined with ';'
$out = [];
// Maximum number of items to add to $out
$max_out_items = 2;
// List of img tag attributes that will be parsed by the loop below
// (in the order specified in this array!)
$img_attributes = ['alt', 'src', 'title'];

$links = $doc->getElementsByTagName('a');
foreach ($links as $element) {
  if ($href = trim($element->getAttribute('href'))) {
    $out []= $href;
    if (count($out) >= $max_out_items)
      break;
  }

  foreach ($element->childNodes as $child) {
    if ($child->nodeType === XML_TEXT_NODE &&
      $text = trim($child->nodeValue))
    {
      $out []= $text;
      if (count($out) >= $max_out_items)
        break;
    } elseif ($child->nodeName == 'img') {
      foreach ($img_attributes as $attr_name) {
        if ($attr_value = trim($child->getAttribute($attr_name))) {
          $out []= $attr_value;
          if (count($out) >= $max_out_items)
            goto Result;
        }
      }
    }
  }
}

Result:
echo $out = implode(';', $out);
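
The snippet above collects a single "href;text" pair. If one result per link is needed, the same idea (walk the child nodes and skip whitespace-only text) can be wrapped in a function. The following is a minimal sketch under stated assumptions, not part of the original answer: `extractLinks` is a hypothetical helper name, only the `alt` attribute is considered for images, and only direct children of the link are inspected, so text nested in something like a `<span>` is not picked up.

```php
<?php
// Hypothetical helper (assumption, not from the answer above): returns one
// "href;label" string per <a> element that has a non-empty href.
function extractLinks(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of emitting them,
    // then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $results = [];
    foreach ($doc->getElementsByTagName('a') as $link) {
        $href = trim($link->getAttribute('href'));
        if ($href === '') {
            continue; // no usable href, skip this link
        }

        $label = '';
        foreach ($link->childNodes as $child) {
            if ($child->nodeType === XML_TEXT_NODE) {
                // Whitespace-only text nodes trim to '' and are skipped.
                $label = trim($child->nodeValue);
            } elseif ($child->nodeName === 'img') {
                $label = trim($child->getAttribute('alt'));
            }
            if ($label !== '') {
                break; // the first non-empty candidate wins
            }
        }

        $results[] = $href . ';' . $label;
    }

    return $results;
}

$html = '<a href="lien.fr">Anchor<img alt="Alt Anchor"></a>'
      . '<a href="link.fr">
            <img src="ceinture-gris" alt="alt anchor"/>
         </a>';

print_r(extractLinks($html));
```

For the two links in `$html` this yields "lien.fr;Anchor" and "link.fr;alt anchor". A link with no text and no image alt produces "href;" with an empty label; filtering such entries out is left to the caller.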

This answer was selected as the best answer by the asker.
