too long

I want to parse several web pages so that I can build an inverted index. I want to read only the text content, not the a (link) elements, menus, and so on. Is this possible? Here is what I have so far:

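 <?php
 // Fetch the page and return it as a string instead of printing it.
 $ch = curl_init("http://en.wikipedia.org/wiki/Agile_software_development");
 curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
 $c1 = curl_exec($ch);
 $dom = new DOMDocument();
 @$dom->loadHTML($c1); // @ suppresses warnings about malformed HTML

 $bodies = $dom->getElementsByTagName("body");
 echo "<br>";

 foreach ($bodies as $body) {
    $anchors = $body->getElementsByTagName("a");
    $l = $anchors->length;  // number of <a> elements (not used yet)
    echo $body->nodeValue;  // prints all text, including link and menu text
    echo "<br>";
 } ?>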

douji6896 2015-01-16 12:15
I would do it like this:

<?php
$html = <<<HTML
<html>
  <head>
    <title>TITLE</title>
  </head>
  <body>
    PARA 1
    PARA 2
  </body>
</html>
HTML;

$dom = new DOMDocument();
@$dom->loadHtml($html);
var_dump($dom->getElementsByTagName("body")[0]->textContent);
?>

The textContent property returns the text of the node itself and of all its descendants, in document order. The output of the above is:

string(25) "
    PARA 1
    PARA 2
  "

If you want to normalize the whitespace (replace every run of two or more whitespace characters with a single space and remove leading and trailing whitespace), you can do this:

var_dump(preg_replace('/\s{2,}/', ' ', trim( $dom->getElementsByTagName("body")[0]->textContent)));
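Note that textContent still includes the text of links and menus, which the question wants to leave out. One possible approach is to remove the unwanted elements from the DOM before reading the text. The following is only a minimal sketch under stated assumptions: the helper name extractBodyText and the default list of tags to skip (script, style, nav, a) are hypothetical choices for illustration, and real pages may mark up their menus differently (for example with div elements and class names).

```php
<?php
// Hypothetical helper (not part of the original answer): returns the text of
// <body> after removing elements that we assume are not page content.
function extractBodyText(string $html, array $skipTags = ['script', 'style', 'nav', 'a']): string
{
    $dom = new DOMDocument();
    @$dom->loadHTML($html); // @ hides warnings about malformed markup

    $xpath = new DOMXPath($dom);
    // Build an XPath union such as "//script | //style | //nav | //a".
    $query = implode(' | ', array_map(function ($tag) {
        return "//$tag";
    }, $skipTags));

    // Copy the matches into an array first, then detach each one.
    foreach (iterator_to_array($xpath->query($query)) as $node) {
        if ($node->parentNode !== null) {
            $node->parentNode->removeChild($node);
        }
    }

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // Collapse every run of whitespace (spaces, tabs, newlines) to one space.
    return trim(preg_replace('/\s+/', ' ', $body->textContent));
}

// Usage with a small inline document (made-up sample markup).
$html = '<html><body><nav><a href="/">Home</a></nav>'
      . '<p>Agile is <a href="/x">iterative</a> development.</p></body></html>';
echo extractBodyText($html), "\n"; // prints "Agile is development."
```

Removing every a element also drops link text that appears inside ordinary sentences (the word "iterative" above), which may hurt an inverted index for link-heavy pages such as Wikipedia. If that matters, pass a shorter tag list, for example ['script', 'style', 'nav'], so that inline link text is kept.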
This answer was accepted by the asker as the best answer.
