Turn today's "On this day in history" into an array in PHP

I'm trying to get the four or five things that happened on this day in history, and add a plaintext representation of that into an array in PHP.

So far, I'm using this code:

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://en.wikipedia.org/w/api.php?action=featuredfeed&feed=onthisday&feedformat=rss');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 3); // Timeout in seconds (an integer, not a string)
curl_setopt($ch, CURLOPT_USERAGENT, 'My random user agent'); // Needed for Wikipedia to prevent IP blocking
$contents = trim(curl_exec($ch));
curl_close($ch);

$xml = simplexml_load_string($contents);
$json = json_encode($xml);
$array = json_decode($json, true);


$noOfDays = count($array['channel']['item']);
$r = $noOfDays - 1;
$input = $array['channel']['item'][$r]['description'];

I know this is not very dynamic or efficient, but only one person is going to be calling this page, once a day, so it's not terribly important.

At this point, $input contains a block of HTML, which looks something like this:

<p><b><a href="/wiki/April_6" title="April 6">April 6</a></b>: <b><a href="/wiki/Good_Friday" title="Good Friday">Good Friday</a></b> (Western Christianity, 2012); <b><a href="/wiki/Fast_of_the_Firstborn" title="Fast of the Firstborn">Fast of the Firstborn</a></b> begins at dawn and <b><a href="/wiki/Passover" title="Passover">Passover</a></b> begins at sunset (Judaism, 2012)
</p>
<div style="float:right;margin-left:0.5em">
<p><a href="/wiki/File:Sir_Arthur_Wellesley,_1st_Duke_of_Wellington.png" class="image" title="Arthur Wellesley, the Earl of Wellington"><img alt="Arthur Wellesley, the Earl of Wellington" src="//upload.wikimedia.org/wikipedia/commons/thumb/8/83/Sir_Arthur_Wellesley%2C_1st_Duke_of_Wellington.png/78px-Sir_Arthur_Wellesley%2C_1st_Duke_of_Wellington.png" width="78" height="100" /></a>
</p>
</div>
<li style="-moz-float-edge: content-box">
<a href="/wiki/1250" title="1250">1250</a> – <a href="/wiki/Seventh_Crusade" title="Seventh Crusade">Seventh Crusade</a>: Egyptian <a href="/wiki/Ayyubid" title="Ayyubid" class="mw-redirect">Ayyubids</a> <b><a href="/wiki/Battle_of_Fariskur" title="Battle of Fariskur">annihilated the crusader army</a></b> and captured King <a href="/wiki/Louis_IX_of_France" title="Louis IX of France">Louis&#160;IX of France</a> as a hostage.
<li style="-moz-float-edge: content-box">
<a href="/wiki/1320" title="1320">1320</a> – The <b><a href="/wiki/Declaration_of_Arbroath" title="Declaration of Arbroath">Declaration of Arbroath</a></b>, a declaration of <a href="/wiki/Scottish_independence" title="Scottish independence">Scottish independence</a>, was adopted.
<li style="-moz-float-edge: content-box">
<a href="/wiki/1812" title="1812">1812</a> – <a href="/wiki/Peninsular_War" title="Peninsular War">Peninsular War</a>: After a <b><a href="/wiki/Siege_of_Badajoz_(1812)" title="Siege of Badajoz (1812)">three-week siege</a></b>, the <a href="/wiki/Anglo-Portuguese_Army" title="Anglo-Portuguese Army">Anglo-Portuguese Army</a>, under the <a href="/wiki/Arthur_Wellesley,_1st_Duke_of_Wellington" title="Arthur Wellesley, 1st Duke of Wellington">Earl of Wellington</a> <i>(pictured)</i>, captured <a href="/wiki/Badajoz" title="Badajoz">Badajoz</a>, Spain and forced the surrender of the French garrison.
<li style="-moz-float-edge: content-box">
<a href="/wiki/1947" title="1947">1947</a> – The <a href="/wiki/1st_Tony_Awards" title="1st Tony Awards">first</a> <b><a href="/wiki/Tony_Award" title="Tony Award">Tony Awards</a></b>, recognizing achievement in live American <a href="/wiki/Theatre" title="Theatre">theatre</a>, were handed out at the <a href="/wiki/Waldorf-Astoria_Hotel" title="Waldorf-Astoria Hotel">Waldorf-Astoria Hotel</a> in <a href="/wiki/New_York_City" title="New York City">New York City</a>.
<li style="-moz-float-edge: content-box">
<a href="/wiki/2008" title="2008">2008</a> – Egyptian workers staged <b><a href="/wiki/2008_Egyptian_general_strike" title="2008 Egyptian general strike">an illegal general strike</a></b>, two days before <a href="/wiki/Egyptian_municipal_elections,_2008" title="Egyptian municipal elections, 2008">key municipal elections</a>.
</li>
</ul>
<p>More anniversaries: <span class="nowrap"><a href="/wiki/April_5" title="April 5">April 5</a> &#8211;</span> <span class="nowrap"><b><a href="/wiki/April_6" title="April 6">April 6</a></b> &#8211;</span> <span class="nowrap"><a href="/wiki/April_7" title="April 7">April 7</a></span>
</p>
<div style="text-align: right;" class="noprint"><span class="nowrap"><b><a href="/wiki/Wikipedia:Selected_anniversaries/April" title="Wikipedia:Selected anniversaries/April">Archive</a></b> &#8211;</span> <span class="nowrap"><b><a href="https://lists.wikimedia.org/mailman/listinfo/daily-article-l" class="extiw" title="mail:daily-article-l">By email</a></b> &#8211;</span> <span class="nowrap"><b><a href="/wiki/List_of_historical_anniversaries" title="List of historical anniversaries">List of historical anniversaries</a></b></span></div>
<div style="text-align: right;"><small>It is now <span class="nowrap">April 6, 2012</span> (<a href="/wiki/Coordinated_Universal_Time" title="Coordinated Universal Time">UTC</a>) &#8211; <span class="plainlinks" id="purgelink"><span class="nowrap"><a class="external text" href="//en.wikipedia.org/w/index.php?title=MediaWiki:Ffeed-onthisday-transcludeme&amp;action=purge">Refresh this page</a></span></span></small></div>

The only things I'm interested in are the bits after each <li style="-moz-float-edge: content-box">

I've got no idea why they didn't close these <li> tags properly, but there you go.

So the essence of what I want to do is take the actual information, strip away the links, and add each entry to an array, which should look something like this:

Array (
    [0] => 1250 – Seventh Crusade: Egyptian Ayyubids annihilated the crusader army and captured King Louis&#160;IX of France as a hostage.
    [1] => Next one...
    [2] => And another...
)

There's also a slight problem regarding the &#160; in this line. How would I translate that into plaintext? I have a feeling HTML parsing may be the answer.

I've already tried regex and HTML parsing, but as the tags don't close I've had some difficulty doing this.

Any suggestions?

dongyakui8675 2012-04-06 16:08
As @zzzzBov points out, closing tags for some elements (including <li>) are optional in HTML, but not in XHTML. Unfortunately, this is one of several things that make HTML incompatible with XML and XML parsers. For your task I would recommend parsing the DOM with a library such as phpQuery or PHP Simple HTML DOM Parser.

In phpQuery your code would look something like this:

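$doc = phpQuery::newDocumentHTML($input);
$items = $doc->find('li');
foreach ($items as $item) {
    echo pq($item)->text();
}

// Or... (PHP 5.3+)
// array_map() needs a real array, so the iterable result set
// (the foreach above relies on it being iterable) is converted first.
$items = array_map(
    function ($item) {
        return pq($item)->text();
    },
    iterator_to_array($doc->find('li'), false)
);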
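If installing a third-party library is not an option, roughly the same extraction can probably be done with PHP's bundled DOM extension. This swaps in DOMDocument for phpQuery and is only a minimal sketch. The helper name extractListItems is hypothetical, and the XML-declaration prefix is a commonly used workaround to make libxml read the fragment as UTF-8, not something from the original answer.

```php
<?php
// Hypothetical helper: returns the plain text of every <li> in an HTML fragment.
function extractListItems(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings (unclosed <li>, stray </ul>) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Assumption: this prefix makes libxml treat the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $items = [];
    foreach ($doc->getElementsByTagName('li') as $li) {
        // textContent drops the tags and decodes entities such as &#160;.
        // "\xC2\xA0" is the UTF-8 encoding of the non-breaking space.
        $text = str_replace("\xC2\xA0", ' ', $li->textContent);
        // Collapse runs of whitespace (including newlines) into single spaces.
        $items[] = trim(preg_replace('/\s+/', ' ', $text));
    }
    return $items;
}
```

Calling extractListItems($input) on the feed description should then give one string per event, since the HTML parser closes each open <li> when the next one starts.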

As for &#160;, try html_entity_decode().
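A short illustration of that suggestion, using a fragment of the question's sample text. Note that the decoded character is still a non-breaking space, so the final str_replace() step is an optional extra (my assumption about what "plaintext" should mean here) that swaps it for an ordinary space.

```php
<?php
// Fragment taken from the feed excerpt in the question.
$raw = 'captured King Louis&#160;IX of France as a hostage.';

// Decode entities into UTF-8 characters; &#160; becomes U+00A0 (bytes C2 A0).
$decoded = html_entity_decode($raw, ENT_QUOTES, 'UTF-8');

// Optional: replace the non-breaking space with a regular space.
$plain = str_replace("\xC2\xA0", ' ', $decoded);

echo $plain, PHP_EOL;
```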