drgzmmy6379 2017-04-05 12:51

Extract all file link URLs from a MySQL database string field into a list

I need to get a list of all the file URLs stored in one of my database fields.

The database is MySQL and the table is the article table, with these columns:

`id` | `subject` | `content`

The value of `content` is HTML text containing one or more file URLs, for example:

<p>this is the answer for ..., you can refer to below screenshot:</p>
<img src="http://the_url_of_image_here/imagename.jpg"/>

<p>or refer to below document</p>

<a href="http://the_url_of_doc_here/guide.ppt">guide</a>
<a href="http://the_url_of_doc_here/sample.docx">sample</a>

There are 2 types of files:

  1. image, with extension jpg, jpeg, png, bmp, gif
  2. document, with extension doc, docx, ppt, pptx, xls, xlsx, pdf, xps
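
For reference, the two groups can be written down as a small lookup. This is only a sketch: the function name `classify_file_url` and its two arrays are hypothetical names chosen for illustration and are not part of the original code. It assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper: returns 'image', 'document', or null, based on the
// two extension lists given in the question.
function classify_file_url(string $url): ?string
{
    $image_ext = ['jpg', 'jpeg', 'png', 'bmp', 'gif'];
    $document_ext = ['doc', 'docx', 'ppt', 'pptx', 'xls', 'xlsx', 'pdf', 'xps'];

    // Look at the path only, so a query string does not hide the extension.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return null;
    }
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));

    if (in_array($ext, $image_ext, true)) {
        return 'image';
    }
    if (in_array($ext, $document_ext, true)) {
        return 'document';
    }
    return null;
}

var_dump(classify_file_url('http://the_url_of_image_here/imagename.jpg'));
var_dump(classify_file_url('http://the_url_of_doc_here/guide.ppt'));
```

Comparing the lowercased extension against a fixed list keeps the image/document decision separate from whatever pattern is used to find the URLs in the HTML.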

I googled a lot, and it looks hard to do with MySQL alone, while PHP should make it easy. I wrote my own code, but it does not work.

Thanks cars10, I solved it.

function export_articles_link()
{
    global $date_from, $date_to;
    $filename = "kb_articles_link_".$date_from."_".$date_to.".xlsx";
    // Send the spreadsheet to the browser as a download.
    header('Content-Disposition: attachment; filename="'.XLSXWriter::sanitize_filename($filename).'"');
    header("Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
    header('Content-Transfer-Encoding: binary');
    header('Cache-Control: must-revalidate');
    header('Pragma: public');
    // Only fetch articles in the date range that contain an image or a link.
    $query = 'SELECT `content` FROM `kb_articles` WHERE ((DATE(`dt`) BETWEEN \'' . $date_from . '\' AND \'' . $date_to . '\') AND (`content` LIKE \'%<img src=%\' or `content` LIKE \'%<a href="http:%\')) order by id asc';
    // Note: the mysql_* functions were removed in PHP 7; use mysqli or PDO there.
    $result = mysql_query($query);
    $writer = new XLSXWriter();
    while ($row = mysql_fetch_array($result))
    {
        $text = $row['content'];
        // Lazy +? stops at the first extension; x? keeps pptx/xlsx/docx whole.
        preg_match_all('!http://.+?\.(?:jpe?g|png|gif|pptx?|xlsx?|docx?|pdf|xdw)!i', $text, $matches);
        foreach ($matches[0] as $url)
        {
            $writer->writeSheetRow('Sheet1', array($url)); // one URL per row, always in the first column
        }
    }
    $writer->writeToStdOut();
    exit(0);
}

I am sharing this for others who need a working sample; I hope it saves you time.

1 answer

  • dongwei3866 2017-04-05 13:45

    You should change your central loop to something like:

    $image_list = array(); // prepare an empty array for collection
    while ($row = mysql_fetch_array($result))
    {
        $text = $row['content'];
        preg_match_all('!http://.+?\.(?:jpe?g|png|gif|pptx?|xlsx?|docx?|pdf|xdw)!i', $text, $matches);
        $image_list = array_merge($image_list, $matches[0]); // append this row's matches
    }
    $writer->writeSheetRow('Sheet1', $image_list); // all URLs go into a single row
    

    Since you did not say exactly what was wrong, I guessed and went ahead. The regular expression is slightly different from your original, and so is the structure of the loop (yes, only one loop is needed). preg_match_all needs to be called only once for each $text; you then merge the results from $matches[0] into the $image_list array.
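
    The merge step can be tried without a database. The sketch below replaces the mysql_fetch_array loop with a hypothetical `$rows` array and wraps the logic in a hypothetical helper named `collect_file_urls`; neither name comes from the original code, and the sample HTML is made up to mirror the question.

    ```php
    <?php
    // Hypothetical helper: collect every matching URL from a list of rows.
    function collect_file_urls(array $rows): array
    {
        $pattern = '!http://.+?\.(?:jpe?g|png|gif|pptx?|xlsx?|docx?|pdf|xdw)!i';
        $image_list = [];
        foreach ($rows as $row) {
            // One call per row finds every match in that row's HTML.
            preg_match_all($pattern, $row['content'], $matches);
            // $matches[0] holds the full matches; append them to the running list.
            $image_list = array_merge($image_list, $matches[0]);
        }
        return $image_list;
    }

    // Stand-in for the database rows (assumed sample data).
    $rows = [
        ['content' => '<img src="http://the_url_of_image_here/imagename.jpg"/>'],
        ['content' => '<a href="http://the_url_of_doc_here/guide.ppt">guide</a>'
                    . '<a href="http://the_url_of_doc_here/sample.docx">sample</a>'],
    ];
    print_r(collect_file_urls($rows));
    ```

    Passing the merged array to writeSheetRow once puts all URLs in one row; calling writeSheetRow inside a foreach over the array would instead give one URL per row.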

    I also removed your U modifier, which inverted the greediness of every quantifier in the regexp. Instead I added a ? after the + to make just that one quantifier non-greedy.
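
    The difference is easy to see on a short string. The following is a minimal sketch with a made-up `$html` value and a shortened extension list (both are assumptions for illustration), comparing the greedy pattern, the lazy +? pattern, and the U modifier.

    ```php
    <?php
    // Assumed sample input with two links on one line.
    $html = '<a href="http://h/guide.pptx">guide</a> <a href="http://h/sample.pdf">sample</a>';

    // Greedy +: runs to the last possible extension, swallowing both links in one match.
    preg_match_all('!http://.+\.(?:pptx?|pdf)!i', $html, $greedy);

    // Lazy +?: stops at the first extension; x? stays greedy, so "pptx" is kept whole.
    preg_match_all('!http://.+?\.(?:pptx?|pdf)!i', $html, $lazy);

    // U modifier: + becomes lazy, but x? becomes lazy too, so "pptx" is cut to "ppt".
    preg_match_all('!http://.+\.(?:pptx?|pdf)!Ui', $html, $ungreedy);

    print_r($greedy[0]);
    print_r($lazy[0]);
    print_r($ungreedy[0]);
    ```

    Only the lazy variant returns both URLs intact, which is why the ? is attached to the + alone instead of using the U modifier.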

    I prepared a little minimalistic demo here: http://rextester.com/JDVMS87065

    This answer was accepted by the asker as the best answer.
