Extract all file link URLs from a MySQL string field into a list

I need to get a list of all the file URLs stored in one of my database fields.

MySQL database, article table:

`id` | `subject` | `content`

The value of content is HTML text containing one or more file URLs, for example:

<p>this is the answer for ..., you can refer to below screenshot:</p>
<img src="http://the_url_of_image_here/imagename.jpg"/>

<p>or refer to below document</p>

<a href="http://the_url_of_doc_here/guide.ppt">guide</a>
<a href="http://the_url_of_doc_here/sample.dox">sample</a>

There are 2 types of files:

images, with extension jpg, jpeg, png, bmp, gif
documents, with extension doc, docx, ppt, pptx, xls, xlsx, pdf, xps

I googled a lot, and it looks like this is hard to do with MySQL alone, while PHP should make it easy. I wrote my own code, but it does not work.

Thanks cars10, I solved it.

function export_articles_link()
{
    global $date_from, $date_to;

    // Send the result to the browser as an .xlsx download.
    $filename = "kb_articles_link_".$date_from."_".$date_to.".xlsx";
    header('Content-disposition: attachment; filename="'.XLSXWriter::sanitize_filename($filename).'"');
    header("Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
    header('Content-Transfer-Encoding: binary');
    header('Cache-Control: must-revalidate');
    header('Pragma: public');

    // Fetch only articles in the date range whose content contains an image or an http link.
    $query = 'SELECT `content` FROM `kb_articles` WHERE ((DATE(`dt`) BETWEEN \'' . $date_from . '\' AND \'' . $date_to . '\') AND (`content` LIKE \'%<img src=%\' or `content` LIKE \'%<a href="http:%\')) order by id asc';
    $result = mysql_query($query);
    $writer = new XLSXWriter();
    $img_list = array();
    while ($row = mysql_fetch_array($result))
    {
        $text = $row['content'];
        // Lazy .+? stops at the first listed extension; x? stays greedy so pptx/xlsx/docx are not cut short.
        preg_match_all('!http://.+?\.(?:jpe?g|png|gif|pptx?|xlsx?|docx?|pdf|xdw)!i', $text, $matches);
        $img_list = $matches[0];
        foreach ($img_list as $url)
        {
            $writer->writeSheetRow('Sheet1', array($url)); // one URL per row, always in the first column
        }
    }
    $writer->writeToStdOut();
    exit(0);
}

I am sharing this for anyone who needs a working sample; I hope it saves you time.

dongwei3866 2017-04-05 13:45
You should change your central loop to something like:

$image_list = array(); // prepare an empty array for collection
while ($row = mysql_fetch_array($result)) {
    $text = $row['content'];
    preg_match_all('!http://.+?\.(?:jpe?g|png|gif|pptx?|xlsx?|docx?|pdf|xdw)!i', $text, $matches);
    $image_list = array_merge($image_list, $matches[0]); // append to array
}
$writer->writeSheetRow('Sheet1', $image_list);

Since you did not clearly specify what was wrong, I just guessed and went ahead. The regular expression is slightly different from your original, and so is the way I structured the loop (yes, only one is needed). preg_match_all needs to be called only once for each $text; you then merge the results from $matches[0] into the array that collects the URLs.
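
To make the collecting step concrete, here is a minimal sketch that runs without a database. The `$rows` array and the example.com URLs are made-up stand-ins for the `content` column, not data from the question; the pattern is the one from the loop above.

```php
<?php
// Hypothetical sample rows standing in for the `content` column of the article table.
$rows = array(
    '<p>see the screenshot</p><img src="http://example.com/img/shot.jpg"/>',
    '<a href="http://example.com/doc/guide.ppt">guide</a> <a href="http://example.com/doc/sample.docx">sample</a>',
);

$pattern = '!http://.+?\.(?:jpe?g|png|gif|pptx?|xlsx?|docx?|pdf|xdw)!i';

$image_list = array(); // collects the URLs from all rows
foreach ($rows as $text) {
    // One call per row; $matches[0] holds every full match found in that row.
    preg_match_all($pattern, $text, $matches);
    // Append this row's URLs to the running list.
    $image_list = array_merge($image_list, $matches[0]);
}

print_r($image_list);
```

Because the pattern only requires `http://` followed by anything up to a listed extension, a real page with unrelated links in front of a file URL could produce an over-long match; restricting the middle part to characters that cannot appear in an attribute value would be one way to tighten it.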

I also removed your U-modifier, which was inverting the "greediness" of the whole regexp. Instead I added a ? after the + to make this one quantifier "non-greedy".
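
The difference between the three variants can be seen in a small sketch. The input string and its example.com URLs are invented for illustration, and the extension list is shortened to keep the patterns readable.

```php
<?php
// Hypothetical input with two file links on one line.
$text = '<a href="http://example.com/a.pptx">a</a> <a href="http://example.com/b.pdf">b</a>';

// Greedy .+ runs as far as it can, so both links end up inside a single match.
preg_match_all('!http://.+\.(?:pptx?|pdf)!i', $text, $greedy);

// The U modifier makes every quantifier lazy, including x?, so ".pptx" is cut down to ".ppt".
preg_match_all('!http://.+\.(?:pptx?|pdf)!Ui', $text, $ungreedy);

// Only .+? is lazy here; x? stays greedy and the full ".pptx" is kept.
preg_match_all('!http://.+?\.(?:pptx?|pdf)!i', $text, $lazy);

print_r($greedy[0]);   // one long match spanning both links
print_r($ungreedy[0]); // http://example.com/a.ppt, http://example.com/b.pdf
print_r($lazy[0]);     // http://example.com/a.pptx, http://example.com/b.pdf
```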

I prepared a little minimalistic demo here: http://rextester.com/JDVMS87065

This answer was selected by the asker as the best answer.