Upload a docx file and store its content and images in a DB using PHP

I am trying to upload a docx file that contains text as well as images. I want to store both the file's text content and its images in a database. I am using PHP and MySQL. This is what I have tried so far:

<?php

/*Name of the document file*/
$document = 'file.docx';

function get_string_between($string, $start, $end) {
        $string = " " . $string;
        $ini = strpos($string, $start);
        if ($ini == 0)
            return "";
        $ini += strlen($start);
        $len = strpos($string, $end, $ini) - $ini;
        return substr($string, $ini, $len);
    }

function read_file_docx($filename){

    $striped_content = '';
    $content = '';

    if(!$filename || !file_exists($filename)) return false;

    $zip = zip_open($filename);

    if (!$zip || is_numeric($zip)) return false;

    while ($zip_entry = zip_read($zip)) {

        if (zip_entry_open($zip, $zip_entry) == FALSE) continue;

        if (zip_entry_name($zip_entry) != "word/document.xml") continue;

        $content .= zip_entry_read($zip_entry, zip_entry_filesize($zip_entry));

        zip_entry_close($zip_entry);
    }// end while

    zip_close($zip);

    echo $im = get_string_between($content,"descr=","/>");
    $content = str_replace('</w:r></w:p></w:tc><w:tc>', " ", $content);
//    $content = str_replace('</w:r></w:p>', "\r\n", $content);
    //$content = str_replace('</w:r></w:p>', "-viavitae-", $content);
  //  $content = str_replace('-viavitae-', $im, $content);
//    $striped_content = strip_tags($content);
    $striped_content = ($content);
//    echo get_string_between($striped_content,"descr=","/>");
    return $striped_content;
}

$content = read_file_docx($document);
if($content !== false) {

    echo nl2br($content);
}
else {
    echo 'Couldn\'t read the file. Please check that file.';
}

1 answer

dongliu0823 2014-12-08 08:05
You can already get the $content using your code.

$content .= zip_entry_read($zip_entry, zip_entry_filesize($zip_entry));

Now all you need to do is to parse the content as an XML file:

You need to get all 'w:p' elements (which are paragraphs), get their text (everything that is inside a 'w:t' tag), and join all those blocks together with ' ' to create the paragraphs.
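The parsing step described above could be sketched roughly as follows. This is only a minimal sketch, not the answerer's own code: `docx_paragraphs()` and `$sample` are hypothetical names introduced here, and it assumes the `dom` extension is available. The runs inside one paragraph are concatenated directly (Word often splits a single word across several `w:t` runs), which is a choice of this sketch.

```php
<?php
// Hypothetical helper: returns one string per w:p element found in the XML.
function docx_paragraphs(string $xml): array
{
    $dom = new DOMDocument();
    if (!@$dom->loadXML($xml)) {
        return []; // not well-formed XML
    }

    $xpath = new DOMXPath($dom);
    // Main WordprocessingML namespace, which the w: prefix refers to.
    $xpath->registerNamespace('w', 'http://schemas.openxmlformats.org/wordprocessingml/2006/main');

    $paragraphs = [];
    foreach ($xpath->query('//w:p') as $p) {
        $text = '';
        // Collect every w:t text run that belongs to this paragraph.
        foreach ($xpath->query('.//w:t', $p) as $t) {
            $text .= $t->nodeValue;
        }
        $paragraphs[] = $text;
    }
    return $paragraphs;
}

// Small stand-in for the $content read from word/document.xml.
$sample = '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"><w:body>'
    . '<w:p><w:r><w:t>Hello </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>'
    . '<w:p><w:r><w:t>Second</w:t></w:r></w:p>'
    . '</w:body></w:document>';

echo implode("\n", docx_paragraphs($sample)), "\n";
```

With the question's code, `$content` would be passed in place of `$sample`. Paragraphs nested inside text boxes would be counted twice by this simple query, so real documents may need a stricter XPath.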

The images are stored as separate files inside the docx archive, under the folder word/media/.
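Reading those image files could look roughly like this. It is a sketch under assumptions: it uses the `ZipArchive` class from the `zip` extension instead of the `zip_open()` family used in the question, and `docx_images()` is a hypothetical helper name introduced here.

```php
<?php
// Hypothetical helper: returns the embedded images as [file name => raw bytes].
function docx_images(string $filename): array
{
    $images = [];
    $zip = new ZipArchive();
    if ($zip->open($filename) !== true) {
        return $images; // not a readable zip/docx file
    }

    for ($i = 0; $i < $zip->numFiles; $i++) {
        $name = $zip->getNameIndex($i);
        // Keep only real files below word/media/ (skip directory entries).
        if ($name === false || strpos($name, 'word/media/') !== 0 || substr($name, -1) === '/') {
            continue;
        }
        $data = $zip->getFromIndex($i);
        if ($data !== false) {
            $images[basename($name)] = $data;
        }
    }

    $zip->close();
    return $images;
}
```

Each returned value is the binary content of one image, so it could then be written to a BLOB column with a prepared statement (for example, binding it with `PDO::PARAM_LOB`); the table layout for that is left open here.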
This answer was accepted by the asker as the best answer.
