Programmatically compare two Word, Excel, or PowerPoint documents using PHP or JavaScript

These are the requirements for my new project.

An admin will upload a file in Microsoft Word 2007, Excel 2007, or PowerPoint 2007 format.

Let's say the admin has uploaded a file named demo1.docx.

demo1.docx is now the master file.

Other users will then upload their own files, such as demo2.docx, demo3.docx, etc.

I want to compare demo2.docx and demo3.docx with the master file demo1.docx.

Files uploaded by other users must be copies of the master file. That is, the number of characters, the text, and the formatting have to be the same as in the master file.

For an Excel file, the number of sheets and the number of filled cells have to be the same, and the same applies to PowerPoint files.

I want to do this using PHP or JavaScript.

Can you please tell me whether this is possible? If it is, please suggest some ways to accomplish it.

Thanks in advance.

donglin7383 2018-10-13 05:43
To match them byte for byte, the most efficient way is:

if (hash_file('sha1', $pathToFile1) === hash_file('sha1', $pathToFile2))

If that's too exact, you could strip whitespace first. That only works for text files, not binary files like docx or xlsx.

if (hash('sha1', str_replace(' ', '', file_get_contents($pathToFile1))) === hash('sha1', str_replace(' ', '', file_get_contents($pathToFile2))))

Or something like that to normalize the text. For binary file types you will first have to convert them to text, using a library for that file type.

In other words, you will have to come up with some way to normalize the text contents of the file, such as upper-casing everything and removing spaces or other acceptable differences.

Normalizing is a fancy way of saying "removing the differences". A simple example:

Some text

Is that the same as "Some text."? Or "Some Text", or "some Text"? That depends. But a normalized version of all of them might be "sometext", with no punctuation, spaces, or casing. It's up to you to decide how you normalize them.
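The normalization described above could be sketched roughly like this. The function name normalizeText is made up for illustration, and the rule it applies (keep only ASCII letters and digits, lowercased) is just one possible choice, not a recommendation for every case:

```php
<?php
// Hypothetical helper: reduce a text to lowercase ASCII letters and digits only.
// Assumption: the text is plain ASCII; other alphabets would need a different rule.
function normalizeText(string $text): string
{
    $lower = strtolower($text);
    // Remove everything that is not a letter or digit (spaces, punctuation, newlines).
    return preg_replace('/[^a-z0-9]+/', '', $lower);
}

// All three variants collapse to the same normalized string.
var_dump(normalizeText('Some text.') === normalizeText('some Text'));
var_dump(normalizeText('Some Text') === normalizeText('Some text'));
```

If casing or punctuation should count as a difference, leave that step out of the normalizer.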

Because you mention binary formats, I can't help much there: you will need a way to open them in PHP, which will require some third-party libraries.
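For .docx specifically, one possible starting point is that the file is a ZIP archive whose main text lives in the part word/document.xml. The sketch below assumes PHP's zip extension (ZipArchive) is enabled. The function names are hypothetical, and it only recovers text, not formatting, so treat it as a rough outline and not a complete comparison:

```php
<?php
// Sketch: pull plain text out of a .docx so it can be normalized and hashed.
// Assumptions: the zip extension is available; both function names are invented here.

/** Convert the XML of word/document.xml to plain text, one line per paragraph. */
function docxXmlToText(string $xml): string
{
    // Mark paragraph ends before removing the tags, so paragraphs stay separated.
    $withBreaks = str_replace('</w:p>', "</w:p>\n", $xml);
    $text = strip_tags($withBreaks);
    // Decode XML entities such as &amp; back to literal characters.
    return trim(html_entity_decode($text, ENT_QUOTES | ENT_XML1, 'UTF-8'));
}

/** Read the main document part of a .docx file and return its text. */
function readDocxText(string $path): string
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open $path as a zip archive");
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException("$path has no word/document.xml part");
    }
    return docxXmlToText($xml);
}
```

The text returned by readDocxText('demo1.docx') could then go through whatever normalization and hashing you settle on. Comparing formatting, sheets, or slides would need the other parts of the archive or a dedicated library.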

Your question is very broad, so I can only give you a broad overview of how to do it.

Hashing is nice because it takes a file of any size and reduces it to 40 characters (in the case of sha1), which is a lot easier to store in a DB or to visualize. I mention the DB because you can cut the work in half by pre-normalizing and hashing the known file (the source file) once. This reduces the overall cost of each comparison.
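As a rough sketch of that idea, the master's hash is computed once and each upload is then compared against the stored value. The function name fingerprint and the sample strings are made up for illustration, and the normalization (lowercase, all whitespace removed) is only an example:

```php
<?php
// Hypothetical helper: sha1 fingerprint of a text after a simple normalization.
function fingerprint(string $text): string
{
    // Normalization chosen for illustration: lowercase and remove all whitespace.
    $normalized = preg_replace('/\s+/', '', strtolower($text));
    return hash('sha1', $normalized);
}

// Computed once when the admin uploads the master, then stored (e.g. in a DB column).
$masterFingerprint = fingerprint("The same text");

// Later, each user upload costs one normalization and one hash.
$uploads = [
    'demo2' => "the  same\ntext",
    'demo3' => "the same test",
];
foreach ($uploads as $name => $text) {
    // hash_equals compares the two strings exactly, with no type juggling.
    $same = hash_equals($masterFingerprint, fingerprint($text));
    echo $name, ': ', $same ? 'matches' : 'differs', PHP_EOL;
}
```

A hash only tells you whether two normalized texts are identical. It cannot tell you where or how much they differ.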

UPDATE

Here is an example:

echo hash('sha1', 'The same text') === hash('sha1', 'the same text') ? 'true' : 'false';

The output will be false. However, if you do this:

echo hash('sha1', strtolower('The same text')) === hash('sha1', strtolower('the same text')) ? 'true' : 'false';

The output will be true.

A small amount of text is no different than a large amount. The only difference between the two pieces of code above is that I normalized the input in one and not in the other.

UPDATE1

"OK. Do you know software like Typing Tutor, which gives typing tests? There is one fixed paragraph, and the user types that paragraph into a text box with the same formatting."

$old = 'The same text';
$arr_old = explode(' ', $old);
$new = 'the same text';
// One capture group per word of $old, so group N corresponds to $arr_old[N-1].
$pattern = '/\b('.implode(')\b|\b(', array_map('preg_quote', $arr_old)).')\b/';
preg_match_all($pattern, $new, $matches);
print_r($matches);

Output

Array ( [0] => Array ( [0] => same [1] => text ) [1] => Array ( [0] => [1] => ) [2] => Array ( [0] => same [1] => ) [3] => Array ( [0] => [1] => text ) )

It's important to note that the index of a capture group, minus 1, is the index of the corresponding word. For example, in the output above there is no match in $matches[1]. This corresponds to The, which is the first item in $arr_old = explode(' ', $old); or [0=>'The', 1=>'same', 2=>'text']. Because the capture groups are 1-based and the array is 0-based, you have to subtract 1.

PS: to check these I would do something like:

$len = count($matches);
for ($i = 1; $i < $len; $i++) {
    // Group $i matched at least once if any of its entries is non-empty.
    if (!empty(array_filter($matches[$i]))) {
        echo "match ".$arr_old[$i-1]." ";
    }
}

Output:

match same match text
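For a typing-test style check, where word order matters, a plain positional comparison may be easier to reason about than the regex. This is a different technique from the one above, shown only as a sketch, and compareWords is a hypothetical name:

```php
<?php
// Hypothetical helper: compare two texts word by word, by position.
function compareWords(string $expected, string $typed): array
{
    $want = preg_split('/\s+/', trim($expected), -1, PREG_SPLIT_NO_EMPTY);
    $got  = preg_split('/\s+/', trim($typed), -1, PREG_SPLIT_NO_EMPTY);
    $report = [];
    $count = max(count($want), count($got));
    for ($i = 0; $i < $count; $i++) {
        $w = $want[$i] ?? null; // null when the typed text has extra words
        $g = $got[$i] ?? null;  // null when the typed text is too short
        $report[] = [
            'index'    => $i,
            'expected' => $w,
            'typed'    => $g,
            'ok'       => $w !== null && $w === $g, // exact, case-sensitive match
        ];
    }
    return $report;
}

foreach (compareWords('The same text', 'the same text') as $row) {
    echo $row['index'], ' ', $row['ok'] ? 'match' : 'mismatch', ' ', $row['expected'], PHP_EOL;
}
```

Here the first word is reported as a mismatch because the comparison is case-sensitive. Lowercase both inputs first if casing should not count.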


I hope that helps.
This answer was accepted by the asker as the best answer.
