dongmouhao7438 2010-11-24 10:45
Accepted

How do I scrape text from MS Word documents on a Linux server?

I have been asked to build a site where some users can upload Microsoft Word documents and others can search the uploaded documents for certain keywords. The site would run on a Linux server with PHP and MySQL. I'm trying to find out whether, and how, I can extract the text from these documents. If anyone can suggest a good way of doing this, it would be much appreciated.


2 answers

  • douershuang7356 2010-11-24 10:53

    Here's a good example using catdoc:

    function catdoc_string($str)
    {
        // requires catdoc
    
        // write to temp file
        $tmpfname = tempnam ('/tmp','doc');
        $handle = fopen($tmpfname,'w');
        fwrite($handle,$str);
        fclose($handle);
    
        // run catdoc (-a: ASCII output, -b: process broken Word files)
        $ret = shell_exec('catdoc -ab '.escapeshellarg($tmpfname) .' 2>&1');
    
        // remove temp file
        unlink($tmpfname);
    
        if (preg_match('/^sh: line 1: catdoc/i',$ret)) {
            return false;
        }
    
        return trim($ret);
    }
    
    function catdoc_file($fname)
    {
        // requires catdoc
    
        // run catdoc (-a: ASCII output, -b: process broken Word files)
        $ret = shell_exec('catdoc -ab '.escapeshellarg($fname) .' 2>&1');
    
        if (preg_match('/^sh: line 1: catdoc/i',$ret)) {
            return false;
        }
    
        return trim($ret);
    }
    
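    The functions above detect a missing catdoc by matching the shell's error text, which can vary with the shell and locale. A minimal sketch of an alternative is to check the command's exit status instead. The function name extract_text and its $binary and $flags parameters are illustrative assumptions, not part of the original answer:

```php
<?php
/**
 * Run a text extractor on a file and return its output, or null on failure.
 * $binary and $flags are hypothetical parameters added for illustration;
 * the defaults mirror the 'catdoc -ab' call used above.
 */
function extract_text(string $path, string $binary = 'catdoc', array $flags = ['-a', '-b']): ?string
{
    if (!is_readable($path)) {
        return null;
    }

    // Escape every piece separately so a file name cannot inject shell syntax.
    $parts = array_map('escapeshellarg', array_merge([$binary], $flags, [$path]));
    $cmd = implode(' ', $parts) . ' 2>/dev/null';

    $lines = [];
    $status = 0;
    exec($cmd, $lines, $status);

    // The shell reports 127 when the command is missing;
    // any non-zero status is treated as a failure here.
    if ($status !== 0) {
        return null;
    }

    return trim(implode("\n", $lines));
}
```

    A call such as extract_text($uploadedPath) would then return the document text, or null if catdoc is not installed or fails on the file, so the caller can skip indexing that upload.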

    Source

    This answer was selected by the asker as the best answer.