dou7851 2015-06-16 15:15
Views: 88

How do I read text from a docx file, the way antiword does?

Does anyone know how to read a file.docx in PHP the way antiword reads a file.doc? This is how I use antiword for a file.doc and store the text in the DB:

    $em = $this->getDoctrine()->getManager();
    $request = $this->get('request');
    $developer = $em->getRepository('ProfileBundle:Developer')->findOneById($id);

    if (! $developer) {
        throw $this->createNotFoundException('Unable to find a profile.');
    }

    $cv = $developer->getCvDirUri();

    if($cv && file_exists($cv)) {
        unlink($cv);
    }

    $form = $this->createForm(new DeveloperDirCvType(), array());

    if ($request->isMethod('POST')) {

        $form->bind($request);
        if ($form->isValid()) {

            $data = $form->getData();

            $uploader = $this->get('artel.profile.file_uploader');
            $path = $uploader->uploadFile($data['photo']);
            $developer->setCvDirUri($path['url']);
            $file = '/var/www/aog-profile/web/'.$path['url'];
            $mimeType = $data['photo']->getClientMimeType();
            $content = null;

            if ($mimeType == 'application/vnd.openxmlformats-officedocument.wordprocessingml.document') {
                // .docx: antiword cannot read it, so abiword converts it to an HTML file.
                // The text of that HTML file still has to be put into $content.
                exec('/usr/bin/abiword --to=html '.escapeshellarg($file));
            }
            elseif ($mimeType == 'application/pdf') {
                $parser = new \Smalot\PdfParser\Parser();
                $pdf    = $parser->parseFile($file);

                $content = $pdf->getText();
            }
            else {
                // Legacy .doc: antiword prints the text to stdout as UTF-8.
                $content = shell_exec('/usr/bin/antiword -m UTF-8.txt '.escapeshellarg($file));
            }


            $url = sprintf(
                '%s%s',
                $this->container->getParameter('acme_storage.amazon_s3.base_url'),
                $this->getPhotoUploader()->uploadFromUrl($path['url'])
            );

            $developer->setTextCv($content);
            $developer->setCvUri($url);


            $em->flush();

For a file.doc I run antiword, call setTextCv($content), and the text ends up in the DB; the file is also uploaded to Amazon. BUT

for a docx file, I upload it to /upload/Cv/file.docx and create file.html. Then I need to call setTextCv('text in file html'), or maybe you know another method? I don't know how to do this properly. Any ideas?
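
One possible approach, sketched under assumptions rather than taken from the code above: a .docx file is a ZIP archive whose main text lives in word/document.xml, so PHP's ZipArchive extension can read it without any external tool. The helper name docxToText is hypothetical, and the tag handling is deliberately rough (it ignores headers, footers, footnotes, and line breaks that carry attributes).

```php
<?php
/**
 * Hypothetical helper: returns the plain text of a .docx file,
 * or null if the archive or its main document part cannot be read.
 */
function docxToText($path)
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        return null;
    }

    // The body of a Word document is stored in this archive entry.
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        return null;
    }

    // Turn paragraph ends and simple line breaks into newlines, tabs into tabs.
    $xml = str_replace(
        array('</w:p>', '<w:br/>', '<w:tab/>'),
        array("\n", "\n", "\t"),
        $xml
    );

    // Drop the remaining markup and decode XML entities such as &amp;.
    return html_entity_decode(strip_tags($xml), ENT_QUOTES | ENT_XML1, 'UTF-8');
}
```

In the docx branch of the controller this could be used as `$content = docxToText('/var/www/aog-profile/web/'.$path['url']);`, which would make the intermediate file.html unnecessary. If the abiword conversion is kept instead, reading the generated HTML with file_get_contents() and passing it through strip_tags() is another option.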
