douliaotong4944
2015-04-22 08:27

How to read math equations from a docx file using PHP CodeIgniter

I am trying to read a docx file from PHP. The text is read successfully, but some equations in the Word document are missing from the output. I am new to PHP and do not know how to read them, so please suggest some ideas. The function I have tried for reading the document is:

function index()
{
    $document = 'file_path';
    $text_output = $this->read_docx($document);
    echo nl2br($text_output);

}
private function read_docx($filename) 
{
    var_dump($filename);
    $striped_content = '';
    $content = '';

    $zip = zip_open($filename);

    if (!$zip || is_numeric($zip))
        return false;

    while ($zip_entry = zip_read($zip)) {

        if (zip_entry_open($zip, $zip_entry) == FALSE)
            continue;

        if (zip_entry_name($zip_entry) != "word/document.xml")
            continue;

        $content .= zip_entry_read($zip_entry, zip_entry_filesize($zip_entry));

        zip_entry_close($zip_entry);
    }// end while

    zip_close($zip);

    $content = str_replace('</w:r></w:p></w:tc><w:tc>', " ", $content);
    $content = str_replace('</w:r></w:p>', "\r\n", $content);
    $striped_content = strip_tags($content);

    return $striped_content;
}

This is the sample math equation in the docx file that I am trying to read and render to an HTML page. Thanks.

[Image: the sample math equation]

2 answers

  • dtb81443 2015-05-04 12:21
    Accepted

    I went through https://msdn.microsoft.com/en-us/library/aa982683(v=office.12).aspx#Office2007ManipulatingXMLDocs_exploring in full and parsed the XML using PHP's XMLReader:

    $document = 'url';
    /*Function to extract images*/ 
    function readZippedImages($filename) 
    {
        $for_image = $filename;
        /*Create a new ZIP archive object*/
        $zip = new ZipArchive;
        /*Open the received archive file*/
        $final_arr=array();
        $repo = array();
        if (true === $zip->open($filename)) 
        {
            for ($i=0; $i<$zip->numFiles;$i++) 
            {
                if($i==3)//assumed to be word/document.xml; verify with $zip->getNameIndex($i)
                {
                    //======function using xml parser================================//
                    $check = $zip->getFromIndex($i);
                    //Create a new XMLReader Instance
                    $reader = new XMLReader();
                    //Loading from a XML File or URL
                    //$reader->open($check);
                    //Loading from PHP variable
                    $reader->xml($check);
    
                    //====================parsing through the document==================//
                    while($reader->read())
                    {
                        $node_loc = $reader->localName;
                        if($reader->nodeType == XMLReader::ELEMENT && $reader->localName == 'body')
                        {
                            $reader->read();
                            $read_content = $reader->value . "\n";
                        }
                        if($node_loc == '#text')//parsing all the text from document using #text nodes
                        {
                            $temp_value = array("text"=>$reader->value);
                            array_push($final_arr,$temp_value);
                            $reader->read();
                            $read_content = $reader->value . "\n";
                        }
                        //parsing all the images using the blip tag, which is under the drawing tag;
                        //the node type check skips a closing </a:blip>, which carries no attributes
                        if($reader->nodeType == XMLReader::ELEMENT && $node_loc == 'blip')
                        {
                            $attri_r = $reader->getAttribute("r:embed");
                            $current_image_name = $repo[$attri_r];
                            //showimage() is a plain function here, so it is called without $this
                            $image_stream = showimage($for_image,$current_image_name);//returns the base64 string
                            $temp_value = array("image"=>$image_stream);
                            array_push($final_arr,$temp_value);
                        }
                    }
                    //==================xml parser end============================//
                }
                if($i==2)//assumed to be word/_rels/document.xml.rels; verify with $zip->getNameIndex($i)
                {
                    $check_id = $zip->getFromIndex($i);
                    $reader_relation = new XMLReader();
                    $reader_relation->xml($check_id);
    
                    //====================parsing through the document==================//
                    while($reader_relation->read())
                    {
                        $node_loc = $reader_relation->localName;
                        if($reader_relation->nodeType == XMLREADER::ELEMENT && $reader_relation->localName == 'Relationship')
                        {
                         $read_content_id = $reader_relation->getAttribute("Id");
                         $read_content_name = $reader_relation->getAttribute("Target");
                         $repo[$read_content_id]=$read_content_name;
                        }
    
                    }
                }
            }
             $zip->close();
         }
         return $final_arr;//all text and images, in document order
    }
    
    
    function showimage($zip_file_original, $file_name_image) 
    {
        $file_name_image = 'word/'.$file_name_image;// getting the image in the zip using its name
        $z_show = new ZipArchive();
        if ($z_show->open($zip_file_original) !== true) {
            echo "File not found.";
            return false;
        }
    
        $fp = $z_show->getStream($file_name_image);
        if(!$fp) {
            echo "Could not load image.";
            return false;
        }
    
        // No header() calls: the image is returned as a string, not sent to the browser.
        $image = stream_get_contents($fp);
        fclose($fp);
        $z_show->close();
        return base64_encode($image);//return the base64 string for the current image.
    }
    $final_arr = readZippedImages($document);
    

    Print $final_arr and you will get all the text and images in the document.

  • duanguan5922 2015-05-02 00:25

    First of all, it is a very bad idea to parse XML using string replacement or regular expressions. Instead, use one of PHP's XML parsers, which are designed for this kind of task.

    You need to read the specification for Open XML (the standard used by Microsoft Office) to learn about the internal data structure that Microsoft uses for storing these kinds of math equations.
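
As a rough illustration of the advice above, here is a minimal sketch of pulling equation text out of `word/document.xml` with DOMDocument and DOMXPath. It assumes the equations were created with Word's built-in equation editor, which stores them as Office Math Markup Language (`m:oMath`) elements; equations saved as embedded objects or pictures would not be found this way. The helper names `extract_equation_texts` and `read_document_xml` are hypothetical and not part of the original question or answers.

```php
<?php
// Namespace URI of Office Math Markup Language (OMML) in Office Open XML.
const NS_M = 'http://schemas.openxmlformats.org/officeDocument/2006/math';

/**
 * Hypothetical helper: returns the flat text of every <m:oMath> element
 * found in the XML of word/document.xml.
 */
function extract_equation_texts(string $documentXml): array
{
    $dom = new DOMDocument();
    if (!$dom->loadXML($documentXml)) {
        return [];
    }
    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('m', NS_M);

    $equations = [];
    foreach ($xpath->query('//m:oMath') as $math) {
        $text = '';
        // <m:t> holds the literal characters of each math run.
        foreach ($xpath->query('.//m:t', $math) as $run) {
            $text .= $run->textContent;
        }
        $equations[] = $text;
    }
    return $equations;
}

/** Hypothetical helper: loads word/document.xml from a .docx by name. */
function read_document_xml(string $docxPath): ?string
{
    $zip = new ZipArchive();
    if ($zip->open($docxPath) !== true) {
        return null;
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    return $xml === false ? null : $xml;
}
```

Note that this only recovers the characters of each equation: a formula such as A = πr² would come back as the flat string `A=πr2`, because the superscript lives in the element structure (`m:sSup`), not in the text. Rendering the equation properly in HTML would mean walking structural elements such as `m:f` (fraction) and `m:sSup` (superscript) and converting them, for example to MathML.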

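
The accepted answer picks archive entries by position (`$i==2`, `$i==3`), which depends on the order in which the archive happened to be written. The sketch below shows one way to look the same two parts up by name instead. It assumes the standard part names `word/document.xml` and `word/_rels/document.xml.rels`; the helper names `parse_relationships` and `load_docx_parts` are hypothetical.

```php
<?php
/**
 * Hypothetical helper: maps relationship ids (e.g. "rId4") to targets
 * (e.g. "media/image1.png") from the XML of word/_rels/document.xml.rels.
 */
function parse_relationships(string $relsXml): array
{
    $map = [];
    $reader = new XMLReader();
    $reader->XML($relsXml);
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT
            && $reader->localName === 'Relationship') {
            $map[$reader->getAttribute('Id')] = $reader->getAttribute('Target');
        }
    }
    $reader->close();
    return $map;
}

/** Hypothetical helper: reads both parts by name instead of by index. */
function load_docx_parts(string $docxPath): ?array
{
    $zip = new ZipArchive();
    if ($zip->open($docxPath) !== true) {
        return null;
    }
    $documentXml = $zip->getFromName('word/document.xml');
    $relsXml = $zip->getFromName('word/_rels/document.xml.rels');
    $zip->close();
    if ($documentXml === false || $relsXml === false) {
        return null;
    }
    return [
        'document' => $documentXml,
        'rels'     => parse_relationships($relsXml),
    ];
}
```

Because the relationship map is built before the document XML is walked, every `r:embed` id found on a `blip` element can be resolved to its image path regardless of entry order in the archive.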
