I'm trying to write a series of functions that extract the document.xml part of an MS Word DOCX file and effectively mail-merge a series of key/value pairs into the template fields defined in the document. I have a function that uses xml_parse_into_struct to convert the XML text into the necessary arrays, but once I'm done replacing text I'll (presumably) need to use the ZipArchive method addFromString to create the new document.xml file and add it to the DOCX zip container. I'm not sure how to do that when I'm working with an array of data rather than an XML string. Is there a way to convert the array back into an XML string?
Here's what I have so far:
// $filename = name of the DOCX file to open
function get_docx_xml($filename) {
    // Extract the XML from the DOCX file
    $zip = new ZipArchive();
    if ($zip->open($filename, ZipArchive::CHECKCONS) !== TRUE) { echo 'failed to open template'; exit; }
    $xml = 'word/document.xml';
    $data = $zip->getFromName($xml);
    $zip->close();
    if ($data === false) { echo 'failed to read ' . $xml; exit; }

    // Create the XML parser and parse the document into the two arrays
    $parser = xml_parser_create_ns();
    xml_parse_into_struct($parser, $data, $vals, $index);
    xml_parser_free($parser);

    // Return the relevant XML information
    return array('vals' => $vals, 'index' => $index);
}
That part works fine: I can print_r both arrays and make sense of the results. However, the following function does not work, at least not in all cases. With certain delimiters around the fields to be replaced it works, but not every time, which I assume is an issue with Word's character encoding or other formatting.
// $templateFile = original, unedited template
// $newFile = name of the new file to be created
// $row = array of data to merge in (field => value)
function mailmerge($templateFile, $newFile, $row) {
    if (!copy($templateFile, $newFile)) // make a duplicate so we don't overwrite the template
        return false; // could not duplicate template
    $xmldata = get_docx_xml($newFile); // parsed arrays, not used by the str_replace version below
    $zip = new ZipArchive();
    if ($zip->open($newFile, ZipArchive::CHECKCONS) !== TRUE)
        return false; // probably not a docx file
    $file = 'word/document.xml';
    $data = $zip->getFromName($file);
    if ($data === false) { $zip->close(); return false; } // no document.xml in the archive
    foreach ($row as $key => $value) {
        // xml_escape() is not a PHP built-in; it is assumed to be a helper defined elsewhere
        $data = str_replace($key, xml_escape($value), $data);
    }
    $zip->deleteName($file);
    $zip->addFromString($file, $data);
    $zip->close();
    return true;
}
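A quick way to test the formatting theory is to check whether a field is visible in the document text but not contiguous in the raw XML. This can happen when markup ends up in the middle of a field. The following is only a diagnostic sketch: find_split_fields is a made-up name, and the sample fragment is invented for illustration, not taken from a real template.

```php
<?php
// Hypothetical helper (not part of the code above): report which merge keys
// appear in the document text but are interrupted by markup in the raw XML.
function find_split_fields(string $xml, array $keys): array
{
    // Drop every tag so only character data is left. Entities such as &lt;
    // stay encoded, so each key is escaped the same way before comparing.
    $text = strip_tags($xml);

    $split = [];
    foreach ($keys as $key) {
        $needle = htmlspecialchars($key, ENT_XML1 | ENT_NOQUOTES, 'UTF-8');
        $inText = strpos($text, $needle) !== false;
        $inXml  = strpos($xml, $needle) !== false;
        if ($inText && !$inXml) {
            $split[] = $key; // markup sits somewhere inside this field
        }
    }
    return $split;
}

// Invented fragment: {NAME} is broken across two runs, {CITY} is intact.
$sample = '<w:p><w:r><w:t>{NA</w:t></w:r><w:r><w:t>ME}</w:t></w:r>'
        . '<w:r><w:t> lives in {CITY}</w:t></w:r></w:p>';

print_r(find_split_fields($sample, ['{NAME}', '{CITY}']));
```

Any key this returns is one that a plain str_replace on the XML string cannot match, whatever delimiters are used.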
So instead of using str_replace on the raw XML (which fails a lot of the time), I was planning on cycling through the $vals array that I get from the first function, doing the replacement there, and then converting the resulting array back to a string and, in turn, writing it back into the DOCX zip container.
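For reference, the array-to-string step could look roughly like the sketch below. It rests on some assumptions that differ from get_docx_xml above: the parser is created with xml_parser_create rather than xml_parser_create_ns, and case folding and whitespace skipping are switched off, so that tag names such as w:t and the xmlns attributes come back unchanged. The names parse_docx_xml and vals_to_xml are made up for this example, and the XML declaration is not part of the array, so it would have to be prepended separately.

```php
<?php
// Hypothetical parser wrapper: keeps prefixes, case and whitespace intact.
function parse_docx_xml(string $data): array
{
    $parser = xml_parser_create('UTF-8');
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 0);
    xml_parse_into_struct($parser, $data, $vals);
    return $vals;
}

// Hypothetical serializer: walks the $vals array in order and re-emits markup.
function vals_to_xml(array $vals): string
{
    $xml = '';
    foreach ($vals as $node) {
        $attrs = '';
        foreach ($node['attributes'] ?? [] as $name => $value) {
            $attrs .= ' ' . $name . '="'
                    . htmlspecialchars($value, ENT_XML1 | ENT_QUOTES, 'UTF-8') . '"';
        }
        // Text was decoded by the parser, so it has to be escaped again.
        $text = isset($node['value'])
            ? htmlspecialchars($node['value'], ENT_XML1 | ENT_NOQUOTES, 'UTF-8')
            : '';
        $tag = $node['tag'];
        switch ($node['type']) {
            case 'open':     $xml .= "<$tag$attrs>$text"; break;
            case 'complete': $xml .= $text === '' ? "<$tag$attrs/>" : "<$tag$attrs>$text</$tag>"; break;
            case 'cdata':    $xml .= $text; break;   // text between child elements
            case 'close':    $xml .= "</$tag>"; break;
        }
    }
    return $xml;
}

// Invented fragment for illustration only.
$source = '<w:p xmlns:w="urn:example"><w:r><w:t>Dear {NAME}</w:t></w:r><w:br/></w:p>';
$vals = parse_docx_xml($source);
foreach ($vals as &$node) {
    if (isset($node['value'])) {
        $node['value'] = str_replace('{NAME}', 'Ann', $node['value']);
    }
}
unset($node);
$merged = vals_to_xml($vals);
echo $merged, "\n";
```

The resulting string could then be passed to addFromString as in mailmerge. Note that replacing inside each node's value only helps when a whole field sits in a single text node; a field spread over several nodes would still need extra handling.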