贾志轩 2022-11-11 13:53 采纳率: 0%
浏览 14

html解析到word并支持数学公式的解决方式?

现在遇到个难解的问题,我们项目中使用三方题库试题时,线下教师有个需求,就是将题目导出到word(本质就是html导出到word),详细要求要在Java后端支持图片(自动下载)、数学公式(也包括化学和物理等公式,基础是“LeteX”)、html的所有样式,我们之前使用过“spire.doc”,但是它除了对解析公式不支持外,收费还真是挺贵的,后来用的“XWPFDocument”这一系列的方式解析的,但是它对html原有内嵌样式无法支持,图片位置也基本无法还原,综上所述,各位朋友有没有更成熟的方式,希望赐教下。
  • 写回答

1条回答 默认 最新

  • 生产队的小刘 Python领域新星创作者 2022-11-11 20:20
    关注

    精心整理了两种方法,希望您能采纳

    方法一

    1. 打开网页后单击浏览器菜单栏的“文件”-->另存为,然后选择“网页,全部htm,html”格式;

    img

    1. 我们打开Word,然后选择Word菜单栏的“文件”-->打开,找到刚才保存文件的地方,选择打开;

    img

    1. 打开以后我们删除一些不需要的地方,比如:顶部和底部的,那些对于我们来说都没什么用,我们要主要就是要内容。然后打开菜单栏的“表格”-->转换-->表格转换为文本,将一些文档中的表格一一去掉。

    img

    方法二:通过jquery.wordexport.js导出word

    备注:兼容IE9以上

    大概浏览了下jquery.wordexport.js插件的代码,了解到了通过该插件可以导出文本和图片,而图片首先通过canvas的形式

    绘制,文本则需要再依赖FileSaver.js插件,FileSaver.js插件则主要通过H5的文件操作新特性new Blob()和new FileReader()

    来实现文本的导出。

    插件源码:

    FileSaver.js

    /* FileSaver.js
     * A saveAs() FileSaver implementation.
     * 1.3.2
     * 2016-06-16 18:25:19
     *
     * By Eli Grey, http://eligrey.com
     * License: MIT
     *   See https://github.com/eligrey/FileSaver.js/blob/master/LICENSE.md
     */
    
    /*global self */
    /*jslint bitwise: true, indent: 4, laxbreak: true, laxcomma: true, smarttabs: true, plusplus: true */
    
    /*! @source http://purl.eligrey.com/github/FileSaver.js/blob/master/FileSaver.js */
    
    var saveAs = saveAs || (function(view) {
            "use strict";
            // IE <10 is explicitly unsupported
            if (typeof view === "undefined" || typeof navigator !== "undefined" && /MSIE [1-9]\./.test(navigator.userAgent)) {
                return;
            }
            var
                doc = view.document
            // only get URL when necessary in case Blob.js hasn't overridden it yet
                , get_URL = function() {
                    return view.URL || view.webkitURL || view;
                }
                , save_link = doc.createElementNS("http://www.w3.org/1999/xhtml", "a")
                , can_use_save_link = "download" in save_link
                , click = function(node) {
                    var event = new MouseEvent("click");
                    node.dispatchEvent(event);
                }
                , is_safari = /constructor/i.test(view.HTMLElement)
                , is_chrome_ios =/CriOS\/[\d]+/.test(navigator.userAgent)
                , throw_outside = function(ex) {
                    (view.setImmediate || view.setTimeout)(function() {
                        throw ex;
                    }, 0);
                }
                , force_saveable_type = "application/octet-stream"
            // the Blob API is fundamentally broken as there is no "downloadfinished" event to subscribe to
                , arbitrary_revoke_timeout = 1000 * 40 // in ms
                , revoke = function(file) {
                    var revoker = function() {
                        if (typeof file === "string") { // file is an object URL
                            get_URL().revokeObjectURL(file);
                        } else { // file is a File
                            file.remove();
                        }
                    };
                    setTimeout(revoker, arbitrary_revoke_timeout);
                }
                , dispatch = function(filesaver, event_types, event) {
                    event_types = [].concat(event_types);
                    var i = event_types.length;
                    while (i--) {
                        var listener = filesaver["on" + event_types[i]];
                        if (typeof listener === "function") {
                            try {
                                listener.call(filesaver, event || filesaver);
                            } catch (ex) {
                                throw_outside(ex);
                            }
                        }
                    }
                }
                , auto_bom = function(blob) {
                    // prepend BOM for UTF-8 XML and text/* types (including HTML)
                    // note: your browser will automatically convert UTF-16 U+FEFF to EF BB BF
                    if (/^\s*(?:text\/\S*|application\/xml|\S*\/\S*\+xml)\s*;.*charset\s*=\s*utf-8/i.test(blob.type)) {
                        return new Blob([String.fromCharCode(0xFEFF), blob], {type: blob.type});
                    }
                    return blob;
                }
                , FileSaver = function(blob, name, no_auto_bom) {
                    if (!no_auto_bom) {
                        blob = auto_bom(blob);
                    }
                    // First try a.download, then web filesystem, then object URLs
                    var
                        filesaver = this
                        , type = blob.type
                        , force = type === force_saveable_type
                        , object_url
                        , dispatch_all = function() {
                            dispatch(filesaver, "writestart progress write writeend".split(" "));
                        }
                    // on any filesys errors revert to saving with object URLs
                        , fs_error = function() {
                            if ((is_chrome_ios || (force && is_safari)) && view.FileReader) {
                                // Safari doesn't allow downloading of blob urls
                                var reader = new FileReader();
                                reader.onloadend = function() {
                                    var url = is_chrome_ios ? reader.result : reader.result.replace(/^data:[^;]*;/, 'data:attachment/file;');
                                    var popup = view.open(url, '_blank');
                                    if(!popup) view.location.href = url;
                                    url=undefined; // release reference before dispatching
                                    filesaver.readyState = filesaver.DONE;
                                    dispatch_all();
                                };
                                reader.readAsDataURL(blob);
                                filesaver.readyState = filesaver.INIT;
                                return;
                            }
                            // don't create more object URLs than needed
                            if (!object_url) {
                                object_url = get_URL().createObjectURL(blob);
                            }
                            if (force) {
                                view.location.href = object_url;
                            } else {
                                var opened = view.open(object_url, "_blank");
                                if (!opened) {
                                    // Apple does not allow window.open, see https://developer.apple.com/library/safari/documentation/Tools/Conceptual/SafariExtensionGuide/WorkingwithWindowsandTabs/WorkingwithWindowsandTabs.html
                                    view.location.href = object_url;
                                }
                            }
                            filesaver.readyState = filesaver.DONE;
                            dispatch_all();
                            revoke(object_url);
                        }
                        ;
                    filesaver.readyState = filesaver.INIT;
    
                    if (can_use_save_link) {
                        object_url = get_URL().createObjectURL(blob);
                        setTimeout(function() {
                            save_link.href = object_url;
                            save_link.download = name;
                            click(save_link);
                            dispatch_all();
                            revoke(object_url);
                            filesaver.readyState = filesaver.DONE;
                        });
                        return;
                    }
    
                    fs_error();
                }
                , FS_proto = FileSaver.prototype
                , saveAs = function(blob, name, no_auto_bom) {
                    return new FileSaver(blob, name || blob.name || "download", no_auto_bom);
                }
                ;
            // IE 10+ (native saveAs)
            if (typeof navigator !== "undefined" && navigator.msSaveOrOpenBlob) {
                return function(blob, name, no_auto_bom) {
                    name = name || blob.name || "download";
    
                    if (!no_auto_bom) {
                        blob = auto_bom(blob);
                    }
                    return navigator.msSaveOrOpenBlob(blob, name);
                };
            }
    
            FS_proto.abort = function(){};
            FS_proto.readyState = FS_proto.INIT = 0;
            FS_proto.WRITING = 1;
            FS_proto.DONE = 2;
    
            FS_proto.error =
                FS_proto.onwritestart =
                    FS_proto.onprogress =
                        FS_proto.onwrite =
                            FS_proto.onabort =
                                FS_proto.onerror =
                                    FS_proto.onwriteend =
                                        null;
    
            return saveAs;
        }(
            typeof self !== "undefined" && self
            || typeof window !== "undefined" && window
            || this.content
        ));
    // `self` is undefined in Firefox for Android content script context
    // while `this` is nsIContentFrameMessageManager
    // with an attribute `content` that corresponds to the window
    
    if (typeof module !== "undefined" && module.exports) {
        module.exports.saveAs = saveAs;
    } else if ((typeof define !== "undefined" && define !== null) && (define.amd !== null)) {
        define([], function() {
            return saveAs;
        });
    }
    

    jquery.wordexport.js

    if (typeof jQuery !== "undefined" && typeof saveAs !== "undefined") {
        (function($) {
            $.fn.wordExport = function(fileName) {
                fileName = typeof fileName !== 'undefined' ? fileName : "jQuery-Word-Export";
                var static = {
                    mhtml: {
                        top: "Mime-Version: 1.0\nContent-Base: " + location.href + "\nContent-Type: Multipart/related; boundary=\"NEXT.ITEM-BOUNDARY\";type=\"text/html\"\n\n--NEXT.ITEM-BOUNDARY\nContent-Type: text/html; charset=\"utf-8\"\nContent-Location: " + location.href + "\n\n<!DOCTYPE html>\n<html>\n_html_</html>",
                        head: "<head>\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">\n<style>\n_styles_\n</style>\n</head>\n",
                        body: "<body>_body_</body>"
                    }
                };
                var options = {
                    maxWidth: 624
                };
                // Clone selected element before manipulating it
                var markup = $(this).clone();
    
                // Remove hidden elements from the output
                markup.each(function() {
                    var self = $(this);
                    if (self.is(':hidden'))
                        self.remove();
                });
    
                // Embed all images using Data URLs
                var images = Array();
                var img = markup.find('img');
                for (var i = 0; i < img.length; i++) {
                    // Calculate dimensions of output image
                    var w = Math.min(img[i].width, options.maxWidth);
                    var h = img[i].height * (w / img[i].width);
                    // Create canvas for converting image to data URL
                    var canvas = document.createElement("CANVAS");
                    canvas.width = w;
                    canvas.height = h;
                    // Draw image to canvas
                    var context = canvas.getContext('2d');
                    context.drawImage(img[i], 0, 0, w, h);
                    // Get data URL encoding of image
                    var uri = canvas.toDataURL("image/png/jpg");
                    $(img[i]).attr("src", img[i].src);
                    img[i].width = w;
                    img[i].height = h;
                    // Save encoded image to array
                    images[i] = {
                        type: uri.substring(uri.indexOf(":") + 1, uri.indexOf(";")),
                        encoding: uri.substring(uri.indexOf(";") + 1, uri.indexOf(",")),
                        location: $(img[i]).attr("src"),
                        data: uri.substring(uri.indexOf(",") + 1)
                    };
                }
    
                // Prepare bottom of mhtml file with image data
                var mhtmlBottom = "\n";
                for (var i = 0; i < images.length; i++) {
                    mhtmlBottom += "--NEXT.ITEM-BOUNDARY\n";
                    mhtmlBottom += "Content-Location: " + images[i].location + "\n";
                    mhtmlBottom += "Content-Type: " + images[i].type + "\n";
                    mhtmlBottom += "Content-Transfer-Encoding: " + images[i].encoding + "\n\n";
                    mhtmlBottom += images[i].data + "\n\n";
                }
                mhtmlBottom += "--NEXT.ITEM-BOUNDARY--";
    
                //TODO: load css from included stylesheet
    
                //var styles=' /* Font Definitions */@font-face{font-family:宋体;panose-1:2 1 6 0 3 1 1 1 1 1;mso-font-alt:SimSun;mso-font-charset:134;mso-generic-font-family:auto;mso-font-pitch:variable;mso-font-signature:3 680460288 22 0 262145 0;}  @font-face{font-family:"Cambria Math";panose-1:2 4 5 3 5 4 6 3 2 4;mso-font-charset:1;mso-generic-font-family:roman;mso-font-format:other;mso-font-pitch:variable;mso-font-signature:0 0 0 0 0 0;}  @font-face{font-family:"\@宋体";panose-1:2 1 6 0 3 1 1 1 1 1;mso-font-charset:134;mso-generic-font-family:auto;mso-font-pitch:variable;mso-font-signature:3 680460288 22 0 262145 0;}/* Style Definitions */p.MsoNormal, li.MsoNormal, div.MsoNormal{mso-style-unhide:no;mso-style-qformat:yes;mso-style-parent:"";margin:0cm;margin-bottom:.0001pt;mso-pagination:widow-orphan;font-size:14.0pt;font-family:宋体;mso-bidi-font-family:宋体;}p.MsoHeader, li.MsoHeader, div.MsoHeader{mso-style-noshow:yes;mso-style-priority:99;mso-style-link:"页眉 Char";margin:0cm;margin-bottom:.0001pt;text-align:center;mso-pagination:widow-orphan;layout-grid-mode:char;font-size:9.0pt;font-family:宋体;mso-bidi-font-family:宋体;}p.MsoFooter, li.MsoFooter, div.MsoFooter{mso-style-noshow:yes;mso-style-priority:99;mso-style-link:"页脚 Char";margin:0cm;margin-bottom:.0001pt;mso-pagination:widow-orphan;layout-grid-mode:char;font-size:9.0pt;font-family:宋体;mso-bidi-font-family:宋体;}p.MsoAcetate, li.MsoAcetate, div.MsoAcetate{mso-style-noshow:yes;mso-style-priority:99;mso-style-link:"批注框文本 Char";margin:0cm;margin-bottom:.0001pt;mso-pagination:widow-orphan;font-size:9.0pt;font-family:宋体;mso-bidi-font-family:宋体;}span.Char{mso-style-name:"页眉 Char";mso-style-noshow:yes;mso-style-priority:99;mso-style-unhide:no;mso-style-locked:yes;mso-style-link:页眉;font-family:宋体;mso-ascii-font-family:宋体;mso-fareast-font-family:宋体;mso-hansi-font-family:宋体;}span.Char0{mso-style-name:"页脚 Char";mso-style-noshow:yes;mso-style-priority:99;mso-style-unhide:no;mso-style-locked:yes;mso-style-link:页脚;font-family:宋体;mso-ascii-font-family:宋体;mso-fareast-font-family:宋体;mso-hansi-font-family:宋体;}span.Char1{mso-style-name:"批注框文本 Char";mso-style-noshow:yes;mso-style-priority:99;mso-style-unhide:no;mso-style-locked:yes;mso-style-link:批注框文本;font-family:宋体;mso-ascii-font-family:宋体;mso-fareast-font-family:宋体;mso-hansi-font-family:宋体;}p.msochpdefault, li.msochpdefault, div.msochpdefault{mso-style-name:msochpdefault;mso-style-unhide:no;mso-margin-top-alt:auto;margin-right:0cm;mso-margin-bottom-alt:auto;margin-left:0cm;mso-pagination:widow-orphan;font-size:10.0pt;font-family:宋体;mso-bidi-font-family:宋体;}span.msonormal0{mso-style-name:msonormal;mso-style-unhide:no;}.MsoChpDefault{mso-style-type:export-only;mso-default-props:yes;font-size:10.0pt;mso-ansi-font-size:10.0pt;mso-bidi-font-size:10.0pt;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:"Times New Roman";mso-font-kerning:0pt;}/* Page Definitions */  @page WordSection1{size:595.3pt 841.9pt;margin:72.0pt 90.0pt 72.0pt 90.0pt;mso-header-margin:42.55pt;mso-footer-margin:49.6pt;mso-paper-source:0;}div.WordSection1{page:WordSection1;}';
    
                var styles="";
    
                // Aggregate parts of the file together
                var fileContent = static.mhtml.top.replace("_html_", static.mhtml.head.replace("_styles_", styles) + static.mhtml.body.replace("_body_", markup.html())) + mhtmlBottom;
    
                // Create a Blob with the file contents
                var blob = new Blob([fileContent], {
                    type: "application/msword;charset=utf-8"
                });
                saveAs(blob, fileName + ".doc");
            };
        })(jQuery);
    } else {
        if (typeof jQuery === "undefined") {
            console.error("jQuery Word Export: missing dependency (jQuery)");
        }
        if (typeof saveAs === "undefined") {
            console.error("jQuery Word Export: missing dependency (FileSaver.js)");
        }
    }
    

    插件调用

    index.html

    <!DOCTYPE html>
    <html>
    <head lang="en">
        <meta charset="UTF-8">
        <title>生成word文档</title>
    </head>
    <body lang=ZH-CN style='tab-interval:21.0pt'>
    <div class="word">
        <p align="center" style="font-size:20pt;font-weight:bold;">JS导出Word文档</p>
    </div>
    <input type="button" value="导出word">
    <script src="https://cdn.bootcss.com/jquery/2.2.4/jquery.js"></script>
    <script type="text/javascript" src="js/FileSaver.js"></script>
    <script type="text/javascript" src="js/jquery.wordexport.js"></script>
    <script>
        $(function(){
            $("input[type='button']").click(function(event) {
                $(".word").wordExport('生成word文档');
            });
        })
    </script>
    </body>
    </html>
    

    直接调用wordExport()接口就可以导出word文档,传的参数为导出的word文件名。

    补充

    通过我们常规写的外联样式设置样式是无效的,通过个人的实践发现需要写内联样式才能生效,而单位也需要按照word的配置

    单位pt设置。

    而jquery.wordexport.js插件是要配置了个style样式让我们补充样式设置的:

    img

    但是个人实践了下,设置的样式却无法生效,只能通过内联设置才生效。

    截图:

    img

    评论

报告相同问题?

问题事件

  • 修改了问题 11月14日
  • 创建了问题 11月11日

悬赏问题

  • ¥15 (标签-考研|关键词-set)
  • ¥15 求修改代码,图书管理系统
  • ¥15 请问有没求偏多标签数据集yeast,reference,recreation,scene,health数据集。
  • ¥15 传感网应用开发单片机实训
  • ¥15 Delphi 关于sAlphaImageList使用问题
  • ¥15 寻找将CAJ格式文档转txt文本的方案
  • ¥15 shein测试开发会问些啥我是写java的
  • ¥15 关于#单片机#的问题:我有个课程项目设计,我想在STM32F103veTX单片机,M3主控模块上设计一个程序,在Keil uVision5(C语言)上代码该怎么编译?(嫌钱少我可以加钱,急急急)
  • ¥15 opnet仿真网络协议遇到问题
  • ¥15 在安装python的机器学习程序包scikit-learn(1.1版本)时遇到如下问题