Java POI reading Word 2003: WordExtractor cannot recognize a .doc document

POI version: poi-3.17
The full error is:
java.lang.IllegalArgumentException: The document is really a UNKNOWN file
at org.apache.poi.hwpf.HWPFDocumentCore.verifyAndBuildPOIFS(HWPFDocumentCore.java:123)
at org.apache.poi.hwpf.extractor.WordExtractor.<init>(WordExtractor.java:51)
at ETDemo.readWord(ETDemo.java:24)
at ETDemo.main(ETDemo.java:56)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
The code is as follows:
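import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import org.apache.poi.hwpf.extractor.WordExtractor;

String text = "";
File file = new File(filePath);
// Word 2003 (.doc)
if (file.getName().endsWith(".doc")) {
    // try-with-resources closes both the stream and the extractor
    try (FileInputStream stream = new FileInputStream(file);
         WordExtractor word = new WordExtractor(stream)) {
        text = word.getText();
    } catch (IOException e) {
        e.printStackTrace();
    }
}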

2 answers

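Check whether your file's format is actually supported by WordExtractor. Is it really a Word 2003 file, or a file in some other format that was simply given a .doc extension?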
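This suggestion can be checked without POI by looking at the file's first bytes. The uppercase UNKNOWN in the error message suggests that POI 3.17 classifies the input by its leading bytes and matched none of the signatures it recognizes, which would include OLE2, OOXML and RTF. The minimal sketch below is plain Java 8 with no POI dependency and relies only on well-known signatures: OLE2 compound files (genuine .doc) start with D0 CF 11 E0 A1 B1 1A E1, ZIP archives (and therefore .docx files) with PK\3\4, and RTF with {\rtf. The class DocSniffer and its Kind enum are hypothetical names used for illustration; they are not part of POI.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical helper (not part of POI): guesses a file's real format from its leading bytes. */
public class DocSniffer {

    /** Formats this sketch can tell apart. */
    public enum Kind { OLE2, OOXML_ZIP, RTF, UNKNOWN }

    // OLE2 compound-file header: the container used by genuine Word 97-2003 .doc files
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local-file header "PK\3\4": a .docx is a ZIP archive, so a renamed .docx starts with this
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};
    // RTF files start with the ASCII text "{\rtf"
    private static final byte[] RTF_MAGIC = "{\\rtf".getBytes(StandardCharsets.US_ASCII);

    /** Classifies the first bytes of a file. */
    public static Kind sniff(byte[] head) {
        if (startsWith(head, OLE2_MAGIC)) return Kind.OLE2;
        if (startsWith(head, ZIP_MAGIC)) return Kind.OOXML_ZIP;
        if (startsWith(head, RTF_MAGIC)) return Kind.RTF;
        return Kind.UNKNOWN; // anything else, e.g. HTML, MHT or plain text saved with a .doc name
    }

    /** Reads up to 8 leading bytes of the file and classifies them (Java 8 compatible loop). */
    public static Kind sniff(Path file) throws IOException {
        byte[] head = new byte[8];
        int n = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (n < head.length) {
                int r = in.read(head, n, head.length - n);
                if (r < 0) break; // file is shorter than 8 bytes
                n += r;
            }
        }
        return sniff(Arrays.copyOf(head, n));
    }

    // True if data begins with every byte of prefix
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            System.out.println(arg + " -> " + sniff(Paths.get(arg)));
        }
    }
}
```

Usage: java DocSniffer report.doc. Only OLE2 input is something HWPF's WordExtractor can open. OOXML_ZIP most likely means a renamed .docx, which would need XWPFWordExtractor from poi-ooxml instead, and UNKNOWN means the file should be inspected (for example in a text editor) to see what it really is.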

Did the original poster ever solve this? I'm running into the same problem.
