Original contents of the file number.doc
The code is as follows:
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.poi.hwpf.extractor.WordExtractor;

public class Test {

    // Reads the text of a Word file; returns an empty string on failure.
    public String readWord(String path) {
        String buffer = "";
        try {
            if (path.endsWith(".doc")) {
                // WordExtractor (HWPF) only reads the binary OLE2 .doc format.
                InputStream is = new FileInputStream(new File(path));
                WordExtractor ex = new WordExtractor(is);
                buffer = ex.getText();
                ex.close();
            } else {
                System.out.println("Please choose a file with the .doc extension");
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return buffer;
    }

    public static void main(String[] args) {
        Test tp = new Test();
        // Location of the file; the backslash must be escaped in a Java string literal.
        String content = tp.readWord("D:\\number.doc");
        System.out.println("content====" + content);
    }
}
Console output when run in MyEclipse:
org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)
at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:131)
at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:104)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:128)
at org.apache.poi.hwpf.HWPFDocumentCore.verifyAndBuildPOIFS(HWPFDocumentCore.java:106)
at org.apache.poi.hwpf.extractor.WordExtractor.<init>(WordExtractor.java:53)
at Test.readWord(Test.java:14)
at Test.main(Test.java:28)
content====
Does anyone know what causes this?
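Going by the exception message, number.doc seems to hold Office 2007+ XML data (the .docx format) even though its name ends in .doc, so checking the extension alone is not enough. A minimal sketch of checking the first bytes of the file instead is below. It uses only the JDK; the class WordFormatSniffer, its methods, and the returned labels are hypothetical names made up for illustration and are not part of POI. It assumes that a file starting with the ZIP signature is a .docx, which is only a heuristic, since any ZIP archive starts the same way.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper (not a POI class): guesses the container format from the file header.
final class WordFormatSniffer {

    // Signature of an OLE2 compound document, the container of binary .doc files.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header signature "PK\003\004"; a .docx file is a ZIP package.
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // Classifies a header as "OLE2", "OOXML_ZIP" or "UNKNOWN".
    static String detect(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return "OLE2";
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return "OOXML_ZIP";
        }
        return "UNKNOWN";
    }

    // Reads up to 8 bytes from the start of the file and classifies them.
    static String detectFile(String path) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            byte[] header = new byte[8];
            int read = 0;
            while (read < header.length) {
                int n = in.read(header, read, header.length - read);
                if (n < 0) {
                    break; // file is shorter than 8 bytes
                }
                read += n;
            }
            return detect(Arrays.copyOf(header, read));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java WordFormatSniffer <path>");
            return;
        }
        System.out.println(detectFile(args[0]));
    }
}
```

If detectFile reports OOXML_ZIP for number.doc, the file would need the XWPF side of POI (for example org.apache.poi.xwpf.extractor.XWPFWordExtractor from the poi-ooxml jar) rather than WordExtractor, which only reads the OLE2 format.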