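Today I wrote a small program. It parses some web pages fine but fails on others. When it fails, the error is thrown on the second of these two lines: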
TagNameFilter tablefilter=new TagNameFilter("div");
NodeList nodelist = parser.extractAllNodesThatMatch(tablefilter);
The error message is:
Exception in thread "main" org.htmlparser.util.ParserException: problem reading a character at position 36130;
java.io.EOFException
at java.util.zip.GZIPInputStream.readUByte(Unknown Source)
at java.util.zip.GZIPInputStream.readUShort(Unknown Source)
at java.util.zip.GZIPInputStream.readUInt(Unknown Source)
at java.util.zip.GZIPInputStream.readTrailer(Unknown Source)
at java.util.zip.GZIPInputStream.read(Unknown Source)
at org.htmlparser.lexer.Stream.fill(Stream.java:177)
at org.htmlparser.lexer.Stream.read(Stream.java:266)
at java.io.InputStream.read(Unknown Source)
at sun.nio.cs.StreamDecoder.readBytes(Unknown Source)
at sun.nio.cs.StreamDecoder.implRead(Unknown Source)
at sun.nio.cs.StreamDecoder.read(Unknown Source)
at java.io.InputStreamReader.read(Unknown Source)
at org.htmlparser.lexer.InputStreamSource.fill(InputStreamSource.java:345)
at org.htmlparser.lexer.InputStreamSource.read(InputStreamSource.java:395)
at org.htmlparser.lexer.Page.getCharacter(Page.java:704)
at org.htmlparser.lexer.Lexer.parseString(Lexer.java:737)
at org.htmlparser.lexer.Lexer.nextNode(Lexer.java:400)
at org.htmlparser.lexer.Lexer.nextNode(Lexer.java:317)
at org.htmlparser.util.IteratorImpl.nextNode(IteratorImpl.java:77)
at org.htmlparser.Parser.parse(Parser.java:700)
at Get_houqu.getInfor(Get_houqu.java:54)
at Get_houqu.getsite(Get_houqu.java:28)
at Get_houqu.main(Get_houqu.java:60)
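How do I fix this? I'm a beginner, and the explanations I found online didn't make it clear to me what I should actually change. Does anyone know how to fix it? Any pointers would be much appreciated. Thanks.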
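One possible reading of the trace: the exception comes from `GZIPInputStream.readTrailer`, which is where Java reads the 8-byte CRC32/size trailer at the end of a gzip stream. An `EOFException` there suggests the server sent a gzip-compressed page that ended early (truncated), which would explain why only some sites fail. A minimal sketch of one workaround, assuming that diagnosis is correct: download and decompress the bytes yourself, keeping whatever was inflated before the stream ended, then hand the decoded string to htmlparser (for example with `Parser.createParser(html, charset)` or `parser.setInputHTML(html)`; check these against your htmlparser version). The class `TolerantGzip` and method `readGzipTolerant` are hypothetical names used only for illustration. Note that this skips the CRC check when the trailer is missing, so corrupted data would go undetected; asking the server not to compress (an `Accept-Encoding: identity` request header) may be a cleaner alternative if the site honors it.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical helper class, not part of htmlparser or the JDK.
public class TolerantGzip {

    /**
     * Decompresses a gzip stream and returns every byte that could be inflated,
     * even if the stream ends early (e.g. the 8-byte CRC32/ISIZE trailer is missing).
     * When the trailer is missing, the CRC check is skipped, so corruption
     * cannot be detected.
     */
    public static byte[] readGzipTolerant(InputStream raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        try (GZIPInputStream gzip = new GZIPInputStream(raw)) {
            int n;
            while ((n = gzip.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (EOFException e) {
            // Truncated stream: keep the bytes decompressed before EOF was hit.
        }
        return out.toByteArray();
    }
}
```

The returned bytes can then be decoded with the page's charset (e.g. `new String(bytes, "UTF-8")`) and parsed from the string instead of directly from the URL, so the `TagNameFilter("div")` / `extractAllNodesThatMatch` code stays the same.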