wshxrdlwk asked on 2014.03.12 15:31

Lucene TokenStream.incrementToken() throws an exception

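I'm new to Lucene and found some examples online, for example on CSDN移动问答 (CSDN mobile Q&A).

I then ran one on my own machine and got an error. My code: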

public static void main(String[] args) throws IOException
{
    String s = "Good Afternoon Doesn't IS a good body names NAMES 1,671,000 hy body";
    Analyzer analyzer = new WhitespaceAnalyzer(Version.LUCENE_42);
    TokenStream ts =analyzer.tokenStream(s, new StringReader(s));
    CharTermAttribute cab = ts.addAttribute(CharTermAttribute.class);
    ts.incrementToken();
    /*while(ts.incrementToken())
    {
        System.out.println(cab.toString());
    }*/
}

It fails with: Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
at java.lang.Character.codePointAtImpl(Unknown Source)
at java.lang.Character.codePointAt(Unknown Source)
at org.apache.lucene.analysis.util.CharacterUtils$Java5CharacterUtils.codePointAt(CharacterUtils.java:164)
at org.apache.lucene.analysis.util.CharTokenizer.incrementToken(CharTokenizer.java:166)
at pim.topicmap.FormatConverter.main(FormatConverter.java:69)

The line that fails is ts.incrementToken(); how can I fix this?

1 answer

ohyesurright   2014.08.25 10:18

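The code you were following is probably from around version 3.5.
The API was changed in 4.0 and later; the API docs and the source code explain it: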
The workflow of the new TokenStream API is as follows:

1. Instantiation of TokenStream/TokenFilters which add/get attributes to/from the AttributeSource.
2. The consumer calls reset().
3. The consumer retrieves attributes from the stream and stores local references to all attributes it wants to access.
4. The consumer calls incrementToken() until it returns false consuming the attributes after each call.
5. The consumer calls end() so that any end-of-stream operations can be performed.
6. The consumer calls close() to release any resource when finished using the TokenStream.
To make sure that filters and consumers know which attributes are available, the attributes must be added during instantiation. Filters and consumers are not required to check for availability of attributes in incrementToken().

So, in your code:

1. Call reset() before the while loop.
2. Call end() after the while loop.
3. Then close the stream.
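
Applied to the code in the question, the three steps might look roughly like the sketch below. It assumes Lucene 4.2 with the lucene-analyzers-common jar on the classpath. The class name TokenStreamDemo, the helper tokenize, and the field name "content" are made up for illustration and are not part of Lucene or of the original post.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class TokenStreamDemo {

    // Hypothetical helper: runs the analyzer over `text` and collects the tokens.
    static List<String> tokenize(Analyzer analyzer, String text) throws IOException {
        List<String> tokens = new ArrayList<String>();
        // The first argument is a field name; "content" is an arbitrary placeholder.
        TokenStream ts = analyzer.tokenStream("content", new StringReader(text));
        try {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();                    // step 1: must come before the first incrementToken()
            while (ts.incrementToken()) {
                tokens.add(term.toString());
            }
            ts.end();                      // step 2: end-of-stream operations
        } finally {
            ts.close();                    // step 3: release resources, even on failure
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        String s = "Good Afternoon Doesn't IS a good body names NAMES 1,671,000 hy body";
        Analyzer analyzer = new WhitespaceAnalyzer(Version.LUCENE_42);
        for (String token : tokenize(analyzer, s)) {
            System.out.println(token);
        }
    }
}
```

Since WhitespaceAnalyzer only splits on whitespace, this should print the twelve space-separated pieces of the input unchanged, one per line. The try/finally is a design choice so that close() still runs if incrementToken() throws.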
