Lucene - Core / LUCENE-3026

smartcn analyzer throws NullPointerException when the length of the analyzed text exceeds 32767

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1, 4.0-ALPHA
    • Fix Version/s: 3.2, 4.0-ALPHA
    • Component/s: modules/analysis
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      The cause is the makeIndex() method of org.apache.lucene.analysis.cn.smart.hhmm.SegGraph:

        public List<SegToken> makeIndex() {
          List<SegToken> result = new ArrayList<SegToken>();
          int s = -1, count = 0, size = tokenListTable.size();
          List<SegToken> tokenList;
          short index = 0;  // bug: a short overflows past 32767
          while (count < size) {
            if (isStartExist(s)) {
              tokenList = tokenListTable.get(s);
              for (SegToken st : tokenList) {
                st.index = index;
                result.add(st);
                index++;
              }
              count++;
            }
            s++;
          }
          return result;
        }

      Here, 'short index = 0;' should be 'int index = 0;': a Java short holds at most 32767, so the counter overflows once more tokens than that are indexed. This was already reported at http://code.google.com/p/imdict-chinese-analyzer/issues/detail?id=2 and http://code.google.com/p/imdict-chinese-analyzer/issues/detail?id=11, and the author, XiaoPingGao, has already fixed the bug: http://code.google.com/p/imdict-chinese-analyzer/source/browse/trunk/src/org/apache/lucene/analysis/cn/smart/hhmm/SegGraph.java
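
      The overflow itself can be reproduced outside Lucene. The following is a minimal sketch, not the actual SegGraph code: DemoToken and ShortIndexDemo are hypothetical names standing in for SegToken and the indexing loop, and the only assumption is that the token index is assigned from a counter as shown in makeIndex() above. How the wrapped index leads to the NullPointerException further downstream is not detailed in this report and is not modeled here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for SegToken; only the field used by the loop. */
class DemoToken {
    int index;
}

class ShortIndexDemo {
    /** Mirrors the reported loop: a short counter is assigned to each token. */
    static List<DemoToken> indexWithShort(int tokenCount) {
        List<DemoToken> result = new ArrayList<>();
        short index = 0;
        for (int i = 0; i < tokenCount; i++) {
            DemoToken t = new DemoToken();
            t.index = index;
            result.add(t);
            index++; // wraps from 32767 to -32768
        }
        return result;
    }

    /** Same loop with the proposed change: an int counter. */
    static List<DemoToken> indexWithInt(int tokenCount) {
        List<DemoToken> result = new ArrayList<>();
        int index = 0;
        for (int i = 0; i < tokenCount; i++) {
            DemoToken t = new DemoToken();
            t.index = index;
            result.add(t);
            index++;
        }
        return result;
    }

    public static void main(String[] args) {
        int n = Short.MAX_VALUE + 2; // 32769 tokens, just past the short range
        List<DemoToken> bad = indexWithShort(n);
        List<DemoToken> good = indexWithInt(n);
        System.out.println(bad.get(32767).index);  // 32767
        System.out.println(bad.get(32768).index);  // -32768
        System.out.println(good.get(32768).index); // 32768
    }
}
```

      With the short counter, the first token past position 32767 receives a negative index, which matches the threshold named in the issue title; with the int counter the indexes keep increasing.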

      1. LUCENE-3026.patch
        0.7 kB
        wangzhenghang

        Activity

        Robert Muir made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Shai Erera made changes -
        Component/s modules/analysis [ 12310230 ]
        Component/s contrib/analyzers [ 12312333 ]
        Robert Muir made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Robert Muir made changes -
        Fix Version/s 3.2 [ 12316070 ]
        Fix Version/s 4.0 [ 12314025 ]
        Robert Muir made changes -
        Assignee Robert Muir [ rcmuir ]
        wangzhenghang made changes -
        Attachment LUCENE-3026.patch [ 12476296 ]
        wangzhenghang made changes -
        Summary "smartcn analysis throw NullPointer exception when the length of analysed text over 32767" changed to "smartcn analyzer throw NullPointer exception when the length of analysed text over 32767"
        wangzhenghang made changes -
        Field Original Value New Value
        Description (wording edits only; the text and code otherwise match the Description section above)
        wangzhenghang created issue -

          People

          • Assignee:
            Robert Muir
            Reporter:
            wangzhenghang