Lucene - Core
LUCENE-4176

Cannot produce proper collation key for ICUCollatedTermAttributeImpl

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 6.0
    • Fix Version/s: 4.0-BETA, 6.0
    • Component/s: modules/queryparser
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      org.apache.lucene.collation.tokenattributes.ICUCollatedTermAttributeImpl returns a hash of the collation key's bytes.
      The resulting hash value produces incorrect comparison results.
      The code below prints 1 on Lucene 3.6.
      On the affected version it prints 0.
      Code to reproduce:

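      // Assumes ramDir (a Directory), conf (an IndexWriterConfig), and analyzer
      // are defined elsewhere; the report does not show them.
      IndexWriter writer = new IndexWriter(ramDir, conf);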
      Document doc = new Document();
      FieldType fieldType = new FieldType();
      fieldType.setIndexed(true);
      fieldType.setStored(true);
      Field field = new Field("content","เข", fieldType);
      doc.add(field);
      writer.addDocument(doc);
      writer.close();
      IndexSearcher is = new IndexSearcher(DirectoryReader.open(ramDir));
      QueryParser qp = new AnalyzingQueryParser(Version.LUCENE_50,"content", analyzer);

      ScoreDoc[] result = is.search(qp.parse("[\u0e01 TO \u0e03]"), null,1000).scoreDocs;
      System.out.println(result.length);

      1. LUCENE-4176.patch
        11 kB
        Robert Muir
      2. LUCENE-4176.patch
        3 kB
        Robert Muir

        Activity

        Robert Muir added a comment -

        Thanks for reporting this: the bug is actually in AnalyzingQueryParser. It should not consume terms with CharTermAttribute.toString(); instead it should just consume the bytes.
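
To illustrate why the text view and the bytes view of a collated term can disagree, here is a minimal sketch. It is not the patch and does not use Lucene or ICU: it assumes the JDK's java.text.Collator as a stand-in for the ICU collator, and the class and helper names (CollatedTermSketch, keyBytes, textBytes, inRange) are hypothetical. A range check on collation-key bytes follows collation order, while the same check on the UTF-8 bytes of the original text follows code-point order.

```java
import java.nio.charset.StandardCharsets;
import java.text.Collator;
import java.util.Locale;

final class CollatedTermSketch {
    // Assumption: the JDK collator stands in for the ICU collator here.
    static final Collator COLLATOR = Collator.getInstance(Locale.ENGLISH);

    // "Bytes view": the collation key of the term.
    static byte[] keyBytes(String term) {
        return COLLATOR.getCollationKey(term).toByteArray();
    }

    // "Text view": what a consumer gets if it goes through toString().
    static byte[] textBytes(String term) {
        return term.getBytes(StandardCharsets.UTF_8);
    }

    // Unsigned, most-significant-byte-first comparison of two byte arrays.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    // Inclusive range check, as in a query of the form [lower TO upper].
    static boolean inRange(byte[] term, byte[] lower, byte[] upper) {
        return compareUnsigned(lower, term) <= 0 && compareUnsigned(term, upper) <= 0;
    }

    public static void main(String[] args) {
        // Collation order: a < b < C, so "b" falls inside [a TO C].
        System.out.println(inRange(keyBytes("b"), keyBytes("a"), keyBytes("C")));
        // Code-point order: 'C' (0x43) < 'a' (0x61) < 'b' (0x62), so it does not.
        System.out.println(inRange(textBytes("b"), textBytes("a"), textBytes("C")));
    }
}
```

If the index holds collation keys but the query parser builds its range endpoints from the text view, the two sides are compared in different orderings, which is one way a range query can come back with no hits.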

        Robert Muir added a comment -

        Untested patch.

        Robert Muir added a comment -

        OK, attached is a patch fixing the QP bug with your test.

        There was a bug in your test as well: it doesn't actually analyze the terms because it doesn't call fieldType.setTokenized(true).

        This is separately a huge trap. I'll open another issue for that.
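
The trap described above can be modelled with a small sketch. This is a toy model and not Lucene code: TokenizedFlagSketch, indexedTerm, and LOWERCASING_ANALYZER are hypothetical names, and the lowercasing function merely stands in for whatever analyzer is configured. It assumes that a field which is not tokenized is indexed verbatim, without passing through the analyzer.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.function.Function;

final class TokenizedFlagSketch {
    // Hypothetical stand-in for an analyzer: maps field text to the term bytes to index.
    static final Function<String, byte[]> LOWERCASING_ANALYZER =
        text -> text.toLowerCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8);

    // Toy model: the analyzer only runs when the field is marked as tokenized;
    // otherwise the raw value is indexed as a single term.
    static byte[] indexedTerm(String value, boolean tokenized, Function<String, byte[]> analyzer) {
        return tokenized ? analyzer.apply(value) : value.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String value = "Foo";
        // Tokenized: the analyzer's output is indexed.
        System.out.println(new String(indexedTerm(value, true, LOWERCASING_ANALYZER), StandardCharsets.UTF_8));
        // Not tokenized: the analyzer is silently skipped.
        System.out.println(new String(indexedTerm(value, false, LOWERCASING_ANALYZER), StandardCharsets.UTF_8));
    }
}
```

In this model a collating analyzer would never see the field value unless the tokenized flag is set, so the index would contain raw text rather than collation keys.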

        Robert Muir added a comment -

        Thanks for reporting this: I committed the fix to AnalyzingQueryParser.

        But until LUCENE-4178 is resolved, be sure to call setTokenized(true) on your FieldType!

        Nattapong Sirilappanich added a comment -

        Thanks for the fix, and sorry for any confusion.

        Hoss Man added a comment -

        hoss20120711-manual-post-40alpha-change


          People

          • Assignee:
            Unassigned
            Reporter:
            Nattapong Sirilappanich
          • Votes:
            0
            Watchers:
            2
