Lucene - Core / LUCENE-5398

NormValueSource unable to read long field norm

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 4.6
    • Fix Version/s: 4.7, 6.0
    • Component/s: core/query/scoring
    • Labels:
      None
    • Environment:

      Ubuntu 12.04

    • Lucene Fields:
      New

      Description

      Previous Lucene implementations stored the field norms of all documents in memory, so float values were encoded into a single byte to minimize memory consumption.
      Recent releases no longer have this constraint (see LUCENE-5078 and the discussion at http://lucene.markmail.org/message/jtwit3pwu5oiqr2h). Users are encouraged to implement their own encodeNormValue() to encode norms into (and decode them from) any type, including byte, int and long, to meet their precision requirements.
      However, the legacy NormValueSource still casts any long-encoded norm to byte, as seen at line 74 of NormValueSource.java, which makes any TFIDFSimilarity that uses a more precise encoding useless.
      The cast should be removed for the greater good.
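
To illustrate why the cast is harmful, here is a minimal, self-contained sketch that does not use Lucene at all. The class name NormCastDemo and the int-based encoding (storing the raw float bits in a long) are assumptions made for illustration only; they are not the code from NormValueSource or from the attached test. The sketch only shows that narrowing a long-encoded norm to byte before decoding discards most of the stored bits.

```java
public class NormCastDemo {

    // Hypothetical custom encoding: keep the full 32 float bits in a long.
    static long encodeNormValue(float f) {
        return Float.floatToIntBits(f);
    }

    // Inverse of the hypothetical encoding above.
    static float decodeNormValue(long norm) {
        return Float.intBitsToFloat((int) norm);
    }

    // Mimics the reported behavior: the stored norm is narrowed to a byte
    // before it is handed to the decoder, so only the low 8 bits survive.
    static float decodeWithByteCast(long norm) {
        return decodeNormValue((byte) norm);
    }

    // Mimics the behavior after the cast is removed: the long is passed
    // to the decoder unchanged.
    static float decodeWithoutCast(long norm) {
        return decodeNormValue(norm);
    }

    public static void main(String[] args) {
        float original = 0.3125f;
        long stored = encodeNormValue(original);

        System.out.println("original       = " + original);
        System.out.println("with byte cast = " + decodeWithByteCast(stored));
        System.out.println("without cast   = " + decodeWithoutCast(stored));
    }
}
```

With a one-byte encoding the cast is harmless, which is presumably why it went unnoticed; it only corrupts values once the encoding uses more than 8 bits.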

      1. LUCENE-5398.patch
        10 kB
        Michael McCandless
      2. NormValueSource.java
        3 kB
        Peng Cheng
      3. TestValueSourcesWithNonByteNormEncoding.java
        9 kB
        Peng Cheng

        Activity

        Peng Cheng added a comment -

        Removed.

        Ran the JUnit test TestValueSources without problems. This change should be trivial and doesn't require a test case for the non-byte situation.

        Peng Cheng added a comment -

        I have attached a simple test case to show the issue. The only difference:
        the Similarity used at index time and search time is replaced with a TFIDFSimilarity implementation that uses integer norm encoding (instead of byte).

        Michael McCandless added a comment -

        Thanks Peng, I'll have a look. It's clear that the cast to (byte) is a holdover from when TFIDFSim only accepted 1-byte norms.

        Michael McCandless added a comment -

        Thanks Peng; I simplified the test a bit and folded the fix into the attached patch.

        I think it's ready!

        Peng Cheng added a comment -

        At your service, I've read your book.

        ASF subversion and git services added a comment -

        Commit 1563119 from Michael McCandless in branch 'dev/trunk'
        [ https://svn.apache.org/r1563119 ]

        LUCENE-5398: remove invalid byte cast in NormValueSource, since TFIDFSimilarity now allows for norms larger than one byte

        ASF subversion and git services added a comment -

        Commit 1563120 from Michael McCandless in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1563120 ]

        LUCENE-5398: remove invalid byte cast in NormValueSource, since TFIDFSimilarity now allows for norms larger than one byte

        Michael McCandless added a comment -

        Thanks Peng!

        ASF subversion and git services added a comment -

        Commit 1563209 from Robert Muir in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1563209 ]

        LUCENE-5398: don't use 3.x codec in this test


          People

          • Assignee:
            Unassigned
            Reporter:
            Peng Cheng
          • Votes:
            0
            Watchers:
            3

              Time Tracking

              Estimated:
              1h
              Remaining:
              1h
              Logged:
              Not Specified