Lucene - Core
  1. Lucene - Core
  2. LUCENE-5078

Allow TfIdfSimilarity implementations to encode norm values into more than a single byte

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.4, 6.0
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

        Activity

        Hide
        Shai Erera added a comment -

        Patch changes the signature of encode/decodeNormValue to return/take a long. I also moved the relevant jdocs from TfIdfSim to DefaultSim (re the norms encoding).

        Some tests had to be modified to override the two abstract methods from TfIdfSim.

        Didn't add a CHANGES entry yet, but I will add it under backwards section. Also, I think that queryNorm can be handled separately under LUCENE-1907.

        Show
        Shai Erera added a comment - Patch changes the signature of encode/decodeNormValue to return/take a long. I also moved the relevant jdocs from TfIdfSim to DefaultSim (re the norms encoding). Some tests had to be modified to override the two abstract methods from TfIdfSim. Didn't add a CHANGES entry yet, but I will add it under backwards section. Also, I think that queryNorm can be handled separately under LUCENE-1907 .
        Hide
        Robert Muir added a comment -

        I think this looks pretty good! So the norms "cache" and byte-ness gets moved from TFIDF into DefaultXXX, but otherwise the API is pretty much the same.

        The only concern i have is now DefaultSimilarity has non-final methods with "long" signatures... which could lead to some traps. But maybe this is OK: if someone were to inadvertently override encodeNormValue without changing decodeNormValue, they are doing the wrong thing anyway right?

        Maybe we can try to improve this with javadocs...

        Or alternatively, maybe we should make these final in DefaultSimilarity, and if you want to change norm encoding (not calculation, which can still be easily tuned), really you should extend TFIDFSimilarity since its a pretty big change. I would prefer this.

        Show
        Robert Muir added a comment - I think this looks pretty good! So the norms "cache" and byte-ness gets moved from TFIDF into DefaultXXX, but otherwise the API is pretty much the same. The only concern i have is now DefaultSimilarity has non-final methods with "long" signatures... which could lead to some traps. But maybe this is OK: if someone were to inadvertently override encodeNormValue without changing decodeNormValue, they are doing the wrong thing anyway right? Maybe we can try to improve this with javadocs... Or alternatively, maybe we should make these final in DefaultSimilarity, and if you want to change norm encoding (not calculation, which can still be easily tuned), really you should extend TFIDFSimilarity since its a pretty big change. I would prefer this.
        Hide
        Shai Erera added a comment -

        That's a good point! What about if we make the whole change smaller by keeping the byte-ness and cache in TfIdfSimilarity as a default impl (as it is now), and only change the signature of encode/decode to long. Then, if all you care about is encode the full norm, you can extend DefaultSimilarity and not implement tf, idf and friends?

        I think jdocs is enough. It's quite an expert API, and whoever overrides encode should know better not to override decode ...

        If however we keep them on Default, I think they should be final there.

        Show
        Shai Erera added a comment - That's a good point! What about if we make the whole change smaller by keeping the byte-ness and cache in TfIdfSimilarity as a default impl (as it is now), and only change the signature of encode/decode to long. Then, if all you care about is encode the full norm, you can extend DefaultSimilarity and not implement tf, idf and friends? I think jdocs is enough. It's quite an expert API, and whoever overrides encode should know better not to override decode ... If however we keep them on Default, I think they should be final there.
        Hide
        Robert Muir added a comment -

        That's a good point! What about if we make the whole change smaller by keeping the byte-ness and cache in TfIdfSimilarity as a default impl (as it is now), and only change the signature of encode/decode to long.

        I dunno, i think thats confusing: the whole point of this issue is to move that byteness-implementation out of the abstract class and into the default implementation right?

        If however we keep them on Default, I think they should be final there.

        I think thats all we need to do, make them final on default. otherwise patch is good!

        Show
        Robert Muir added a comment - That's a good point! What about if we make the whole change smaller by keeping the byte-ness and cache in TfIdfSimilarity as a default impl (as it is now), and only change the signature of encode/decode to long. I dunno, i think thats confusing: the whole point of this issue is to move that byteness-implementation out of the abstract class and into the default implementation right? If however we keep them on Default, I think they should be final there. I think thats all we need to do, make them final on default. otherwise patch is good!
        Hide
        Shai Erera added a comment -

        Made the two methods final on DefaultSim. Fixed two tests that overrode them (to extend TfIDFSim directly).

        Committed to trunk and 4x. Thanks Rob!

        Show
        Shai Erera added a comment - Made the two methods final on DefaultSim. Fixed two tests that overrode them (to extend TfIDFSim directly). Committed to trunk and 4x. Thanks Rob!
        Hide
        Steve Rowe added a comment -

        Bulk close resolved 4.4 issues

        Show
        Steve Rowe added a comment - Bulk close resolved 4.4 issues

          People

          • Assignee:
            Shai Erera
            Reporter:
            Shai Erera
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development