Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 4.0-ALPHA
    • Fix Version/s: 4.0-ALPHA
    • Component/s: core/search
    • Lucene Fields: New, Patch Available

      Description

      The DFR normalizations H1 and H2 are currently parameter-free. This matches the original DFR article, but not the thesis, where H2 accepts a parameter c, nor the information-based (IB) models, where H1 also accepts a parameter c.
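      For reference, the two normalizations with an explicit c can be sketched as below. This is a minimal, self-contained Java sketch, not the code from the attached patches. It assumes the commonly cited forms tfn = tf * c * avgdl / dl for H1 and tfn = tf * log2(1 + c * avgdl / dl) for H2, where avgdl is the average field length and dl the field length of the current document. The class and method names are hypothetical. With c = 1 both reduce to the parameter-free variants.

```java
// Hypothetical sketch of the DFR tf normalizations H1 and H2 with a free parameter c.
// Illustrative only: not Lucene's NormalizationH1/NormalizationH2 source.
// Assumes c > 0, avgdl > 0 and dl > 0.
final class TfNormalization {
    private TfNormalization() {}

    /** H1 with parameter c: tfn = tf * c * avgdl / dl. c = 1 gives the parameter-free H1. */
    static double h1(double tf, double c, double avgdl, double dl) {
        return tf * c * avgdl / dl;
    }

    /** H2 with parameter c: tfn = tf * log2(1 + c * avgdl / dl). c = 1 gives the parameter-free H2. */
    static double h2(double tf, double c, double avgdl, double dl) {
        return tf * log2(1.0 + c * avgdl / dl);
    }

    // Base-2 logarithm via the change-of-base formula.
    private static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    public static void main(String[] args) {
        double tf = 3.0, avgdl = 100.0, dl = 200.0;
        System.out.println(h1(tf, 1.0, avgdl, dl)); // prints 1.5
        System.out.println(h2(tf, 2.0, avgdl, dl)); // prints 3.0
    }
}
```

      Keeping c as an ordinary argument with 1 as the conventional default means existing behaviour is preserved while a tuned c can still be supplied.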

      1. LUCENE-3566.patch
        9 kB
        Robert Muir
      2. LUCENE-3566.patch
        3 kB
        David Mark Nemeskey
      3. LUCENE-3566.patch
        3 kB
        David Mark Nemeskey

        Activity

        David Mark Nemeskey added a comment -

        Patch.

        Robert Muir added a comment -

        +1, let's add these.

        I didn't think H1 took parameters (the thesis says 'Therefore, the constant of C is 1 assuming H1', then defines it without C). Did the IB paper make a mistake?

        Either way, it won't hurt anything to add the parameter; it's just confusing.

        Robert Muir added a comment -

        Editing fix version to 4.0: since the flexscoring branch was merged, I think we can safely do any scoring improvements in mainline trunk.

        David Mark Nemeskey added a comment -

        I didn't think H1 took parameters (the thesis says 'Therefore, the constant of C is 1 assuming H1', then defines it without C). Did the IB paper make a mistake?

        Good question. Perhaps it was a mistake; however, according to my colleague, who had experimented with the IB method in our own engine and proposed adding the parameter to Lucene, a well-chosen c can improve the results. Well, duh, really; nevertheless, as long as we have defaults, it shouldn't be a problem.

        Robert Muir added a comment -

        Yeah, I agree. Maybe in the patch we can expose the parameter to the factory in Solr (DFRSimilarityFactory has a param-parsing method for Normalization that IB reuses, too)?
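        Exposing c in a factory mostly means reading it next to the normalization name and falling back to a default when it is absent, so existing configurations keep their current behaviour. The sketch below illustrates that idea only; it is not the DFRSimilarityFactory code. The parameter keys "normalization" and "c", the default c = 1, and all class and method names are hypothetical assumptions.

```java
import java.util.Map;

// Hypothetical sketch of factory-side parsing for a normalization plus its c parameter.
// The keys "normalization" and "c" are assumptions, not a verified Solr API.
final class NormalizationParser {
    private NormalizationParser() {}

    /** A tf normalization: maps (tf, avgdl, dl) to a normalized tf. */
    interface Normalization {
        double tfn(double tf, double avgdl, double dl);
    }

    /** Assumed default, so that omitting c reproduces the parameter-free forms. */
    static final double DEFAULT_C = 1.0;

    static Normalization parse(Map<String, String> params) {
        String name = params.get("normalization");
        if (name == null) {
            throw new IllegalArgumentException("missing normalization");
        }
        // Optional hyper-parameter: fall back to the default when it is not configured.
        String cStr = params.get("c");
        double c = (cStr == null) ? DEFAULT_C : Double.parseDouble(cStr);
        switch (name) {
            case "H1":
                // tfn = tf * c * avgdl / dl
                return (tf, avgdl, dl) -> tf * c * avgdl / dl;
            case "H2":
                // tfn = tf * log2(1 + c * avgdl / dl)
                return (tf, avgdl, dl) -> tf * Math.log(1.0 + c * avgdl / dl) / Math.log(2.0);
            default:
                // Only H1 and H2 are sketched here.
                throw new IllegalArgumentException("unsupported normalization: " + name);
        }
    }
}
```

        Because the same parser can serve both DFR and IB factories, the default for c only has to be decided in one place.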

        David Mark Nemeskey added a comment -

        Patch re-based on trunk.

        Robert Muir added a comment -

        I thought we had done this already, but realized I forgot about it!

        I added the Solr factory/parsing stuff to the patch. Will commit shortly.

        Robert Muir added a comment -

        Thanks David!


          People

          • Assignee:
            David Mark Nemeskey
            Reporter:
            David Mark Nemeskey
          • Votes: 0
            Watchers: 0


              Time Tracking

              Estimated: 1h
              Remaining: 1h
              Logged: Not Specified
