Lucene - Core / LUCENE-1690

MoreLikeThis queries are very slow compared to other search types

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.4.1
    • Fix Version/s: None
    • Component/s: modules/other
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      The MoreLikeThis object performs term frequency lookups for every query. In my testing, these lookups appear to account for most of the time spent in MoreLikeThis searches.

      For some (I'd venture many) applications it's not necessary for term statistics to be looked up every time. A fairly naive opt-in caching mechanism tied to the life of the MoreLikeThis object would allow applications to cache term statistics for the duration that suits them.

      I've got this working in my test code. I'll put together a patch file when I get a minute. From my testing this can improve performance by a factor of around 10.
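
      To illustrate the kind of opt-in cache described above, here is a minimal sketch. It is not the contents of the attached patches. The class name `CachingDocFreqSource`, its methods, and the `ToIntFunction<String>` delegate are all hypothetical; in a real application the delegate would stand in for the index lookup (for example, something that ends up calling `IndexReader.docFreq`). The sketch assumes a bounded least-recently-used policy and single-threaded use.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToIntFunction;

/**
 * Hypothetical opt-in cache for per-term statistics (illustration only).
 * The cache lives as long as this object does, so the application decides
 * how long cached statistics remain acceptable. Not thread-safe.
 */
class CachingDocFreqSource {
    // Assumption: the expensive lookup, e.g. a call into the index reader.
    private final ToIntFunction<String> delegate;
    private final Map<String, Integer> cache;
    private int misses = 0;

    CachingDocFreqSource(ToIntFunction<String> delegate, final int maxEntries) {
        this.delegate = delegate;
        // An access-ordered LinkedHashMap drops the least recently used
        // entry once the map grows beyond maxEntries.
        this.cache = new LinkedHashMap<String, Integer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached statistic, consulting the delegate only on a miss. */
    int docFreq(String term) {
        Integer cached = cache.get(term);
        if (cached != null) {
            return cached;
        }
        misses++;
        int value = delegate.applyAsInt(term);
        cache.put(term, value);
        return value;
    }

    /** Number of lookups that had to go to the delegate. */
    int misses() {
        return misses;
    }

    /** Discards all cached statistics, e.g. after the index is reopened. */
    void clear() {
        cache.clear();
    }
}
```

      An application would create one such object, reuse it across MoreLikeThis queries, and call `clear()` (or discard the object) whenever stale statistics are no longer acceptable. How much this helps depends on how often terms repeat between queries.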

      1. LUCENE-1690.patch
        13 kB
        Richard Marr
      2. LruCache.patch
        5 kB
        Richard Marr
      3. LUCENE-1690.patch
        3 kB
        Richard Marr

        Activity

        Mark Thomas made changes -
        Field: Workflow
        Original Value: Default workflow, editable Closed status [ 12562940 ]
        New Value: jira [ 12583819 ]

        Mark Thomas made changes -
        Field: Workflow
        Original Value: jira [ 12465802 ]
        New Value: Default workflow, editable Closed status [ 12562940 ]

        Richard Marr made changes -
        Field: Attachment
        New Value: LUCENE-1690.patch [ 12415006 ]

        Richard Marr made changes -
        Field: Attachment
        New Value: LruCache.patch [ 12414748 ]

        Richard Marr made changes -
        Field: Attachment
        New Value: LUCENE-1690.patch [ 12410534 ]

        Richard Marr created issue -

          People

          • Assignee: Unassigned
          • Reporter: Richard Marr
          • Votes: 0
          • Watchers: 4

            Dates

            • Created:
              Updated:

              Time Tracking

              Original Estimate: 2h
              Remaining Estimate: 2h
              Time Spent: Not Specified
