Lucene.Net / LUCENENET-350

Performance enhancement in FastVectorHighlighter

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      I've had some performance issues when highlighting large documents (>25 MB of plain text, >11,000 terms per field).

      This can be common if you are indexing, e.g., log or trace files.
      Most of the time is spent loading the field value and the stored term vectors and offsets, and iterating over this list.
      I've built a TermVectorMapper that filters this list by the searched terms, which reduces the time by approx. 30%.
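
      The filtering idea described above can be sketched roughly as follows. This is not the attached VectorHighlightMapper.cs and does not use the Lucene or Lucene.Net API; the names FilteringMapper, TermOffset, map and getKept are hypothetical, and the sketch is written in Java only because the thread compares against the Java FastVectorHighlighter. It assumes the mapper is handed every term of a field's term vector and keeps only the entries whose term is among the searched terms, so later highlighting steps iterate over a much shorter list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: all types here are hypothetical stand-ins,
// not classes from Lucene or Lucene.Net.
final class FilteredTermVectorSketch {

    /** One term-vector entry: a term and the character offsets of one occurrence. */
    static final class TermOffset {
        final String term;
        final int startOffset;
        final int endOffset;

        TermOffset(String term, int startOffset, int endOffset) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** Receives every entry of a field's term vector but keeps only searched terms. */
    static final class FilteringMapper {
        private final Set<String> queryTerms;
        private final List<TermOffset> kept = new ArrayList<>();

        FilteringMapper(Set<String> queryTerms) {
            this.queryTerms = queryTerms;
        }

        /** Called once per term occurrence; entries for unsearched terms are dropped. */
        void map(String term, int startOffset, int endOffset) {
            if (queryTerms.contains(term)) {
                kept.add(new TermOffset(term, startOffset, endOffset));
            }
        }

        /** The filtered list that the highlighter would iterate over. */
        List<TermOffset> getKept() {
            return kept;
        }
    }

    public static void main(String[] args) {
        Set<String> searched = new HashSet<>(Arrays.asList("error"));
        FilteringMapper mapper = new FilteringMapper(searched);

        // Simulated term vector of a log line: "info error warn error"
        mapper.map("info", 0, 4);
        mapper.map("error", 5, 10);
        mapper.map("warn", 11, 15);
        mapper.map("error", 16, 21);

        for (TermOffset t : mapper.getKept()) {
            System.out.println(t.term + " [" + t.startOffset + "," + t.endOffset + ")");
        }
    }
}
```

      With a field of more than 11,000 terms and a query of a handful of terms, almost every entry is discarded at the point where it is read, which is where a saving of the kind reported above would be expected to come from.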

      1. LUCENENET-350.patch
        9 kB
        Digy
      2. FieldTermStack.patch
        2 kB
        Bianco Veigel
      3. VectorHighlightMapper.cs
        4 kB
        Bianco Veigel

        Activity

        Scott Lombard added a comment -

        Bulk Close for all issues before incubation

        Digy added a comment -

        Thanks Bianco.
        I committed the LUCENENET-350.patch

        DIGY

        Digy added a comment -

        Thanks Bianco.

        If LUCENENET-350.patch is OK for everyone, I will commit & close this issue.

        DIGY

        Bianco Veigel added a comment -

        New Patch file with all changes in one.

        Ben Martz added a comment -

        Agreed. I was just concerned about the code divergence considering the core guideline of remaining 1:1 with the original Java code while considering this contribution as a valuable (and unique to Lucene.Net) branch of an existing contrib item. I'm at a loss for a better idea though right now.

        Digy added a comment -

        FasterVectorHighlighter perhaps?

        A new project?
        I am afraid of ending up with new projects such as MoreFasterVectorHighlighter, MuchMoreFasterVectorHighlighter, etc.

        DIGY

        Digy added a comment -

        Hi Bianco,
        I am having trouble applying the new patch. Can you provide a new clean patch for FieldTermStack?

        DIGY

        Bianco Veigel added a comment -

        Removes a possible NullReferenceException in FieldTermStack.cs

        Ben Martz added a comment -

        This is a valuable contribution given the performance increase, thank you Bianco!

        FasterVectorHighlighter perhaps?

        Digy added a comment -

        Hi Bianco,
        First of all, could you add an Apache license header to the file VectorHighlightMapper.cs?

        Your work is very good and passes all tests, but it is not just a simple bug fix, and it diverges from the Java FVH. This makes life hard when porting new versions.

        All Lucene.Net community!
        Any idea about what we should do?

        DIGY


          People

          • Assignee:
            Unassigned
            Reporter:
            Bianco Veigel
          • Votes:
            0
            Watchers:
            0
