Cocoon / COCOON-2065

huge performance increase of LuceneIndexTransformer on large Lucene indexes

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.1.6, 2.1.7, 2.1.8, 2.1.9, 2.1.10, 2.1.11, 2.2
    • Fix Version/s: 2.1.12, 2.2
    • Component/s: Blocks: Lucene
    • Labels: None
    • Urgency: Normal
    • Other Info: Patch available

      Description

      PROBLEM:

      The LuceneIndexTransformer optimizes the Lucene index every time an entry is added to the index.
      With a large index, this slows indexing down enormously. For example, if you use the transformer
      to update the index entry on every check-in of a document, every check-in becomes slow.

      E.g. on a Pentium IV 2.4 GHz with a Lucene index containing 10,000 documents,
      the index update itself takes only about 60 ms, while the optimize that gets called afterwards can take 7 seconds!


      SOLUTION:

      I've created a patch that introduces an option "optimize-frequency", which determines how often optimize is called.
      It defaults to 1 (the current behaviour: optimize on every update). When a user sets it to 50, the index is optimized only once every 50 updates, and so on.
      If no optimization is wanted, you can set it to 0.
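To get a rough feel for what the option buys, the figures above (about 60 ms per update, up to 7 seconds per optimize) can be turned into an average cost per update. This is only a back-of-the-envelope sketch: the class and method names are made up for illustration, and it assumes the optimize cost stays constant regardless of how many updates have accumulated, which may not hold in practice.

```java
/** Hypothetical helper, not part of the patch: estimates the average time per index update. */
public class AmortizedOptimizeCost {

    /**
     * Average cost per update in milliseconds when optimize runs once
     * every `frequency` updates. A frequency of 0 (or less) means optimize never runs.
     */
    public static double averageCostMs(double updateMs, double optimizeMs, int frequency) {
        if (frequency <= 0) {
            return updateMs; // optimization disabled: only the update itself is paid for
        }
        return updateMs + optimizeMs / frequency; // optimize cost spread over `frequency` updates
    }

    public static void main(String[] args) {
        double updateMs = 60.0;     // figure quoted in the problem description
        double optimizeMs = 7000.0; // figure quoted in the problem description
        for (int frequency : new int[] {1, 50, 0}) {
            System.out.printf("optimize-frequency=%d: about %.0f ms per update%n",
                    frequency, averageCostMs(updateMs, optimizeMs, frequency));
        }
    }
}
```

Under these assumptions the estimate is about 7060 ms per update with the default of 1, about 200 ms with a value of 50, and 60 ms with optimization switched off. The estimate does not account for any slowdown of queries against an unoptimized index.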

      This is consistent with the Lucene documentation (excerpt from the Lucene FAQ):

      "The IndexWriter class supports an optimize() method that compacts the index database and speedup queries. You may want to use this method after performing a complete indexing of your document set or after incremental updates of the index. If your incremental update adds documents frequently, you want to perform the optimization only once in a while to avoid the extra overhead of the optimization."

      PATCH INFO:


      Added a configuration option plus a function "needToOptimize()", which is called before optimizing.
      needToOptimize() uses a random number generator to keep the code simple.

      - when the option is not set, CODE WILL BE EXECUTED AS BEFORE
      - tested on the 2.1.11 SVN branch; there are no differences in the "main" trunk, so the patch can be applied there as well
      - updated the API docs
      - if the patch is accepted, I will also update the Wiki:

      http://wiki.apache.org/cocoon/LuceneIndexTransformer
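The patch itself is attached to the issue and is not reproduced here. As a rough illustration of how a random-based check like the one described could work, the sketch below optimizes with probability 1/n for an "optimize-frequency" of n. The class name, the method signature, and the handling of negative values are assumptions made for this example, not the actual patch code.

```java
import java.util.Random;

/** Hypothetical sketch of a random-based optimize check; not the code from the attached patch. */
public class OptimizeFrequencySketch {

    /**
     * Decides whether the index should be optimized after the current update.
     * 0 (or a negative value, by assumption) means never, 1 means always,
     * and n > 1 means with probability 1/n, i.e. once every n updates on average.
     */
    public static boolean needToOptimize(int optimizeFrequency, Random random) {
        if (optimizeFrequency <= 0) {
            return false; // optimization disabled
        }
        // nextInt(n) is uniform over 0..n-1, so this is true with probability 1/n;
        // for n == 1 it is always true, which matches the previous behaviour.
        return random.nextInt(optimizeFrequency) == 0;
    }

    public static void main(String[] args) {
        Random random = new Random();
        int updates = 10_000;
        int optimizeCalls = 0;
        for (int i = 0; i < updates; i++) {
            if (needToOptimize(50, random)) {
                optimizeCalls++;
            }
        }
        // With a frequency of 50 this should land somewhere around updates / 50.
        System.out.println("optimize calls in " + updates + " updates: " + optimizeCalls);
    }
}
```

A random check needs no counter state between transformer invocations, which fits the stated goal of keeping the code simple. The trade-off is that the interval between two optimize calls varies, so "once every 50 updates" holds only on average.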

        Activity

        Dominique De Munck added a comment -
        PATCH INFO (see also bug):


        Added a configuration option plus a function "needToOptimize()", which is called before optimizing.
        needToOptimize() uses a random number generator to keep the code simple.

        - when the option is not set, CODE WILL BE EXECUTED AS BEFORE
        - tested on the 2.1.11 SVN branch; there are no differences in the "main" trunk, so the patch can be applied there as well
        - updated the API docs
        - if the patch is accepted, I will also update the Wiki:

        http://wiki.apache.org/cocoon/LuceneIndexTransformer
        Felix Knecht added a comment -
        Patch applied to C2.2_dev
        Grzegorz Kossakowski added a comment -
        Thanks Dominique for posting a patch.

        Since you have already offered to help with updating the documentation, would you like to move the page from the wiki to our official documentation repository, located at http://cocoon.zones.apache.org/daisy/? It's preferable to have that info in the official docs.

        Documents from Daisy will be published on the official, reworked site soon.
        Felix Knecht added a comment -
        Due to a lack of knowledge, I didn't close the bug after fixing the issue in the last open active branch.
        Antonio Gallardo added a comment -
        The patch was not applied in Cocoon 2.1.11-dev.
        Felix Knecht added a comment -
        Removed 2.1.11-dev from the fix versions; this tag seems to have been set by mistake.
        Alfred Nathaniel added a comment -
        I have now also applied the patch to 2.1.12-dev.
        Alfred Nathaniel added a comment -
        Please check 2.1.12-dev and reopen the issue in case there is a problem.
        Thanks again for providing the patch.

          People

          • Assignee:
            Alfred Nathaniel
            Reporter:
            Dominique De Munck
