Lucene - Core / LUCENE-2571

Indexing performance tests with realtime branch

    Details

    • Type: Task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Realtime Branch
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      We should run indexing performance tests with the DWPT (DocumentsWriterPerThread) changes and compare them to trunk.

      We need to test both single-threaded and multi-threaded performance.

      NOTE: flush-by-RAM isn't implemented yet, so we should either wait with the tests or flush by doc count.
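      Until flush-by-RAM is available, the note above amounts to a choice between two flush triggers. The sketch below is a hypothetical illustration only: FlushTriggerSketch, DISABLED and shouldFlush are invented names, not the realtime branch's code. It loosely mirrors the two knobs trunk's IndexWriterConfig exposes (setMaxBufferedDocs and setRAMBufferSizeMB) and shows a benchmark configured to flush by doc count alone.

```java
// Hypothetical sketch, not Lucene's API: a flush trigger that fires on
// buffered document count, on buffered bytes, or on whichever comes first.
final class FlushTriggerSketch {
    static final long DISABLED = -1;

    private final long maxBufferedDocs;  // flush at this many buffered docs, or DISABLED
    private final long maxBufferedBytes; // flush at this many buffered bytes, or DISABLED

    FlushTriggerSketch(long maxBufferedDocs, long maxBufferedBytes) {
        if (maxBufferedDocs == DISABLED && maxBufferedBytes == DISABLED) {
            throw new IllegalArgumentException("at least one flush trigger must be enabled");
        }
        this.maxBufferedDocs = maxBufferedDocs;
        this.maxBufferedBytes = maxBufferedBytes;
    }

    /** Returns true once any enabled limit has been reached. */
    boolean shouldFlush(long bufferedDocs, long bufferedBytes) {
        boolean byDocs = maxBufferedDocs != DISABLED && bufferedDocs >= maxBufferedDocs;
        boolean byRam = maxBufferedBytes != DISABLED && bufferedBytes >= maxBufferedBytes;
        return byDocs || byRam;
    }

    public static void main(String[] args) {
        // Doc-count-only trigger: RAM usage is ignored entirely.
        FlushTriggerSketch byDocCount = new FlushTriggerSketch(10_000, DISABLED);
        System.out.println(byDocCount.shouldFlush(9_999, 500_000_000L)); // false
        System.out.println(byDocCount.shouldFlush(10_000, 1_024));       // true
    }
}
```

      A doc-count trigger makes runs comparable across trunk and the branch, but it only approximates a RAM budget when documents are roughly uniform in size.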

          Activity

          Simon Willnauer added a comment:

          Benchmark charts attached.

          Simon Willnauer added a comment:

          I ran batch indexing benchmarks comparing trunk against the realtime branch, using both addDocument and updateDocument.

          For addDocument, I indexed 10M Wikipedia docs onto a spinning disk, reading from a separate SSD.

          Here is the realtime graph:

          vs. trunk:

          This graph shows how DWPT flushes to disk over time:

          For updateDocument, I built a 10M-doc Wikipedia index and then re-indexed the exact same documents with updateDocument. Here are the results:
          Realtime Branch:

          trunk:
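          For context on the updateDocument run: IndexWriter.updateDocument(Term, Document) first deletes any existing documents containing the given term and then adds the new document, so re-indexing the same documents (assuming a unique id term per document) also deletes every original copy. The toy model below is hypothetical: UpdateSemanticsSketch and its methods are invented, a plain string id stands in for the Term, and it illustrates only the delete-then-add bookkeeping, not how Lucene actually records deletes per segment.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model, not Lucene code: updateDocument = delete-by-id + add.
final class UpdateSemanticsSketch {
    private final List<String> docIds = new ArrayList<>();              // doc number -> id
    private final List<Boolean> deleted = new ArrayList<>();            // doc number -> deleted?
    private final Map<String, List<Integer>> postings = new HashMap<>(); // id -> doc numbers

    void addDocument(String id) {
        int docNum = docIds.size();
        docIds.add(id);
        deleted.add(false);
        postings.computeIfAbsent(id, k -> new ArrayList<>()).add(docNum);
    }

    void updateDocument(String id) {
        // Delete by term: mark every existing doc carrying this id as deleted...
        for (int docNum : postings.getOrDefault(id, List.of())) {
            deleted.set(docNum, true);
        }
        // ...then add the new version as a fresh document.
        addDocument(id);
    }

    /** All documents ever added, including deleted ones. */
    int maxDoc() { return docIds.size(); }

    /** Documents that are still live. */
    int numDocs() {
        int live = 0;
        for (boolean d : deleted) if (!d) live++;
        return live;
    }

    public static void main(String[] args) {
        UpdateSemanticsSketch index = new UpdateSemanticsSketch();
        for (String id : new String[] {"a", "b", "c"}) index.addDocument(id);
        for (String id : new String[] {"a", "b", "c"}) index.updateDocument(id);
        System.out.println(index.maxDoc() + " " + index.numDocs()); // 6 3
    }
}
```

          The live document count stays the same, but the index doubles in raw size until merges reclaim the deleted documents, which is one reason an update run behaves differently from a pure add run.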

          Simon Willnauer added a comment:

          Updated attachments.

          Lance Norskog added a comment:

          Would you consider trying other MergePolicy objects on trunk? The BalancedSegment MP tries to avoid these long stoppages.

          Simon Willnauer added a comment:

          Would you consider trying other MergePolicy objects on trunk? The BalancedSegment MP tries to avoid these long stoppages.

          I think there is a misunderstanding here. The long stoppages on trunk are not caused by merges at all. They are caused by flushing the DocumentsWriter, which is essentially a stop-the-world operation; that is why indexing makes no progress during those periods. Merges do NOT block indexing on trunk, no matter which MP you use. The BalancedSegmentMergePolicy is better suited to RT environments, where it makes reopening the reader quicker.

          You may want to look at this blog entry for a more complete explanation: http://blog.jteam.nl/2011/04/01/gimme-all-resources-you-have-i-can-use-them/
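          A toy tick-based model makes the stop-the-world argument concrete. It is a hypothetical sketch, not Lucene code (FlushStallSketch and its methods are invented), and it assumes: each thread indexes one document per tick, flushing costs one tick per docsPerFlushTick buffered documents, both setups share the same total buffer size, and concurrent per-thread flushes do not compete for IO. With a shared buffer, every thread waits while it is flushed; with per-thread buffers, only the flushing thread pauses.

```java
// Hypothetical toy model, not Lucene code. Assumptions: 1 doc per thread per
// tick, flush cost of 1 tick per docsPerFlushTick docs, and no IO contention
// between concurrent per-thread flushes.
final class FlushStallSketch {

    private static int ceilDiv(int a, int b) {
        return (a + b - 1) / b;
    }

    /** Trunk-style: one shared buffer; all threads block while it is flushed. */
    static long sharedBufferStopTheWorld(int threads, int bufferDocs, int docsPerFlushTick, int ticks) {
        long indexed = 0;
        int buffered = 0;
        int stallTicks = 0;
        for (int t = 0; t < ticks; t++) {
            if (stallTicks > 0) { // every thread waits for the flush
                stallTicks--;
                continue;
            }
            indexed += threads;
            buffered += threads;
            if (buffered >= bufferDocs) {
                stallTicks = ceilDiv(buffered, docsPerFlushTick);
                buffered = 0;
            }
        }
        return indexed;
    }

    /** Per-thread style: each thread owns 1/threads of the buffer and flushes it alone. */
    static long perThreadBuffers(int threads, int bufferDocs, int docsPerFlushTick, int ticks) {
        int perThreadBuffer = Math.max(1, bufferDocs / threads);
        int[] buffered = new int[threads];
        int[] stallTicks = new int[threads];
        long indexed = 0;
        for (int t = 0; t < ticks; t++) {
            for (int i = 0; i < threads; i++) {
                if (stallTicks[i] > 0) { // only this thread is busy flushing
                    stallTicks[i]--;
                    continue;
                }
                indexed++;
                if (++buffered[i] >= perThreadBuffer) {
                    stallTicks[i] = ceilDiv(buffered[i], docsPerFlushTick);
                    buffered[i] = 0;
                }
            }
        }
        return indexed;
    }

    public static void main(String[] args) {
        System.out.println(sharedBufferStopTheWorld(4, 16, 4, 100)); // 208
        System.out.println(perThreadBuffers(4, 16, 4, 100));         // 320
    }
}
```

          With 4 threads, a 16-document total buffer and docsPerFlushTick = 4, the shared buffer indexes 208 documents in 100 ticks versus 320 with per-thread buffers; with a single thread the two models coincide. These figures come from the toy model only, not from the benchmarks in this issue.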

          Earwin Burrfoot added a comment:

          Merges are NOT blocking indexing on trunk no matter which MP you use.

          Well... merges tie up IO (especially without fancy SSDs/RAIDs), which in turn slows down flushes. That means longer delays for stop-the-world flushes and, for parallel flushes, a lower bandwidth cap (beyond which they too are forced to stop the world).

          So Lance's point is partially valid.
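          The IO argument can be written as back-of-the-envelope arithmetic. Assuming (hypothetically) that merges and flushes share a single disk and that bandwidth consumed by merges is simply unavailable to flushes, a flush of S bytes takes S / (B_disk - B_merge) seconds, and concurrent flushing can sustain at most B_disk - B_merge bytes per second of indexing before buffers fill up and threads must stall. IoContentionSketch and its methods are illustrative names, and the numbers in main are arbitrary, not measurements.

```java
// Hypothetical back-of-the-envelope model: merges and flushes share one disk,
// so bandwidth spent on merging is unavailable for flushing.
final class IoContentionSketch {

    /** Seconds to flush bufferBytes when merges use mergeBytesPerSec of diskBytesPerSec. */
    static double flushSeconds(double bufferBytes, double diskBytesPerSec, double mergeBytesPerSec) {
        double available = diskBytesPerSec - mergeBytesPerSec;
        if (available <= 0) {
            return Double.POSITIVE_INFINITY; // merges saturate the disk; flushes never finish
        }
        return bufferBytes / available;
    }

    /**
     * With concurrent flushing, indexing can only sustain the rate at which
     * buffered bytes drain to disk; above this cap, threads eventually stall.
     */
    static double sustainableIngestBytesPerSec(double requestedBytesPerSec, double diskBytesPerSec,
                                               double mergeBytesPerSec) {
        double available = Math.max(0, diskBytesPerSec - mergeBytesPerSec);
        return Math.min(requestedBytesPerSec, available);
    }

    public static void main(String[] args) {
        // Arbitrary units (e.g. MB and MB/s), chosen only to show the effect.
        System.out.println(flushSeconds(200, 100, 0));  // 2.0
        System.out.println(flushSeconds(200, 100, 50)); // 4.0
    }
}
```

          Halving the bandwidth left for flushing doubles each stop-the-world pause, which is the effect described above; whether it matters in practice depends on how much merging actually overlaps the measured window.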

          Simon Willnauer added a comment:

          Well... merges tie up IO (especially without fancy SSDs/RAIDs), which in turn slows down flushes. That means longer delays for stop-the-world flushes and, for parallel flushes, a lower bandwidth cap (beyond which they too are forced to stop the world).

          True, it will make a difference in certain situations, but not for this benchmark: RT does far more merges here, since we are flushing far more segments. The time window I used is one where the trunk run does almost no merging at all, so merges should not make a difference.

          I ran those benchmarks again with BalancedSegmentMergePolicy, and it doesn't really make any difference; see below.


            People

            • Assignee:
              Simon Willnauer
              Reporter:
              Michael Busch
            • Votes:
              0
              Watchers:
              3
