Lucene - Core / LUCENE-7049

Merge consumes excessive CPU when there are many deleteByQuery operations

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core/index, core/search
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      When

      adding very many <delete><query> requests

      Then

      we get a CPU spike in the merge thread that blocks the indexing process

      Considerations

      Although adding so many <delete><query> requests is odd in itself, I suppose the code can be more efficient. See the attached sampling snapshots.

      1. Selection_114.png
        25 kB
        Mikhail Khludnev
      2. Selection_115.png
        18 kB
        Mikhail Khludnev
      3. Selection_116.png
        51 kB
        Mikhail Khludnev

          Activity

          elyograg Shawn Heisey added a comment -

          I have noticed that deleteByQuery in Solr will be held up by a merge. That roadblock probably also stands in the way of any further indexing requests made after the deleteByQuery. I would not be overly surprised to learn that it makes the CPU spin, but I have not actually checked this.

          Adds and deletes by ID can happen at the same time as a merge since 4.0, but deleteByQuery apparently works differently, so it is blocked by any merge activity. I recently changed my indexing program so that it turns deleteByQuery into a query with fl=unique_key_field, and then does the ID delete with the results. I had gotten rid of all the "optimizeUnderway" checking that I had added back in the 3.x days, and didn't want to add that back in, so I "fixed" my delete code.
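
          The workaround described above (query for the unique keys first, then delete by ID) can be sketched roughly as follows. This is only an illustration under stated assumptions: ToyIndex, queryIds, deleteByIds, and deleteViaIds are hypothetical names for an in-memory stand-in, not the Solr or Lucene API, and the batch size is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Sketch only: an in-memory stand-in for an index, NOT the Solr or Lucene API. */
public class DeleteByIdSketch {

    /** Hypothetical toy index mapping a unique key to a document's fields. */
    static final class ToyIndex {
        private final Map<String, Map<String, String>> docs = new LinkedHashMap<>();

        void add(String id, Map<String, String> fields) {
            docs.put(id, fields);
        }

        /** Analogue of running the query with fl=unique_key_field: only matching IDs come back. */
        List<String> queryIds(Predicate<Map<String, String>> query) {
            List<String> ids = new ArrayList<>();
            for (Map.Entry<String, Map<String, String>> entry : docs.entrySet()) {
                if (query.test(entry.getValue())) {
                    ids.add(entry.getKey());
                }
            }
            return ids;
        }

        /** Analogue of a delete-by-ID request. */
        void deleteByIds(List<String> ids) {
            for (String id : ids) {
                docs.remove(id);
            }
        }

        int size() {
            return docs.size();
        }
    }

    /**
     * Replaces one delete-by-query with "query for IDs, then delete by ID",
     * sending the IDs in batches. Returns the number of IDs submitted for deletion.
     */
    static int deleteViaIds(ToyIndex index, Predicate<Map<String, String>> query, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> ids = index.queryIds(query);
        for (int start = 0; start < ids.size(); start += batchSize) {
            int end = Math.min(start + batchSize, ids.size());
            index.deleteByIds(ids.subList(start, end));
        }
        return ids.size();
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("1", Map.of("type", "tmp"));
        index.add("2", Map.of("type", "keep"));
        index.add("3", Map.of("type", "tmp"));
        int deleted = deleteViaIds(index, doc -> "tmp".equals(doc.get("type")), 2);
        System.out.println(deleted + " deleted, " + index.size() + " remaining");
    }
}
```

          In a real client the two ToyIndex methods would be replaced by an actual search request restricted to the unique key field and an actual delete-by-ID request. Documents that start matching the query between the two steps would be missed, which a true deleteByQuery would not do.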

          mkhludnev Mikhail Khludnev added a comment -

          Thanks Ivan Mamontov for contributing snapshots.


            People

            • Assignee:
              Unassigned
              Reporter:
              mkhludnev Mikhail Khludnev
            • Votes:
              0
              Watchers:
              5
