Solr / SOLR-2200

DIH DocBuilder - Improve perf. on large delta deletes

    Details

      Description

      In collectDelta, the procedure that collects the PKs of the documents to be updated or deleted for an entity, the code iterates over the entire deltaSet once for every deleted document. This is very expensive when a single delta-import updates and deletes millions of documents.
      Since deleted rows and delta rows are compared on the PK, let's build the deltaSet as a HashMap instead of a HashSet, enabling quick key lookups and removing the need for repeated iterations.
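
A minimal sketch of the idea described above, not the attached patch. The class name, the method names, the "id" PK column, and the row representation (a column-name-to-value map) are assumptions made for illustration. It contrasts scanning the whole delta collection for each deleted row with indexing the delta rows by PK once and removing each deleted PK with a single lookup.

```java
import java.util.*;

// Hypothetical illustration only; none of these names come from DocBuilder.
class DeltaLookupSketch {

    // Builds a one-column row; "id" stands in for the entity's PK column.
    static Map<String, Object> row(Object pk) {
        Map<String, Object> r = new HashMap<>();
        r.put("id", pk);
        return r;
    }

    // Slow variant: for every deleted row, walk every remaining delta row.
    // Cost grows with (number of deleted rows) x (number of delta rows).
    static Set<Map<String, Object>> removeDeletedByScan(
            Set<Map<String, Object>> delta,
            Set<Map<String, Object>> deleted,
            String pk) {
        Set<Map<String, Object>> result = new HashSet<>(delta);
        for (Map<String, Object> del : deleted) {
            Iterator<Map<String, Object>> it = result.iterator();
            while (it.hasNext()) {
                if (Objects.equals(it.next().get(pk), del.get(pk))) {
                    it.remove();
                }
            }
        }
        return result;
    }

    // Keyed variant: index the delta rows by PK once, then drop each
    // deleted PK with one hash lookup instead of a full iteration.
    static Map<Object, Map<String, Object>> removeDeletedByKey(
            Set<Map<String, Object>> delta,
            Set<Map<String, Object>> deleted,
            String pk) {
        Map<Object, Map<String, Object>> byPk = new HashMap<>();
        for (Map<String, Object> r : delta) {
            byPk.put(r.get(pk), r);
        }
        for (Map<String, Object> del : deleted) {
            byPk.remove(del.get(pk));
        }
        return byPk;
    }
}
```

Both variants leave the same rows behind; the keyed one does a single pass over each collection, which is where the saving on large delete sets would come from.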

        Issue Links

          Activity

          Shalin Shekhar Mangar made changes -
          Link This issue is duplicated by SOLR-1927 [ SOLR-1927 ]
          Robert Muir added a comment -

          click the subversion commits tab.

          Mark Waddle added a comment -

          Hi Robert,

          I apologize for my ignorance, but why can't I see these changes in the current dev/trunk? Am I looking in the wrong place?

          Mark

          Grant Ingersoll made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Grant Ingersoll added a comment -

          Bulk close for 3.1.0 release

          Robert Muir made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Robert Muir added a comment -

          Committed revisions 1029325 (trunk), 1029328 (3x).

          Thanks Mark!

          rmuir committed 1029328 (95 files)
          Reviews: none

          SOLR-2200: improve DIH perf for large delta-import updates

          Lucene branch_3x
          rmuir committed 1029325 (2 files)
          Robert Muir made changes -
          Assignee Robert Muir [ rcmuir ]
          Fix Version/s 3.1 [ 12314371 ]
          Fix Version/s 4.0 [ 12314992 ]
          Robert Muir added a comment -

          Mark, thanks for your contribution.

          Seems like a no-brainer to me, and all tests pass with the patch.

          I'd like to commit this unless anyone has objections.

          Mark Waddle made changes -
          Field Original Value New Value
          Attachment SOLR-2200.patch [ 12458080 ]
          Mark Waddle added a comment -

          Uploading patch to improve performance for delta-imports with a significant number of deletions.

          Mark Waddle created issue -

            People

            • Assignee:
              Robert Muir
              Reporter:
              Mark Waddle
            • Votes:
              0
              Watchers:
              1