Solr / SOLR-3142

Remove O(n^2) slow indexing defaults in DataImportHandler

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.6, 4.0-ALPHA
    • Component/s: None
    • Labels: None

      Description

      By default, DataImportHandler optimizes the entire index when it commits.

      This is bad for performance: by default it performs a very heavy
      index-wide operation even for an incremental update. Because every
      import rewrites the whole index regardless of how little changed,
      repeated incremental imports amount to O(n^2) indexing.

      All that is needed is to set optimize=false by default. Anyone who wants
      to optimize can either set optimize=true or optimize explicitly.
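
The O(n^2) claim can be illustrated with a toy cost model. This is only a sketch under stated assumptions, not a measurement of Solr: it counts "documents written" as a stand-in for I/O, assumes an optimize rewrites every document currently in the index, and ignores the ordinary background merges Lucene performs either way. The function name and batch sizes are hypothetical.

```python
def total_docs_written(num_batches: int, batch_size: int,
                       optimize_each_commit: bool) -> int:
    """Toy model: count documents written over a series of incremental imports.

    Assumption: an optimize rewrites the whole index into one segment,
    so its cost is proportional to the current index size.
    """
    index_size = 0
    written = 0
    for _ in range(num_batches):
        index_size += batch_size
        written += batch_size      # indexing the new batch itself
        if optimize_each_commit:
            written += index_size  # optimize rewrites everything indexed so far
    return written


# 100 incremental imports of 1000 documents each (hypothetical numbers).
with_optimize = total_docs_written(100, 1000, optimize_each_commit=True)
without_optimize = total_docs_written(100, 1000, optimize_each_commit=False)
print(with_optimize, without_optimize)
```

In this model the cost without optimize grows linearly with the number of batches, while the cost with optimize on every commit grows with the square of it, which is the behaviour the description objects to.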

      1. SOLR-3142.patch (1.0 kB, Robert Muir)

        Activity

        Yonik Seeley added a comment -

        +1

        Might even make sense for it to be a "soft" commit.

        Robert Muir added a comment -

        Patch for the optimize default.

        I agree about the soft commit. Even if it is not the default, it should at least be allowable/configurable, but I didn't implement that in this patch.

        In general, whatever options are available for commit should be consistent with what DIH allows. Maybe we should open a separate issue to ensure this is the case.

        Uwe Schindler added a comment -

        +1. Are there any config files or parsing code to edit? I seem to remember that the DIH config also has settings regarding optimize.

        Robert Muir added a comment -

        I think it might be possible to configure this via files (versus the actual command), but I
        searched for 'optimize' in the example-dih and found nothing.

        Robert Muir added a comment -

        Unless there are objections I'd like to commit this to
        make some progress.

        Hoss Man added a comment -

        FWIW: I'm pretty sure the original assumption here was that in the (relatively common) use case of doing a full-import rebuild on a regular basis (i.e. nightly), it can be handy to have it auto-optimize when you are done. I think the real problem is that this assumption was never challenged for things like delta-import.

        So an argument could be made that the default should still be optimize=true on full-import and optimize=false on delta-import ... but I'm not going to make that argument. I think it's silly to assume true in either case (particularly since a parameterized full-import might actually be a rapidly repeating incremental).

        Robert Muir added a comment -

        "(ie: nightly) that it can be handy to have it auto-optimize when you are done."

        You can still do this by specifying 'optimize=true' on your full-import.
        It's just no longer the default, so we haven't taken away any capabilities here.
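
As a rough illustration of opting in explicitly, the sketch below builds DIH request URLs with the optimize parameter spelled out. The host, port, and /dataimport handler path are assumptions about a typical local setup, and dih_url is a hypothetical helper, not part of Solr; command, commit, and optimize are the request parameters discussed in this issue.

```python
from urllib.parse import urlencode


def dih_url(base: str, command: str, optimize: bool = False,
            commit: bool = True) -> str:
    """Build a DataImportHandler request URL (hypothetical helper).

    optimize defaults to False here, mirroring the new default in this issue.
    """
    params = {
        "command": command,
        "commit": str(commit).lower(),
        "optimize": str(optimize).lower(),
    }
    return f"{base}?{urlencode(params)}"


# Assumed handler location; adjust to wherever DIH is registered.
BASE = "http://localhost:8983/solr/dataimport"

nightly = dih_url(BASE, "full-import", optimize=True)  # explicit opt-in
delta = dih_url(BASE, "delta-import")                   # no optimize
print(nightly)
print(delta)
```

Passing optimize on every request, rather than relying on the default, keeps the behaviour the same across versions before and after this change.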

        Uwe Schindler added a comment -

        Optimizing after a full import over a non-empty index is no longer really needed in Lucene (even if you do an IndexWriter.deleteAll() beforehand, as the full-import does). Once IndexWriter merges (or on close or commit) and detects that a segment consists only of deleted documents, it drops that segment. This was not true in the past, but it has been since around Lucene 3.1.

        Hoss Man added a comment -

        Agreed, I was just noting why I think the original default was true.


          People

          • Assignee:
            Robert Muir
            Reporter:
            Robert Muir