Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-5730

make Lucene's SortingMergePolicy and EarlyTerminatingSortingCollector configurable in Solr

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.0
    • Component/s: None
    • Labels:
      None

      Description

      Example configuration (solrconfig.xml) :

      -<mergePolicy class="TieredMergePolicy"/>
      +<mergePolicyFactory class="org.apache.solr.index.SortingMergePolicyFactory">
      +  <str name="wrapped.prefix">in</str>
      +  <str name="in.class">org.apache.solr.index.TieredMergePolicyFactory</str>
      +  <str name="sort">timestamp desc</str>
      +</mergePolicyFactory>
      

      Example use (EarlyTerminatingSortingCollector):

      &sort=timestamp+desc&segmentTerminateEarly=true
      
      1. SOLR-5730-part1and2.patch
        42 kB
        Christine Poerschke
      2. SOLR-5730-part1and2.patch
        44 kB
        Christine Poerschke
      3. SOLR-5730-part1of2.patch
        15 kB
        Christine Poerschke
      4. SOLR-5730-part1of2.patch
        53 kB
        Christine Poerschke
      5. SOLR-5730-part2of2.patch
        36 kB
        Christine Poerschke
      6. SOLR-5730-part2of2.patch
        28 kB
        Christine Poerschke

        Issue Links

          Activity

          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user cpoerschke opened a pull request:

          https://github.com/apache/lucene-solr/pull/31

          make SortingMergePolicy and EarlyTerminatingSortingCollector configurable

          For https://issues.apache.org/jira/i#browse/SOLR-5730 ticket.

          Also for the first 2 of the 5 ideas on https://issues.apache.org/jira/i#browse/SOLR-4654 ticket.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/apache/lucene-solr branch_4x-etsc-solr

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/lucene-solr/pull/31.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #31


          commit d45f8126ed8985ab085b989eed176eddddd9a694
          Author: Christine Poerschke <cpoerschke@bloomberg.net>
          Date: 2014-02-14T13:58:38Z

          make Lucene's SortingMergePolicy and EarlyTerminatingSortingCollector configurable in Solr


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user cpoerschke opened a pull request: https://github.com/apache/lucene-solr/pull/31 make SortingMergePolicy and EarlyTerminatingSortingCollector configurable For https://issues.apache.org/jira/i#browse/SOLR-5730 ticket. Also for the first 2 of the 5 ideas on https://issues.apache.org/jira/i#browse/SOLR-4654 ticket. You can merge this pull request into a Git repository by running: $ git pull https://github.com/apache/lucene-solr branch_4x-etsc-solr Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/31.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #31 commit d45f8126ed8985ab085b989eed176eddddd9a694 Author: Christine Poerschke <cpoerschke@bloomberg.net> Date: 2014-02-14T13:58:38Z make Lucene's SortingMergePolicy and EarlyTerminatingSortingCollector configurable in Solr
          Hide
          cpoerschke Christine Poerschke added a comment -

          https://issues.apache.org/jira/i#browse/LUCENE-5445 and associated github pull request are related here (ASF GitHub Bot seemingly only updated the first of the tickets mentioned in the pull request comments).

          Show
          cpoerschke Christine Poerschke added a comment - https://issues.apache.org/jira/i#browse/LUCENE-5445 and associated github pull request are related here (ASF GitHub Bot seemingly only updated the first of the tickets mentioned in the pull request comments).
          Hide
          elyograg Shawn Heisey added a comment -

          Christine Poerschke, are you in a position to build a test that makes sure everything does what it's expected to do, ideally when the new feature is disabled as well as enabled? I'm evaluating whether or not I can take this issue on. My understanding of how to use Lucene is pretty low, and I'm a little wobbly on crafting new tests.

          Show
          elyograg Shawn Heisey added a comment - Christine Poerschke , are you in a position to build a test that makes sure everything does what it's expected to do, ideally when the new feature is disabled as well as enabled? I'm evaluating whether or not I can take this issue on. My understanding of how to use Lucene is pretty low, and I'm a little wobbly on crafting new tests.
          Hide
          cpoerschke Christine Poerschke added a comment -

          Hi Shawn, yes i'm having a go at building tests for this. So far a new TestMergePolicyConfigWithMergeSorter based on the existing TestMergePolicyConfig, and tests for the IndexSchema and searching changes would be on the todo list too.

          Show
          cpoerschke Christine Poerschke added a comment - Hi Shawn, yes i'm having a go at building tests for this. So far a new TestMergePolicyConfigWithMergeSorter based on the existing TestMergePolicyConfig, and tests for the IndexSchema and searching changes would be on the todo list too.
          Hide
          rcmuir Robert Muir added a comment -

          Hello, some things that might simplify some of the TODOs, is we changed the SortingMergePolicy API in LUCENE-5493 to just take Sort.

          This means you can have multiple fields, they dont have to be numeric docvalues, and so on. So I think this can simplify the configuration of this thing too, e.g. you could just take a standard "sort spec string" and parse it with QueryParsing.getSort or whatever (some refactoring might be needed here).

          It would be good though, to check that Sort.needsScores() == false, as that makes no sense at index-time... I'll open an issue to add this check to SortingMergePolicy itself in lucene.

          The other difference is, EarlyTerminatingSortingCollector now also takes a Sort, except really you should just pass the Sort being used for the Query (it does the proper checking against the segments to see if the segment was sorted in a compatible way, and if so, will optimize with early termination). Today this just checks that they are exactly equal, but in the future it can be smarter (LUCENE-5499).

          Hopefully this makes the integration easier.

          Show
          rcmuir Robert Muir added a comment - Hello, some things that might simplify some of the TODOs, is we changed the SortingMergePolicy API in LUCENE-5493 to just take Sort. This means you can have multiple fields, they dont have to be numeric docvalues, and so on. So I think this can simplify the configuration of this thing too, e.g. you could just take a standard "sort spec string" and parse it with QueryParsing.getSort or whatever (some refactoring might be needed here). It would be good though, to check that Sort.needsScores() == false, as that makes no sense at index-time... I'll open an issue to add this check to SortingMergePolicy itself in lucene. The other difference is, EarlyTerminatingSortingCollector now also takes a Sort, except really you should just pass the Sort being used for the Query (it does the proper checking against the segments to see if the segment was sorted in a compatible way, and if so, will optimize with early termination). Today this just checks that they are exactly equal, but in the future it can be smarter ( LUCENE-5499 ). Hopefully this makes the integration easier.
          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user cpoerschke opened a pull request:

          https://github.com/apache/lucene-solr/pull/176

          [incomplete] SOLR-5730: make SortingMergePolicy/EarlyTerminatingSortingCollector configurable

          incomplete. for https://issues.apache.org/jira/i#browse/SOLR-5730 ticket.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/bloomberg/lucene-solr trunk-sort-outline

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/lucene-solr/pull/176.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #176


          commit 5984692c660b32ddec0260e5a096b7b7eea5ea8d
          Author: Christine Poerschke <cpoerschke@bloomberg.net>
          Date: 2015-06-29T15:02:44Z

          LUCENE-????: support SortingMergePolicy-free use of EarlyTerminatingSortingCollector

          motivation:

          • SOLR-5730 to make Lucene's SortingMergePolicy and EarlyTerminatingSortingCollector configurable in Solr.
          • outline of draft SOLR-5730 changes:
            + SolrIndexWriter constructor calls SolrIndexConfig.toIndexWriterConfig (passing the result to its lucene.IndexWriter super class)
            + SolrIndexConfig.toIndexWriterConfig(SolrCore core) calls SolrIndexConfig.buildMergePolicy
            + SolrIndexConfig.buildMergePolicy(IndexSchema schema) calls the SortingMergePolicy constructor (using the IndexSchema's mergeSortSpec)
            + SolrIndexSearcher.buildAndRunCollectorChain calls the EarlyTerminatingSortingCollector constructor (using the IndexSchema's mergeSortSpec)

          summary of changes:

          • added static isSorted variant to SortingMergePolicy
          • added SortingMergePolicy-free EarlyTerminatingSortingCollector constructor variant
          • added SortingMergePolicy-free EarlyTerminatingSortingCollector.canEarlyTerminate variant
          • corresponding changes to TestEarlyTerminatingSortingCollector (randomly choose between constructor and canEarlyTerminate variants)

          commit 78de6620a9f50ab632c1fc240a336137acf9199f
          Author: Christine Poerschke <cpoerschke@bloomberg.net>
          Date: 2015-06-29T14:22:29Z

          [incomplete] SOLR-5730: make Lucene's SortingMergePolicy and EarlyTerminatingSortingCollector configurable in Solr

          summary of changes so far:

          • schema.xml: added optional mergeSortSpec field to [Managed]IndexSchema
          • solrconfig.xml: SolrIndexConfig.buildMergePolicy to constructs a SortingMergePolicy if the useSortingMergePolicy flag is set
          • optional segmentTerminateEarly parameter added to CommonParams and QueryComponent (defaulted to existing behaviour)
          • SolrIndexSearcher: constructs EarlyTerminatingSortingCollector if segmentTerminateEarly flag is set (and requested sort is compatible with the mergeSortSpec)

          still incomplete:

          • IndexSchema: convert/parse 'String mergeSortSpecString;' into 'SortSpec mergeSortSpec;'
          • queryResult caching to consider the segmentTerminateEarly flag (a segmentTerminateEarly=false search should not get cached results from a segmentTerminateEarly=true search)
          • documentation/example for the segmentTerminateEarly query parameter (and the new optional schema.xml/solrconfig.xml elements)
          • test cases

          Show
          githubbot ASF GitHub Bot added a comment - GitHub user cpoerschke opened a pull request: https://github.com/apache/lucene-solr/pull/176 [incomplete] SOLR-5730 : make SortingMergePolicy/EarlyTerminatingSortingCollector configurable incomplete. for https://issues.apache.org/jira/i#browse/SOLR-5730 ticket. You can merge this pull request into a Git repository by running: $ git pull https://github.com/bloomberg/lucene-solr trunk-sort-outline Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/176.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #176 commit 5984692c660b32ddec0260e5a096b7b7eea5ea8d Author: Christine Poerschke <cpoerschke@bloomberg.net> Date: 2015-06-29T15:02:44Z LUCENE-????: support SortingMergePolicy-free use of EarlyTerminatingSortingCollector motivation: SOLR-5730 to make Lucene's SortingMergePolicy and EarlyTerminatingSortingCollector configurable in Solr. outline of draft SOLR-5730 changes: + SolrIndexWriter constructor calls SolrIndexConfig.toIndexWriterConfig (passing the result to its lucene.IndexWriter super class) + SolrIndexConfig.toIndexWriterConfig(SolrCore core) calls SolrIndexConfig.buildMergePolicy + SolrIndexConfig.buildMergePolicy(IndexSchema schema) calls the SortingMergePolicy constructor (using the IndexSchema's mergeSortSpec) + SolrIndexSearcher.buildAndRunCollectorChain calls the EarlyTerminatingSortingCollector constructor (using the IndexSchema's mergeSortSpec) summary of changes: added static isSorted variant to SortingMergePolicy added SortingMergePolicy-free EarlyTerminatingSortingCollector constructor variant added SortingMergePolicy-free EarlyTerminatingSortingCollector.canEarlyTerminate variant corresponding changes to TestEarlyTerminatingSortingCollector (randomly choose between constructor and canEarlyTerminate variants) commit 78de6620a9f50ab632c1fc240a336137acf9199f Author: Christine Poerschke <cpoerschke@bloomberg.net> Date: 2015-06-29T14:22:29Z [incomplete] SOLR-5730 : make Lucene's SortingMergePolicy and EarlyTerminatingSortingCollector configurable in Solr summary of changes so far: schema.xml: added optional mergeSortSpec field to [Managed] IndexSchema solrconfig.xml: SolrIndexConfig.buildMergePolicy to constructs a SortingMergePolicy if the useSortingMergePolicy flag is set optional segmentTerminateEarly parameter added to CommonParams and QueryComponent (defaulted to existing behaviour) SolrIndexSearcher: constructs EarlyTerminatingSortingCollector if segmentTerminateEarly flag is set (and requested sort is compatible with the mergeSortSpec) still incomplete: IndexSchema: convert/parse 'String mergeSortSpecString;' into 'SortSpec mergeSortSpec;' queryResult caching to consider the segmentTerminateEarly flag (a segmentTerminateEarly=false search should not get cached results from a segmentTerminateEarly=true search) documentation/example for the segmentTerminateEarly query parameter (and the new optional schema.xml/solrconfig.xml elements) test cases
          Hide
          cpoerschke Christine Poerschke added a comment -

          Uploaded incomplete rebased-against-current-trunk changes and updated example configuration to match. The proposed solr changes require the changes proposed in LUCENE-6646 because at the point that solr needs to construct an EarlyTerminatingSortingCollector object it doesn't have access to the SortingMergePolicy in use, commit message about has further details.

          Show
          cpoerschke Christine Poerschke added a comment - Uploaded incomplete rebased-against-current-trunk changes and updated example configuration to match. The proposed solr changes require the changes proposed in LUCENE-6646 because at the point that solr needs to construct an EarlyTerminatingSortingCollector object it doesn't have access to the SortingMergePolicy in use, commit message about has further details.
          Hide
          cpoerschke Christine Poerschke added a comment -

          Making the SortingMergePolicy configurable means that the SortSpec configured needs to be parsed. SOLR-8283 intends to factor out SortSpecParsing[Test] from QueryParsing[Test] to help with this.

          Show
          cpoerschke Christine Poerschke added a comment - Making the SortingMergePolicy configurable means that the SortSpec configured needs to be parsed. SOLR-8283 intends to factor out SortSpecParsing[Test] from QueryParsing[Test] to help with this.
          Hide
          cpoerschke Christine Poerschke added a comment -

          SOLR-5730: make Lucene's SortingMergePolicy configurable in Solr

          Example configuration:

          • solrconfig.xml <useSortingMergePolicy>true</useSortingMergePolicy>
          • schema.xml <mergeSortSpec>timestamp desc</mergeSortSpec>
          Show
          cpoerschke Christine Poerschke added a comment - SOLR-5730 : make Lucene's SortingMergePolicy configurable in Solr Example configuration: solrconfig.xml <useSortingMergePolicy>true</useSortingMergePolicy> schema.xml <mergeSortSpec>timestamp desc</mergeSortSpec>
          Hide
          cpoerschke Christine Poerschke added a comment -

          SOLR-5730: optional segmentTerminateEarly=(false|true) parameter (to allow use of Lucene's EarlyTerminatingSortingCollector)

          If segmentTerminateEarly=true parameter is passed then Lucene's EarlyTerminatingSortingCollector will be used for ungrouped queries whose sort parameter is compatible with the <mergeSortSpec> configured in the schema.xml.

          Show
          cpoerschke Christine Poerschke added a comment - SOLR-5730 : optional segmentTerminateEarly=(false|true) parameter (to allow use of Lucene's EarlyTerminatingSortingCollector) If segmentTerminateEarly=true parameter is passed then Lucene's EarlyTerminatingSortingCollector will be used for ungrouped queries whose sort parameter is compatible with the <mergeSortSpec> configured in the schema.xml.
          Hide
          cpoerschke Christine Poerschke added a comment -

          Please find attached a two-part patch against latest trunk:

          • Part 1 is to make Lucene's SortingMergePolicy configurable in Solr.
          • Part 2 is to allow use of Lucene's EarlyTerminatingSortingCollector via an optional segmentTerminateEarly=(false|true) search request parameter.

          Related linked tickets already completed various preparation work and so the two patches now then are pretty pure and specific to only the change itself here.

          Reviews, comments, etc. welcome. Thank you.

          Show
          cpoerschke Christine Poerschke added a comment - Please find attached a two-part patch against latest trunk: Part 1 is to make Lucene's SortingMergePolicy configurable in Solr. Part 2 is to allow use of Lucene's EarlyTerminatingSortingCollector via an optional segmentTerminateEarly=(false|true) search request parameter. Related linked tickets already completed various preparation work and so the two patches now then are pretty pure and specific to only the change itself here. Reviews, comments, etc. welcome. Thank you.
          Hide
          hossman Hoss Man added a comment -

          Probably a naive question, but...

          I understand that the reason we need new config syntax is to be able to wrap whatever was specified by <mergePolicy/> – but i'm not really a fan of the 2 new disjoint top level config options. Why not keep those two concepts (should we sort the merges? how should we sort the merges?) in a single place? (my gut says solrconfig.xml since that's where other merge related settings live.)

          ie...

          <sortMerges enabled="true">popularity desc, timestamp desc</sortMerges>
          

          what's the motivation/conceptual reasoning behind part of that being configured in the schema and part of it in solrconfig?

          Show
          hossman Hoss Man added a comment - Probably a naive question, but... I understand that the reason we need new config syntax is to be able to wrap whatever was specified by <mergePolicy/> – but i'm not really a fan of the 2 new disjoint top level config options. Why not keep those two concepts (should we sort the merges? how should we sort the merges?) in a single place? (my gut says solrconfig.xml since that's where other merge related settings live.) ie... <sortMerges enabled= " true " >popularity desc, timestamp desc</sortMerges> what's the motivation/conceptual reasoning behind part of that being configured in the schema and part of it in solrconfig?
          Hide
          tomasflobbe Tomás Fernández Löbbe added a comment -

          This feature sounds really good. I have some questions/comments:

          I'm wondering if its better to have the segmentTerminateEarly parameter or if we should just use the EarlyTerminatingSortingCollector whenever the MP is SortingMergePolicy and the sort spec is compatible. Or maybe the default should be "use it if possible", and if the parameter is explicitly set to true, use it if possible, and error if not? If the parameter is set to "false", don't use the collector even if possible.

          Looking at this

          +      if (segmentTerminatedEarly != null) {
          +        final Object existingSegmentTerminatedEarly = rb.rsp.getResponseHeader().get(SolrQueryResponse.RESPONSE_HEADER_SEGMENT_TERMINATED_EARLY_KEY);
          +        if(existingSegmentTerminatedEarly == null) {
          +          rb.rsp.getResponseHeader().add(SolrQueryResponse.RESPONSE_HEADER_SEGMENT_TERMINATED_EARLY_KEY, segmentTerminatedEarly);
          +        } else if (!segmentTerminatedEarly.equals(existingSegmentTerminatedEarly)) {
          +          rb.rsp.getResponseHeader().remove(SolrQueryResponse.RESPONSE_HEADER_SEGMENT_TERMINATED_EARLY_KEY);
          +          rb.rsp.getResponseHeader().add(SolrQueryResponse.RESPONSE_HEADER_SEGMENT_TERMINATED_EARLY_KEY, segmentTerminatedEarly);
          +        }
          +      }
          

          If there is an existing value, the new one should be existing OR new, right? If we want the per shard information we could add that to the shards_info section.

          Show
          tomasflobbe Tomás Fernández Löbbe added a comment - This feature sounds really good. I have some questions/comments: I'm wondering if its better to have the segmentTerminateEarly parameter or if we should just use the EarlyTerminatingSortingCollector whenever the MP is SortingMergePolicy and the sort spec is compatible. Or maybe the default should be "use it if possible", and if the parameter is explicitly set to true, use it if possible, and error if not? If the parameter is set to "false", don't use the collector even if possible. Looking at this + if (segmentTerminatedEarly != null) { + final Object existingSegmentTerminatedEarly = rb.rsp.getResponseHeader().get(SolrQueryResponse.RESPONSE_HEADER_SEGMENT_TERMINATED_EARLY_KEY); + if(existingSegmentTerminatedEarly == null) { + rb.rsp.getResponseHeader().add(SolrQueryResponse.RESPONSE_HEADER_SEGMENT_TERMINATED_EARLY_KEY, segmentTerminatedEarly); + } else if (!segmentTerminatedEarly.equals(existingSegmentTerminatedEarly)) { + rb.rsp.getResponseHeader().remove(SolrQueryResponse.RESPONSE_HEADER_SEGMENT_TERMINATED_EARLY_KEY); + rb.rsp.getResponseHeader().add(SolrQueryResponse.RESPONSE_HEADER_SEGMENT_TERMINATED_EARLY_KEY, segmentTerminatedEarly); + } + } If there is an existing value, the new one should be existing OR new , right? If we want the per shard information we could add that to the shards_info section.
          Hide
          cpoerschke Christine Poerschke added a comment -

          ... i'm not really a fan of the 2 new disjoint top level config options ...

          I agree that having a <useSortingMergePolicy>true</useSortingMergePolicy> element in solrconfig.xml and a <mergeSortSpec>timestamp desc</mergeSortSpec> element in schema.xml is tricky but personally am not won over by either of the obvious two alternatives either:

          Alternative #1 solrconfig.xml element only:
          + advantage: the element is co-located with the existing <mergePolicy> element
          - disadvantages:

          • solrconfig.xml references schema.xml field names
          • solr.search.SolrIndexSearcher would need (more) access to the solr.update.SolrIndexConfig object [implementation detail: SolrIndexSearcher.buildAndRunCollectorChain needs to be able to pass the merge policy sort to EarlyTerminatingSortingCollector but only one of the two SolrIndexSearcher constructors currently takes an SolrIndexConfig argument.]

          Alternative #2 schema.xml element only:
          + advantage: the element uses solr field names defined in the same file
          - disadvantage: the presence or absence of an element in schema.xml has a side effect on how the solrconfig.xml <mergePolicy> element is instantiated

          Alternative #3 perhaps there is another way?

          Show
          cpoerschke Christine Poerschke added a comment - ... i'm not really a fan of the 2 new disjoint top level config options ... I agree that having a <useSortingMergePolicy>true</useSortingMergePolicy> element in solrconfig.xml and a <mergeSortSpec>timestamp desc</mergeSortSpec> element in schema.xml is tricky but personally am not won over by either of the obvious two alternatives either: Alternative #1 solrconfig.xml element only: + advantage: the element is co-located with the existing <mergePolicy> element - disadvantages: solrconfig.xml references schema.xml field names solr.search.SolrIndexSearcher would need (more) access to the solr.update.SolrIndexConfig object [implementation detail: SolrIndexSearcher.buildAndRunCollectorChain needs to be able to pass the merge policy sort to EarlyTerminatingSortingCollector but only one of the two SolrIndexSearcher constructors currently takes an SolrIndexConfig argument.] Alternative #2 schema.xml element only: + advantage: the element uses solr field names defined in the same file - disadvantage: the presence or absence of an element in schema.xml has a side effect on how the solrconfig.xml <mergePolicy> element is instantiated Alternative #3 perhaps there is another way?
          Hide
          cpoerschke Christine Poerschke added a comment -

          If there is an existing value, the new one should be existing OR new, right?

          Good catch, will fix.

          If we want the per shard information we could add that to the shards_info section.

          Good idea, will make a change to add that (with corresponding test).

          Show
          cpoerschke Christine Poerschke added a comment - If there is an existing value, the new one should be existing OR new , right? Good catch, will fix. If we want the per shard information we could add that to the shards_info section. Good idea, will make a change to add that (with corresponding test).
          Hide
          cpoerschke Christine Poerschke added a comment -

          I'm wondering if its better to have the segmentTerminateEarly parameter or if we should just use the EarlyTerminatingSortingCollector whenever the MP is SortingMergePolicy and the sort spec is compatible. ...

          Good question. segmentTerminateEarly will influence i.e. potentially reduce the numFound value in the response, so because of that I'd say that segment-terminating-early should be an opt-in rather than an opt-out parameter.

          Show
          cpoerschke Christine Poerschke added a comment - I'm wondering if its better to have the segmentTerminateEarly parameter or if we should just use the EarlyTerminatingSortingCollector whenever the MP is SortingMergePolicy and the sort spec is compatible. ... Good question. segmentTerminateEarly will influence i.e. potentially reduce the numFound value in the response, so because of that I'd say that segment-terminating-early should be an opt-in rather than an opt-out parameter.
          Hide
          tomasflobbe Tomás Fernández Löbbe added a comment -

          Missed that.
          The parameter makes more sense then.

          Show
          tomasflobbe Tomás Fernández Löbbe added a comment - Missed that. The parameter makes more sense then.
          Hide
          hossman Hoss Man added a comment -

          Alternative #1 solrconfig.xml element only:

          I think this is the best approach conceptually from the standpoint of a user's mental model.

          solrconfig.xml references schema.xml field names

          This has always been the case – no new ground here, and not soemthing we have any reason to start avoiding. sane solrconfig.xml settings have always depended on an understanding of the schema.xml, it's going the other way (schema.xml settings requiring some knowledge of the solrconfig.xml) that has always been avoided, because it breaks the conceptual model of the distinction between "what, abstractly, is my data? (schema)" and "how, practically, do i want to use my data? (solrconfig).

          segment sorting feels very much like a "practically speaking, how am i using this data" type question, and not a "conceptually speaking, what do these documents represent" type question.

          SolrIndexSearcher.buildAndRunCollectorChain needs to be able to pass the merge policy sort to EarlyTerminatingSortingCollector but only one of the two SolrIndexSearcher constructors currently takes an SolrIndexConfig

          We should put more work into making the user experience better at the expense of implementation details – in this case though i'm not sure i really see the problem? SolrIndexSearcher always knows it's SolrCore which has accessors for the SolrConfig, which exposes the SlrIndexConfig ... correct?

          Show
          hossman Hoss Man added a comment - Alternative #1 solrconfig.xml element only: I think this is the best approach conceptually from the standpoint of a user's mental model. solrconfig.xml references schema.xml field names This has always been the case – no new ground here, and not soemthing we have any reason to start avoiding. sane solrconfig.xml settings have always depended on an understanding of the schema.xml, it's going the other way (schema.xml settings requiring some knowledge of the solrconfig.xml) that has always been avoided, because it breaks the conceptual model of the distinction between "what, abstractly, is my data? (schema)" and "how, practically, do i want to use my data? (solrconfig). segment sorting feels very much like a "practically speaking, how am i using this data" type question, and not a "conceptually speaking, what do these documents represent" type question. SolrIndexSearcher.buildAndRunCollectorChain needs to be able to pass the merge policy sort to EarlyTerminatingSortingCollector but only one of the two SolrIndexSearcher constructors currently takes an SolrIndexConfig We should put more work into making the user experience better at the expense of implementation details – in this case though i'm not sure i really see the problem? SolrIndexSearcher always knows it's SolrCore which has accessors for the SolrConfig, which exposes the SlrIndexConfig ... correct?
          Hide
          cpoerschke Christine Poerschke added a comment -

          Okay, if the "solrconfig.xml references schema.xml field names" thing is not a show-stopper then I'm happy to run with the "Alternative #1 solrconfig.xml element only" approach.

          Show
          cpoerschke Christine Poerschke added a comment - Okay, if the "solrconfig.xml references schema.xml field names" thing is not a show-stopper then I'm happy to run with the "Alternative #1 solrconfig.xml element only" approach.
          Hide
          cpoerschke Christine Poerschke added a comment -

          Any thoughts on how crucial support for the enable attribute is in the solrconfig.xml element?

          <sortMerges enable="true">popularity desc, timestamp desc</sortMerges>
          

          (In my opinion it is crucial for anyone wishing to use something like enable="${solr.sortMerges.enable:false}" to easily transition to use a SortingMergePolicy.)

          Any views on this alternative?

          <sortMerges enable="true">
            <str name="sort">popularity desc, timestamp desc</str>
          </sortMerges>
          

          (In my opinion it's almost as user-friendly and if Lucene's SortingMergePolicy's constructor in future ever were to require an extra argument this could be accommodated in a straightforward way with an extra <int name="answer">42</int> element.)

          Why the above questions?

          Implementation-wise it appears that primitive <some>string</some> elements have no support for attributes. At first glance it appears that we could in SolrIndexConfig.java creatively solve this with something like this:

            lockType=solrConfig.get(prefix+"/lockType", def.lockType);
            ...
            mergePolicyInfo = getPluginInfo(prefix + "/mergePolicy", solrConfig, def.mergePolicyInfo);
          + if (!solrConfig.readPluginInfos(prefix + "/sortMerges", false /*requireName*/, false /*requireClass*/).isEmpty()) {
          +   sortMerges = solrConfig.get(prefix + "/sortMerges", def.sortMerges);
          + } else {
          +   sortMerges = def.sortMerges;
          + }
          

          However, on further consideration a map returned by SolrIndexConfig.toMap() would then not represent the complete solrconfig.xml element (any enable attribute in the primitive sortMerges element would be missing) i.e. <sortMerges enable="true">popularity desc, timestamp desc</sortMerges> would reduce down to "sortMerges" : "popularity desc, timestamp desc" in the map and <sortMerges enable="false">popularity desc, timestamp desc</sortMerges> would be absent from the map.

          Show
          cpoerschke Christine Poerschke added a comment - Any thoughts on how crucial support for the enable attribute is in the solrconfig.xml element? <sortMerges enable="true">popularity desc, timestamp desc</sortMerges> (In my opinion it is crucial for anyone wishing to use something like enable="${solr.sortMerges.enable:false}" to easily transition to use a SortingMergePolicy.) Any views on this alternative? <sortMerges enable="true"> <str name="sort">popularity desc, timestamp desc</str> </sortMerges> (In my opinion it's almost as user-friendly and if Lucene's SortingMergePolicy's constructor in future ever were to require an extra argument this could be accommodated in a straightforward way with an extra <int name="answer">42</int> element.) Why the above questions? Implementation-wise it appears that primitive <some>string</some> elements have no support for attributes. At first glance it appears that we could in SolrIndexConfig.java creatively solve this with something like this: lockType=solrConfig.get(prefix+ "/lockType" , def.lockType); ... mergePolicyInfo = getPluginInfo(prefix + "/mergePolicy" , solrConfig, def.mergePolicyInfo); + if (!solrConfig.readPluginInfos(prefix + "/sortMerges" , false /*requireName*/, false /*requireClass*/).isEmpty()) { + sortMerges = solrConfig.get(prefix + "/sortMerges" , def.sortMerges); + } else { + sortMerges = def.sortMerges; + } However, on further consideration a map returned by SolrIndexConfig.toMap() would then not represent the complete solrconfig.xml element (any enable attribute in the primitive sortMerges element would be missing) i.e. <sortMerges enable="true">popularity desc, timestamp desc</sortMerges> would reduce down to "sortMerges" : "popularity desc, timestamp desc" in the map and <sortMerges enable="false">popularity desc, timestamp desc</sortMerges> would be absent from the map.
          Hide
          hossman Hoss Man added a comment -

          Any views on this alternative?

          +1

          Show
          hossman Hoss Man added a comment - Any views on this alternative? +1
          Hide
          cpoerschke Christine Poerschke added a comment -

          Have chosen

          +      if (segmentTerminatedEarly != null) {
          +        final Object existingSegmentTerminatedEarly = rb.rsp.getResponseHeader().get(SolrQueryResponse.RESPONSE_HEADER_SEGMENT_TERMINATED_EARLY_KEY);
          +        if(existingSegmentTerminatedEarly == null) {
          +          rb.rsp.getResponseHeader().add(SolrQueryResponse.RESPONSE_HEADER_SEGMENT_TERMINATED_EARLY_KEY, segmentTerminatedEarly);
          +        } else if (!Boolean.TRUE.equals(existingSegmentTerminatedEarly) && Boolean.TRUE.equals(segmentTerminatedEarly)) {
          +          rb.rsp.getResponseHeader().remove(SolrQueryResponse.RESPONSE_HEADER_SEGMENT_TERMINATED_EARLY_KEY);
          +          rb.rsp.getResponseHeader().add(SolrQueryResponse.RESPONSE_HEADER_SEGMENT_TERMINATED_EARLY_KEY, segmentTerminatedEarly);
          +        }
          +      }
          

          as implementation for the existing OR new to avoid unnecessary remove-and-then-add-back if existing is already TRUE.

          Show
          cpoerschke Christine Poerschke added a comment - Have chosen + if (segmentTerminatedEarly != null ) { + final Object existingSegmentTerminatedEarly = rb.rsp.getResponseHeader().get(SolrQueryResponse.RESPONSE_HEADER_SEGMENT_TERMINATED_EARLY_KEY); + if (existingSegmentTerminatedEarly == null ) { + rb.rsp.getResponseHeader().add(SolrQueryResponse.RESPONSE_HEADER_SEGMENT_TERMINATED_EARLY_KEY, segmentTerminatedEarly); + } else if (! Boolean .TRUE.equals(existingSegmentTerminatedEarly) && Boolean .TRUE.equals(segmentTerminatedEarly)) { + rb.rsp.getResponseHeader().remove(SolrQueryResponse.RESPONSE_HEADER_SEGMENT_TERMINATED_EARLY_KEY); + rb.rsp.getResponseHeader().add(SolrQueryResponse.RESPONSE_HEADER_SEGMENT_TERMINATED_EARLY_KEY, segmentTerminatedEarly); + } + } as implementation for the existing OR new to avoid unnecessary remove-and-then-add-back if existing is already TRUE.
          Hide
          cpoerschke Christine Poerschke added a comment -

          The revised attached patch uses this alternative.

          A side-effect of the alternative, beneficial in my opinion, is that the simple default use is

          <sortMerges enable="true">
            <str name="sort">popularity desc, timestamp desc</str>
          </sortMerges>
          

          but something customised such as

          <sortMerges enable="true" class="MyInstrumentedSortingMergePolicy">
            <str name="sort">popularity desc, timestamp desc</str>
          </sortMerges>
          

          is now then also supported. Or in implementation-speak: the class attribute of the sortMerges element is optional (defaulted to org.apache.lucene.index.SortingMergePolicy).

          Show
          cpoerschke Christine Poerschke added a comment - The revised attached patch uses this alternative. A side-effect of the alternative, beneficial in my opinion, is that the simple default use is <sortMerges enable="true"> <str name="sort">popularity desc, timestamp desc</str> </sortMerges> but something customised such as <sortMerges enable="true" class="MyInstrumentedSortingMergePolicy"> <str name="sort">popularity desc, timestamp desc</str> </sortMerges> is now then also supported. Or in implementation-speak: the class attribute of the sortMerges element is optional (defaulted to org.apache.lucene.index.SortingMergePolicy ).
          Hide
          shaie Shai Erera added a comment - - edited

          I was hoping that we'd change solrconfig.xml to support something like a <mergepolicy> element which will allow you to define a nested MP. That way one can define SortingMergePolicy by an element like that:

          <mergepolicy class="solr.SortingMergePolicy">
            <params>
              <sort>f1 desc, f2 asc ... </sort>
            </param>
            <mergepolicy class="solr.TieredMergePolicy">
            </mergepolicy>
          </mergepolicy>
          

          This would allow us to define any "wrapper" MPs in a more general way. What do you think?

          Show
          shaie Shai Erera added a comment - - edited I was hoping that we'd change solrconfig.xml to support something like a <mergepolicy> element which will allow you to define a nested MP. That way one can define SortingMergePolicy by an element like that: <mergepolicy class="solr.SortingMergePolicy"> <params> <sort>f1 desc, f2 asc ... </sort> </param> <mergepolicy class="solr.TieredMergePolicy"> </mergepolicy> </mergepolicy> This would allow us to define any "wrapper" MPs in a more general way. What do you think?
          Hide
          cpoerschke Christine Poerschke added a comment - - edited

          solrconfig.xml to support something like a <mergepolicy> element which will allow you to define a nested MP.

          I also had noticed the similarity between the with-class-attribute sortMerges element and the existing mergePolicy element. The elements within a mergePolicy currently can be only primitive elements and mergePolicy-within-mergePolicy elements might be confusing from a user's perspective?



          If we want wrapper merge policies support then I would instead favour 'chaining' instead of 'nesting' of the solrconfig.xml elements, for example:

          <indexConfig>
            <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
              <int name="segmentsPerTier">42</int>
            </mergePolicy>
            <mergePolicy class="org.apache.lucene.index.SortingMergePolicy">
              <str name="sort">timestamp desc</str>
            </mergePolicy>
          </indexConfig>
          

          SolrIndexConfig.java code outline:

          private MergePolicy buildMergePolicyChain(IndexSchema schema) {
            MergePolicy policy = null;
            for (PluginInfo mergePolicyInfo : mergePolicyInfos) {
              String mpClassName = mergePolicyInfo.className;
              Class<?> mpClass = schema.getResourceLoader().findClass(mpClassName, MergePolicy.class);
          
              NamedList initArgs = mergePolicyInfo.initArgs;
          
              if (LogMergePolicy.class.isAssignableFrom(mpClass)) {
                LogMergePolicy logMergePolicy = schema.getResourceLoader().newInstance(mpClassName, LogMergePolicy.class);
                // ...
                policy = logMergePolicy;
              } else if (TieredMergePolicy.class.isAssignableFrom(mpClass)) {
                TieredMergePolicy tieredMergePolicy = schema.getResourceLoader().newInstance(mpClassName, TieredMergePolicy.class);
                // ...
                policy = tieredMergePolicy;
              } else if (SortingMergePolicy.class.isAssignableFrom(mpClass)) {
                // clone initArgs so that we can remove the 'sort' element
                initArgs = initArgs.clone();
                final Object sortArg = initArgs.remove("sort");
                // now parse the 'sort' element
                final Sort mergeSort = SortSpecParsing.parseSortSpec((String)sortArg, schema).getSort();
                // then construct the sorting merge policy
                log.info("Using {}(policy={},mergeSort={})", mpClassName, policy, mergeSort);
                SortingMergePolicy sortingMergePolicy = schema.getResourceLoader().newInstance(mpClassName, SortingMergePolicy.class,
                    null,
                    new Class[] { MergePolicy.class, Sort.class },
                    new Object[] { policy, mergeSort });
                policy = sortingMergePolicy;
              } else {
                policy = schema.getResourceLoader().newInstance(mpClassName, MergePolicy.class);
              }
          
              SolrPluginUtils.invokeSetters(policy, initArgs);
            }
            return policy;
          }
          
          Show
          cpoerschke Christine Poerschke added a comment - - edited solrconfig.xml to support something like a <mergepolicy> element which will allow you to define a nested MP. I also had noticed the similarity between the with-class-attribute sortMerges element and the existing mergePolicy element. The elements within a mergePolicy currently can be only primitive elements and mergePolicy-within-mergePolicy elements might be confusing from a user's perspective? If we want wrapper merge policies support then I would instead favour 'chaining' instead of 'nesting' of the solrconfig.xml elements, for example: <indexConfig> <mergePolicy class= "org.apache.lucene.index.TieredMergePolicy" > < int name= "segmentsPerTier" >42</ int > </mergePolicy> <mergePolicy class= "org.apache.lucene.index.SortingMergePolicy" > <str name= "sort" >timestamp desc</str> </mergePolicy> </indexConfig> SolrIndexConfig.java code outline: private MergePolicy buildMergePolicyChain(IndexSchema schema) { MergePolicy policy = null ; for (PluginInfo mergePolicyInfo : mergePolicyInfos) { String mpClassName = mergePolicyInfo.className; Class <?> mpClass = schema.getResourceLoader().findClass(mpClassName, MergePolicy.class); NamedList initArgs = mergePolicyInfo.initArgs; if (LogMergePolicy.class.isAssignableFrom(mpClass)) { LogMergePolicy logMergePolicy = schema.getResourceLoader().newInstance(mpClassName, LogMergePolicy.class); // ... policy = logMergePolicy; } else if (TieredMergePolicy.class.isAssignableFrom(mpClass)) { TieredMergePolicy tieredMergePolicy = schema.getResourceLoader().newInstance(mpClassName, TieredMergePolicy.class); // ... policy = tieredMergePolicy; } else if (SortingMergePolicy.class.isAssignableFrom(mpClass)) { // clone initArgs so that we can remove the 'sort' element initArgs = initArgs.clone(); final Object sortArg = initArgs.remove( "sort" ); // now parse the 'sort' element final Sort mergeSort = SortSpecParsing.parseSortSpec(( String )sortArg, schema).getSort(); // then construct the sorting merge policy log.info( "Using {}(policy={},mergeSort={})" , mpClassName, policy, mergeSort); SortingMergePolicy sortingMergePolicy = schema.getResourceLoader().newInstance(mpClassName, SortingMergePolicy.class, null , new Class [] { MergePolicy.class, Sort.class }, new Object [] { policy, mergeSort }); policy = sortingMergePolicy; } else { policy = schema.getResourceLoader().newInstance(mpClassName, MergePolicy.class); } SolrPluginUtils.invokeSetters(policy, initArgs); } return policy; }
          Hide
          dsmiley David Smiley added a comment -

          +1 to chaining.

          As an aside, It's nice to see the collective input of all of us help make this feature even better than originally proposed. Thanks for putting effort into this Christine.

          Show
          dsmiley David Smiley added a comment - +1 to chaining. As an aside, It's nice to see the collective input of all of us help make this feature even better than originally proposed. Thanks for putting effort into this Christine.
          Hide
          shaie Shai Erera added a comment -

          I'm OK with chaining, but then can we at least wrap all MPs in a logical element?

          <indexConfig>
            <mergePolicies>
              ...
            </mergePolicies>
          </indexConfig>
          

          That way, if we see only <mergePolicy>, it's a single one, otherwise (<mergePolicies>) we activate chaining. We're both backward-compatible and also don't risk someone stuffing in between elements that aren't merge policies. Also, I feel that if I'd use it, it would make sense to me that all MPs are actually one thing, and not multiple instances...

          Show
          shaie Shai Erera added a comment - I'm OK with chaining, but then can we at least wrap all MPs in a logical element? <indexConfig> <mergePolicies> ... </mergePolicies> </indexConfig> That way, if we see only <mergePolicy> , it's a single one, otherwise ( <mergePolicies> ) we activate chaining. We're both backward-compatible and also don't risk someone stuffing in between elements that aren't merge policies. Also, I feel that if I'd use it, it would make sense to me that all MPs are actually one thing, and not multiple instances...
          Hide
          shaie Shai Erera added a comment -

          Rethinking my proposal, I'm not sure chaining merge policies is the right approach. What does it mean if I chain LogMergePolicy and TieredMergePolicy? How about if we introduce a MergePolicyPlugin/Factory, which would allow one to implement whatever MergePolicy he wants (including wrapping others). And we introduce out-of-the-box a SortingMergePolicyPlugin/Factory which takes flat attributes to initialize itself, e.g.:

            <mergePolicyPlugin class="solr.SortingMergePolicyPlugin">
              <sort>f1 desc, f2 asc, ... </sort>
              <delegate.class>solr.TieredMergePolicy"</delegate.class>
              <delegate.mergeFactor>10</delegate.mergeFactor>
            </mergePolicyPlugin>
          

          Alternatively, if this works too, it's better IMO:

            <mergePolicyPlugin class="solr.SortingMergePolicyPlugin">
              <sort>f1 desc, f2 asc, ... </sort>
              <delegate class="solr.TieredMergePolicy"
                        mergeFactor="10"
                        ...
              </delegate>
            </mergePolicyPlugin>
          
          Show
          shaie Shai Erera added a comment - Rethinking my proposal, I'm not sure chaining merge policies is the right approach. What does it mean if I chain LogMergePolicy and TieredMergePolicy? How about if we introduce a MergePolicyPlugin/Factory, which would allow one to implement whatever MergePolicy he wants (including wrapping others). And we introduce out-of-the-box a SortingMergePolicyPlugin/Factory which takes flat attributes to initialize itself, e.g.: <mergePolicyPlugin class="solr.SortingMergePolicyPlugin"> <sort>f1 desc, f2 asc, ... </sort> <delegate.class>solr.TieredMergePolicy"</delegate.class> <delegate.mergeFactor>10</delegate.mergeFactor> </mergePolicyPlugin> Alternatively, if this works too, it's better IMO: <mergePolicyPlugin class="solr.SortingMergePolicyPlugin"> <sort>f1 desc, f2 asc, ... </sort> <delegate class="solr.TieredMergePolicy" mergeFactor="10" ... </delegate> </mergePolicyPlugin>
          Hide
          cpoerschke Christine Poerschke added a comment -

          I like the idea of a MergePolicyFactory and the idea of delegate.something notation is also interesting.

          There would have to be out-of-the-box factories for all existing non-wrapping merge policies e.g.

          <!-- public TieredMergePolicy() { -->
          <mergePolicyFactory class="TieredMergePolicyFactory">
            <int name="segmentsPerTier">42</int>
          </mergePolicyFactory>
          

          An out-of-the-box factory would also be provided for existing plain wrapping merge policies e.g.

          <!-- public UpgradeIndexMergePolicy(MergePolicy base) { -->
          <mergePolicyFactory class="UpgradeIndexMergePolicyFactory">
            <str name="base">TieredMergePolicyFactory</str>
            <int name="base.segmentsPerTier">42</int>
          </mergePolicyFactory>
          

          The factory for building sorting merge policies could look something like this:

          <!-- public SortingMergePolicy(MergePolicy in, Sort sort) { -->
          <mergePolicyFactory class="SortingMergePolicyFactory">
            <str name="in">TieredMergePolicyFactory</str>
            <int name="in.segmentsPerTier">42</int>
            <str name="sort">timestamp desc</str>
          </mergePolicyFactory>
          

          For consistency we should (in my opinion) conceptually support wrapping of wrapping merge policies e.g. someone could go and create their own MyInstrumentedMergePolicyFactory factory and use it like this:

          <!-- public MyInstrumentedMergePolicy(MergePolicy mergePolicy) { -->
          <mergePolicyFactory class="MyInstrumentedMergePolicyFactory">
            <str name="mergePolicy">SortingMergePolicyFactory</str>
            <str name="mergePolicy.in">TieredMergePolicyFactory</str>
            <int name="mergePolicy.in.segmentsPerTier">42</int>
            <str name="mergePolicy.sort">timestamp desc</str>
            <int name="instrumentationLevel">42</int>
          </mergePolicyFactory>
          

          ticketing details: if we choose to go down the MergePolicyFactory route then i would suggest this would be best done via a separate JIRA ticket and when that separate JIRA ticket completes the SOLR-5730 efforts here would resume/continue.

          Show
          cpoerschke Christine Poerschke added a comment - I like the idea of a MergePolicyFactory and the idea of delegate.something notation is also interesting. There would have to be out-of-the-box factories for all existing non-wrapping merge policies e.g. <!-- public TieredMergePolicy() { --> <mergePolicyFactory class= "TieredMergePolicyFactory" > < int name= "segmentsPerTier" >42</ int > </mergePolicyFactory> An out-of-the-box factory would also be provided for existing plain wrapping merge policies e.g. <!-- public UpgradeIndexMergePolicy(MergePolicy base) { --> <mergePolicyFactory class= "UpgradeIndexMergePolicyFactory" > <str name= "base" >TieredMergePolicyFactory</str> < int name= "base.segmentsPerTier" >42</ int > </mergePolicyFactory> The factory for building sorting merge policies could look something like this: <!-- public SortingMergePolicy(MergePolicy in, Sort sort) { --> <mergePolicyFactory class= "SortingMergePolicyFactory" > <str name= "in" >TieredMergePolicyFactory</str> < int name= "in.segmentsPerTier" >42</ int > <str name= "sort" >timestamp desc</str> </mergePolicyFactory> For consistency we should (in my opinion) conceptually support wrapping of wrapping merge policies e.g. someone could go and create their own MyInstrumentedMergePolicyFactory factory and use it like this: <!-- public MyInstrumentedMergePolicy(MergePolicy mergePolicy) { --> <mergePolicyFactory class= "MyInstrumentedMergePolicyFactory" > <str name= "mergePolicy" >SortingMergePolicyFactory</str> <str name= "mergePolicy.in" >TieredMergePolicyFactory</str> < int name= "mergePolicy.in.segmentsPerTier" >42</ int > <str name= "mergePolicy.sort" >timestamp desc</str> < int name= "instrumentationLevel" >42</ int > </mergePolicyFactory> ticketing details: if we choose to go down the MergePolicyFactory route then i would suggest this would be best done via a separate JIRA ticket and when that separate JIRA ticket completes the SOLR-5730 efforts here would resume/continue.
          Hide
          shaie Shai Erera added a comment -

          I don't think that we need to change all MPs to MPFs. If you encounter an MP, you parse it as usual (this is also needed for back-compat). If someone defines an explicit MPF, we use it. For now (this issue that is), we just implement a SortingMergePolicyFactory. For the "flat" MP, we can create a simple FlatMPF or ImplicitMPF (or whatever) that will parse the MP out of solrconfig.xml.

          I think that if we proceed with this issue before we have MPF, then we'll need to support both single MP, chained MP and MPF, which I don't think we want?

          Show
          shaie Shai Erera added a comment - I don't think that we need to change all MPs to MPFs. If you encounter an MP, you parse it as usual (this is also needed for back-compat). If someone defines an explicit MPF, we use it. For now (this issue that is), we just implement a SortingMergePolicyFactory. For the "flat" MP, we can create a simple FlatMPF or ImplicitMPF (or whatever) that will parse the MP out of solrconfig.xml. I think that if we proceed with this issue before we have MPF, then we'll need to support both single MP, chained MP and MPF, which I don't think we want?
          Hide
          cpoerschke Christine Poerschke added a comment -

          I don't think that we need to change all MPs to MPFs.

          We currently have mergeFactor and maxMergeDocs elements alongside the mergePolicy element. I would see the MergePolicyFactory route as an opportunity to deprecate the mergeFactor/maxMergeDocs/mergePolicy elements i.e. for a transition period users can use the existing/deprecated elements or the new mergePolicyFactory element but eventually we will remove the deprecated elements.

          existing/deprecated:

          <mergeFactor>12</mergeFactor>
          <maxMergeDocs>345</maxMergeDocs>
          <mergePolicy class="..." />
          

          replacement:

          <mergePolicyFactory class="...Factory">
            <int name="mergeFactor">12</int>
            <int name="maxMergeDocs">345</int>
          </mergePolicyFactory>
          

          existing code snippet:
          Note that maxMergeDocs only applies to LogMergePolicy policies.
          Note that mergeFactor only applies to LogMergePolicy and TieredMergePolicy policies.

          maxMergeDocs=solrConfig.getInt(prefix+"/maxMergeDocs",def.maxMergeDocs);
          mergeFactor=solrConfig.getInt(prefix+"/mergeFactor",def.mergeFactor);
          mergePolicyInfo = getPluginInfo(prefix + "/mergePolicy", solrConfig, def.mergePolicyInfo);
          ...
          if (policy instanceof LogMergePolicy) {
            LogMergePolicy logMergePolicy = (LogMergePolicy) policy;
            fixUseCFMergePolicyInitArg(LogMergePolicy.class);
            if (maxMergeDocs != -1)
              logMergePolicy.setMaxMergeDocs(maxMergeDocs);
            if (mergeFactor != -1)
              logMergePolicy.setMergeFactor(mergeFactor);
          } else if (policy instanceof TieredMergePolicy) {
            TieredMergePolicy tieredMergePolicy = (TieredMergePolicy) policy;
            fixUseCFMergePolicyInitArg(TieredMergePolicy.class);
            if (mergeFactor != -1) {
              tieredMergePolicy.setMaxMergeAtOnce(mergeFactor);
              tieredMergePolicy.setSegmentsPerTier(mergeFactor);
            }
          } else if (mergeFactor != -1) {
            log.warn("Use of <mergeFactor> cannot be configured if merge policy is not an instance of LogMergePolicy or TieredMergePolicy. The configured policy's defaults will be used.");
          }
          
          Show
          cpoerschke Christine Poerschke added a comment - I don't think that we need to change all MPs to MPFs. We currently have mergeFactor and maxMergeDocs elements alongside the mergePolicy element. I would see the MergePolicyFactory route as an opportunity to deprecate the mergeFactor/maxMergeDocs/mergePolicy elements i.e. for a transition period users can use the existing/deprecated elements or the new mergePolicyFactory element but eventually we will remove the deprecated elements. existing/deprecated: <mergeFactor>12</mergeFactor> <maxMergeDocs>345</maxMergeDocs> <mergePolicy class= "..." /> replacement: <mergePolicyFactory class= "...Factory" > < int name= "mergeFactor" >12</ int > < int name= "maxMergeDocs" >345</ int > </mergePolicyFactory> existing code snippet: Note that maxMergeDocs only applies to LogMergePolicy policies. Note that mergeFactor only applies to LogMergePolicy and TieredMergePolicy policies. maxMergeDocs=solrConfig.getInt(prefix+ "/maxMergeDocs" ,def.maxMergeDocs); mergeFactor=solrConfig.getInt(prefix+ "/mergeFactor" ,def.mergeFactor); mergePolicyInfo = getPluginInfo(prefix + "/mergePolicy" , solrConfig, def.mergePolicyInfo); ... if (policy instanceof LogMergePolicy) { LogMergePolicy logMergePolicy = (LogMergePolicy) policy; fixUseCFMergePolicyInitArg(LogMergePolicy.class); if (maxMergeDocs != -1) logMergePolicy.setMaxMergeDocs(maxMergeDocs); if (mergeFactor != -1) logMergePolicy.setMergeFactor(mergeFactor); } else if (policy instanceof TieredMergePolicy) { TieredMergePolicy tieredMergePolicy = (TieredMergePolicy) policy; fixUseCFMergePolicyInitArg(TieredMergePolicy.class); if (mergeFactor != -1) { tieredMergePolicy.setMaxMergeAtOnce(mergeFactor); tieredMergePolicy.setSegmentsPerTier(mergeFactor); } } else if (mergeFactor != -1) { log.warn( "Use of <mergeFactor> cannot be configured if merge policy is not an instance of LogMergePolicy or TieredMergePolicy. The configured policy's defaults will be used." ); }
          Hide
          shaie Shai Erera added a comment -

          OK I see, then in this issue you want to add another "alongside" attribute for SortingMP (<sortMerges>) and then deprecate the whole stuff and move to MergePolicyFactory? Or did you first intend to introduce MPF, and only then address SortingMP? I guess that adding <sortMerges> just to immediately deprecate it seems odd, especially if both will be released in 5.5...

          While we're at it, useCompoundFiles in Lucene has two meanings too: for IndexWriter it determines if flushed segments are flushed as .cfs and in MergePolicy it determines about merged segments. The two are not equivalent, so I suggest we note that in the MPF changes too.

          Show
          shaie Shai Erera added a comment - OK I see, then in this issue you want to add another "alongside" attribute for SortingMP ( <sortMerges> ) and then deprecate the whole stuff and move to MergePolicyFactory? Or did you first intend to introduce MPF, and only then address SortingMP? I guess that adding <sortMerges> just to immediately deprecate it seems odd, especially if both will be released in 5.5... While we're at it, useCompoundFiles in Lucene has two meanings too: for IndexWriter it determines if flushed segments are flushed as .cfs and in MergePolicy it determines about merged segments. The two are not equivalent, so I suggest we note that in the MPF changes too.
          Hide
          cpoerschke Christine Poerschke added a comment -

          ... did you first intend to introduce MPF ... adding <sortMerges> just to immediately deprecate it seems odd ...

          Yes, my preference would be to first introduce MPF with (say) UpgradeIndexMergePolicyFactory as example and for there to never be that "alongside" <sortMerges> attribute.

          Thanks for mentioning about useCompoundFiles meanings. It would be good i think to not carry over the fixUseCFMergePolicyInitArg logic into the MPF code.

          Show
          cpoerschke Christine Poerschke added a comment - ... did you first intend to introduce MPF ... adding <sortMerges> just to immediately deprecate it seems odd ... Yes, my preference would be to first introduce MPF with (say) UpgradeIndexMergePolicyFactory as example and for there to never be that "alongside" <sortMerges> attribute. Thanks for mentioning about useCompoundFiles meanings. It would be good i think to not carry over the fixUseCFMergePolicyInitArg logic into the MPF code.
          Hide
          shaie Shai Erera added a comment -

          OK so in that case it's better. We'll put this issue on-hold for MPF, then resume with it? Is there something I can help with?

          Show
          shaie Shai Erera added a comment - OK so in that case it's better. We'll put this issue on-hold for MPF, then resume with it? Is there something I can help with?
          Hide
          cpoerschke Christine Poerschke added a comment -

          SOLR-5730 will be suspended for now, to resume once SOLR-8621 is done.

          Show
          cpoerschke Christine Poerschke added a comment - SOLR-5730 will be suspended for now, to resume once SOLR-8621 is done.
          Hide
          cpoerschke Christine Poerschke added a comment -

          Help and collaboration always welcome Have created SOLR-8621's master-solr-8621 work-in-progress branch with that in mind. Thanks!

          Show
          cpoerschke Christine Poerschke added a comment - Help and collaboration always welcome Have created SOLR-8621 's master-solr-8621 work-in-progress branch with that in mind. Thanks!
          Hide
          jkrupan Jack Krupansky added a comment -

          Sorry for arriving so late to the party here, but I've gotten lost in all the back and forth... is there going to be a simple and easy to use XML element to let the user simply enable sort merge and specify a field list, as opposed to having to manually construct an elaborate Lucene-level set of wrapped merge policies? I mean, sure, some experts will indeed wish to fully configure every detail of a Lucene merge policy, but for non-expert users who just want to assure that their index is pre-sorted to align with a query sorting, the syntax should be... simple. If the user does construct some elaborate wrapped MP, then some sort of parameter substitution would be needed, but if the user uses the default solrconfig which has no explicit MP, Solr should build that full, wrapped MP with just the sort field names substituted.

          In short, I just wanted to know whether this was intended to be a very easy to use feature (supposed to be the trademark of Solr) or some super-elaborate expert-only feature that we would be forced to recommend that average users stay away from.

          Personally, my preference would be to focus on introducing a first-class Solr feature of a "preferred document order", which is effectively a composite primary key in database nomenclature.

          So, let's not forget that this is Solr we are talking about, not raw Lucene.

          I'd like to know that Yonik Seeley and Hoss Man are explicitly on board with what is bring proposed.

          Show
          jkrupan Jack Krupansky added a comment - Sorry for arriving so late to the party here, but I've gotten lost in all the back and forth... is there going to be a simple and easy to use XML element to let the user simply enable sort merge and specify a field list, as opposed to having to manually construct an elaborate Lucene-level set of wrapped merge policies? I mean, sure, some experts will indeed wish to fully configure every detail of a Lucene merge policy, but for non-expert users who just want to assure that their index is pre-sorted to align with a query sorting, the syntax should be... simple. If the user does construct some elaborate wrapped MP, then some sort of parameter substitution would be needed, but if the user uses the default solrconfig which has no explicit MP, Solr should build that full, wrapped MP with just the sort field names substituted. In short, I just wanted to know whether this was intended to be a very easy to use feature (supposed to be the trademark of Solr) or some super-elaborate expert-only feature that we would be forced to recommend that average users stay away from. Personally, my preference would be to focus on introducing a first-class Solr feature of a "preferred document order", which is effectively a composite primary key in database nomenclature. So, let's not forget that this is Solr we are talking about, not raw Lucene. I'd like to know that Yonik Seeley and Hoss Man are explicitly on board with what is bring proposed.
          Hide
          cpoerschke Christine Poerschke added a comment -

          Jack Krupansky - it's never too late to join the party here I've just updated the summary to reflect the latest patch state (though exact SOLR-8621 details are still a work-in-progress). Hope that clarifies things. Thanks for your input.

          Show
          cpoerschke Christine Poerschke added a comment - Jack Krupansky - it's never too late to join the party here I've just updated the summary to reflect the latest patch state (though exact SOLR-8621 details are still a work-in-progress). Hope that clarifies things. Thanks for your input.
          Hide
          hossman Hoss Man added a comment -

          I think the direction being taken in SOLR-8621 looks like a good idea in general for better configuring/modeling the true state of things internally as far as merge policies are concerned. I think it also makes sense to somewhat put this issue (SOLR-5730) on hold until some of those plumbing related questions are resolved.

          ultimately, i suspect using SortingMergePolicy is an advanced enough option that if it requires some non-trivial config then so be it – we can document what that config looks like and provide common recipes with clear instructions of what key knobs/values users have to consider to use those recipes.

          If, ultimately, we decide that the "full syntax" for specifying a merge policy with SortingMergePolicy warpper is overly verbose for basic users, and we also want some simplified way for users to say "use whatever default (merge policy) Solr thinks is best, but please sort docs on disk like this..." then that can always be added as syntactic sugar later.

          Show
          hossman Hoss Man added a comment - I think the direction being taken in SOLR-8621 looks like a good idea in general for better configuring/modeling the true state of things internally as far as merge policies are concerned. I think it also makes sense to somewhat put this issue ( SOLR-5730 ) on hold until some of those plumbing related questions are resolved. ultimately, i suspect using SortingMergePolicy is an advanced enough option that if it requires some non-trivial config then so be it – we can document what that config looks like and provide common recipes with clear instructions of what key knobs/values users have to consider to use those recipes. If, ultimately, we decide that the "full syntax" for specifying a merge policy with SortingMergePolicy warpper is overly verbose for basic users, and we also want some simplified way for users to say "use whatever default (merge policy) Solr thinks is best, but please sort docs on disk like this..." then that can always be added as syntactic sugar later.
          Hide
          shaie Shai Erera added a comment - - edited

          Hoss Man, this issue is now on hold until we finish with SOLR-8621 (see Christine Poerschke earlier comment).

          Jack Krupansky the intention of SOLR-8621 and this issue is to allow users to define compound merge policies in solrconfig.xml with ease. Currently, it's impossible and if you want to use something like UpgradeIndexMergePolicy, you have to create your own MP class which will delegate internally to UpgradeIndexMP. For users who still want to define simple merge policies like TieredMergePolicy, the structure in the XML will not be much different. Instead of:

          <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
            <int name="segmentsPerTier">42</int>
          </mergePolicy>
          

          They will need to specify

          <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
            <int name="segmentsPerTier">42</int>
          </mergePolicyFactory>
          

          The factory allows more advanced users to create whatever MP they want, rather easily. Also, with this change, we deprecate some settings that are now defined globally (like mergeFactor), and move them inside the <mergePolicyFactory> section, where they belong.

          Hope this clarifies it better.

          Show
          shaie Shai Erera added a comment - - edited Hoss Man , this issue is now on hold until we finish with SOLR-8621 (see Christine Poerschke earlier comment). Jack Krupansky the intention of SOLR-8621 and this issue is to allow users to define compound merge policies in solrconfig.xml with ease. Currently, it's impossible and if you want to use something like UpgradeIndexMergePolicy , you have to create your own MP class which will delegate internally to UpgradeIndexMP . For users who still want to define simple merge policies like TieredMergePolicy , the structure in the XML will not be much different. Instead of: <mergePolicy class= "org.apache.lucene.index.TieredMergePolicy" > < int name= "segmentsPerTier" >42</ int > </mergePolicy> They will need to specify <mergePolicyFactory class= "org.apache.solr.index.TieredMergePolicyFactory" > < int name= "segmentsPerTier" >42</ int > </mergePolicyFactory> The factory allows more advanced users to create whatever MP they want, rather easily. Also, with this change, we deprecate some settings that are now defined globally (like mergeFactor ), and move them inside the <mergePolicyFactory> section, where they belong. Hope this clarifies it better.
          Hide
          jkrupan Jack Krupansky added a comment -

          Let me try again... again, my apologies for not commenting much earlier before things got a bit complicated. Let me see if I have this straight:

          1. There are three related tickets: SOLR-4654, SOLR-5730, SOLR-8621.
          2. There are three key features of interest: UpgradeIndexMergePolicy, SortingMergePolicy , and EarlyTerminatingSortingCollector.
          3. The first ticket is kind of the umbrella.
          4. The second ticket is focused on the second and third features.
          5. The third ticket is the foundation for all three features.
          6. The third ticket has some user impact and delivers some additional minor benefits, but enabling those other three features is its true purpose.
          7. SortingMergePolicy and EarlyTerminatingSortingCollector are really two sides of a single feature, the index side and the query side of (in my words) "pre-sorted indexing".

          Now, I have only one remaining question area: Isn't the forceMerge method the only real benefit of UpgradeIndexMergePolicy? Is that purely for the Solr optimize option, or is there some intent to surface it for users some other way in Solr? Isn't it more of a one-time operation rather than something that should be in place for all merge operations? Or is it so cheap if not used that we should simply pre-configure it all the time?

          Show
          jkrupan Jack Krupansky added a comment - Let me try again... again, my apologies for not commenting much earlier before things got a bit complicated. Let me see if I have this straight: 1. There are three related tickets: SOLR-4654 , SOLR-5730 , SOLR-8621 . 2. There are three key features of interest: UpgradeIndexMergePolicy, SortingMergePolicy , and EarlyTerminatingSortingCollector. 3. The first ticket is kind of the umbrella. 4. The second ticket is focused on the second and third features. 5. The third ticket is the foundation for all three features. 6. The third ticket has some user impact and delivers some additional minor benefits, but enabling those other three features is its true purpose. 7. SortingMergePolicy and EarlyTerminatingSortingCollector are really two sides of a single feature, the index side and the query side of (in my words) "pre-sorted indexing". Now, I have only one remaining question area: Isn't the forceMerge method the only real benefit of UpgradeIndexMergePolicy? Is that purely for the Solr optimize option, or is there some intent to surface it for users some other way in Solr? Isn't it more of a one-time operation rather than something that should be in place for all merge operations? Or is it so cheap if not used that we should simply pre-configure it all the time?
          Hide
          shaie Shai Erera added a comment -

          Hi Jack Krupansky, your understanding is correct.

          The third ticket has some user impact and delivers some additional minor benefits

          It depends how you view it. I recently found that defining my own MP (when it wraps another one especially) is not so trivial in solrconfig.xml. You could argue that for the majority of the users, this is a very advanced feature to have, and I would agree. But for those who want to write more advanced and tailored merge policies, SOLR-8621 will allow them to do so quite easily. And the impact to the users who don't need it is marginal (see my previous comment).

          The reason I quoted that sentence is because of the words minor benefits – it's a matter of perspective of course – if you will be facing the need to write a special MP, then this feature will have huge benefits for you. I know you don't argue about the need, but just wanted to give my own perspective on this.

          As for UpgradeIndexMergePolicy you're right that it's a one-time operation and the main purpose is to allow you to upgrade all segments of your index to a newer format. You will likely only use it when you upgrade to 6.0 and you think (or know) that you still have segments created by 4.x code. Note, that that you upgraded your software to Solr 5.x doesn't mean that the existing indexes' segments were upgraded too!

          But even for this one-time operation, you currently have no way to define UpgradeIndexMergePolicy in a <mergePolicy> element, since it needs to take a wrapped MP which will actually decide which segments to merge together etc. (i.e. create the merge plan). So even for this one time operation, you'd need to have SOLR-8621 in place.

          I don't view SOLR-4654 as an umbrella ticket at all, but more of a duplicate of this one. SOLR-8621 lays the foundations for allowing users to define compound merge policies, and in this ticket we will add a SortingMergePolicyFactory. As you said, this only handles the indexing side of things, and there's still work to be done in order to make use of it during search (i.e. integrate EarlyTerminatingSortingCollector. Also, I'm pretty sure I saw in SolrIndexSearcher early termination by time (which sort of covers the second part of SOLR-4654), though I haven't seen it actually being used.

          Show
          shaie Shai Erera added a comment - Hi Jack Krupansky , your understanding is correct. The third ticket has some user impact and delivers some additional minor benefits It depends how you view it. I recently found that defining my own MP (when it wraps another one especially) is not so trivial in solrconfig.xml . You could argue that for the majority of the users, this is a very advanced feature to have, and I would agree. But for those who want to write more advanced and tailored merge policies, SOLR-8621 will allow them to do so quite easily. And the impact to the users who don't need it is marginal (see my previous comment). The reason I quoted that sentence is because of the words minor benefits – it's a matter of perspective of course – if you will be facing the need to write a special MP, then this feature will have huge benefits for you. I know you don't argue about the need, but just wanted to give my own perspective on this. As for UpgradeIndexMergePolicy you're right that it's a one-time operation and the main purpose is to allow you to upgrade all segments of your index to a newer format. You will likely only use it when you upgrade to 6.0 and you think (or know) that you still have segments created by 4.x code. Note, that that you upgraded your software to Solr 5.x doesn't mean that the existing indexes' segments were upgraded too! But even for this one-time operation, you currently have no way to define UpgradeIndexMergePolicy in a <mergePolicy> element, since it needs to take a wrapped MP which will actually decide which segments to merge together etc. (i.e. create the merge plan). So even for this one time operation, you'd need to have SOLR-8621 in place. I don't view SOLR-4654 as an umbrella ticket at all, but more of a duplicate of this one. SOLR-8621 lays the foundations for allowing users to define compound merge policies, and in this ticket we will add a SortingMergePolicyFactory . As you said, this only handles the indexing side of things, and there's still work to be done in order to make use of it during search (i.e. integrate EarlyTerminatingSortingCollector . Also, I'm pretty sure I saw in SolrIndexSearcher early termination by time (which sort of covers the second part of SOLR-4654 ), though I haven't seen it actually being used.
          Hide
          cpoerschke Christine Poerschke added a comment -

          Resumed rebasing work for this ticket here, jira/solr-5730-master working-branch created (but final commit will still be attached as patch to this ticket for consistency and clarity), reviews, comments, etc. welcome as always.

          (Note that the working-branch commit marked as tentative would form part not of SOLR-5730 here but it or something equivalent will be committed as part of SOLR-8621.)

          Show
          cpoerschke Christine Poerschke added a comment - Resumed rebasing work for this ticket here, jira/solr-5730-master working-branch created (but final commit will still be attached as patch to this ticket for consistency and clarity), reviews, comments, etc. welcome as always. (Note that the working-branch commit marked as tentative would form part not of SOLR-5730 here but it or something equivalent will be committed as part of SOLR-8621 .)
          Hide
          cpoerschke Christine Poerschke added a comment -

          SOLR-5730-part1and2.patch attached against latest master/trunk. If there are no further comments or concerns and if all else goes well then that will be the final patch for this ticket, and I will commit it Thursday afternoon/evening or Friday morning London time.

          Show
          cpoerschke Christine Poerschke added a comment - SOLR-5730 -part1and2.patch attached against latest master/trunk. If there are no further comments or concerns and if all else goes well then that will be the final patch for this ticket, and I will commit it Thursday afternoon/evening or Friday morning London time.
          Hide
          shaie Shai Erera added a comment -

          Few comments about the patch:

          • In QueryComponent: if(existingSegmentTerminatedEarly == null) – can you add a space after the 'if'?
          • SortingMergePolicyFactory.getMergePolicy() calls args.invokeSetters(mp);, like UpgradeIndexMergePolicyFactory. I wonder if we can have a protected abstract getMergePolicyInstance(wrappedMP), so that WrapperMergePolicyFactory.getMergePolicy() implements it by calling this method followed by args.invokeSetters(mp);. What do you think?
          • SolrIndexSearcher: qr.setSegmentTerminatedEarly(earlyTerminatingSortingCollector.terminatedEarly()); – should we also set qr.partialResults?
          • DefaultSolrCoreState: you can change the to:
          public Sort getMergePolicySort() throws IOException {
            lock(iwLock.readLock());
            try {
              if (indexWriter != null) {
                final MergePolicy mergePolicy = indexWriter.getConfig().getMergePolicy();
                if (mergePolicy instanceof SortingMergePolicy) {
                  return ((SortingMergePolicy) mergePolicy).getSort();
                }
              }
            } finally {
              iwLock.readLock().unlock();
            }
          }
          
          • What's the purpose of enable="${solr.sortingMergePolicyFactory.enable:true}"?
          • I kind of feel like the test you added to TestMiniSolrCloudCluster doesn't belong in that class. Perhaps it should be in its own test class, inheriting from this class, or just using MiniSolrCloudCluster?
          • RandomForceMergePolicyFactory is not really related to this issue. Perhaps you should commit it separately?
          Show
          shaie Shai Erera added a comment - Few comments about the patch: In QueryComponent: if(existingSegmentTerminatedEarly == null) – can you add a space after the 'if'? SortingMergePolicyFactory.getMergePolicy() calls args.invokeSetters(mp); , like UpgradeIndexMergePolicyFactory . I wonder if we can have a protected abstract getMergePolicyInstance(wrappedMP) , so that WrapperMergePolicyFactory.getMergePolicy() implements it by calling this method followed by args.invokeSetters(mp); . What do you think? SolrIndexSearcher : qr.setSegmentTerminatedEarly(earlyTerminatingSortingCollector.terminatedEarly()); – should we also set qr.partialResults ? DefaultSolrCoreState : you can change the to: public Sort getMergePolicySort() throws IOException { lock(iwLock.readLock()); try { if (indexWriter != null ) { final MergePolicy mergePolicy = indexWriter.getConfig().getMergePolicy(); if (mergePolicy instanceof SortingMergePolicy) { return ((SortingMergePolicy) mergePolicy).getSort(); } } } finally { iwLock.readLock().unlock(); } } What's the purpose of enable="${solr.sortingMergePolicyFactory.enable:true}" ? I kind of feel like the test you added to TestMiniSolrCloudCluster doesn't belong in that class. Perhaps it should be in its own test class, inheriting from this class, or just using MiniSolrCloudCluster ? RandomForceMergePolicyFactory is not really related to this issue. Perhaps you should commit it separately?
          Hide
          cpoerschke Christine Poerschke added a comment -

          Thanks for the review comments Shai Erera!

          ... I wonder if we can have a protected abstract getMergePolicyInstance(wrappedMP) ...

          Yes, that makes sense and will make WrapperMergePolicyFactory similar to WrapperMergePolicyFactory i.e. both will have a protected abstract MergePolicy getMergePolicyInstance though with different arguments. Will action this under SOLR-8621 where it really belongs.

          RandomForceMergePolicyFactory is not really related to this issue. Perhaps you should commit it separately?

          Yes, I agree, since RandomForceMergePolicy already exists this would belong to SOLR-8621 also.

          Show
          cpoerschke Christine Poerschke added a comment - Thanks for the review comments Shai Erera ! ... I wonder if we can have a protected abstract getMergePolicyInstance(wrappedMP) ... Yes, that makes sense and will make WrapperMergePolicyFactory similar to WrapperMergePolicyFactory i.e. both will have a protected abstract MergePolicy getMergePolicyInstance though with different arguments. Will action this under SOLR-8621 where it really belongs. RandomForceMergePolicyFactory is not really related to this issue. Perhaps you should commit it separately? Yes, I agree, since RandomForceMergePolicy already exists this would belong to SOLR-8621 also.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 360051a414e291a7b3ffb5a0180a404fa18f3a6c in lucene-solr's branch refs/heads/master from Christine Poerschke
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=360051a ]

          SOLR-8621: factor out protected abstract WrapperMergePolicyFactory.getMergePolicyInstance method

          Here in SOLR-8621:

          • UpgradeIndexMergePolicyFactory extends WrapperMergePolicyFactory
          • (WrapperMergePolicyFactoryTest's) DefaultingWrapperMergePolicyFactory extends WrapperMergePolicyFactory

          Elsewhere in SOLR-5730:

          • SortingMergePolicyFactory will extend WrapperMergePolicyFactory
          Show
          jira-bot ASF subversion and git services added a comment - Commit 360051a414e291a7b3ffb5a0180a404fa18f3a6c in lucene-solr's branch refs/heads/master from Christine Poerschke [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=360051a ] SOLR-8621 : factor out protected abstract WrapperMergePolicyFactory.getMergePolicyInstance method Here in SOLR-8621 : UpgradeIndexMergePolicyFactory extends WrapperMergePolicyFactory (WrapperMergePolicyFactoryTest's) DefaultingWrapperMergePolicyFactory extends WrapperMergePolicyFactory Elsewhere in SOLR-5730 : SortingMergePolicyFactory will extend WrapperMergePolicyFactory
          Hide
          cpoerschke Christine Poerschke added a comment -

          SolrIndexSearcher: qr.setSegmentTerminatedEarly(earlyTerminatingSortingCollector.terminatedEarly()); – should we also set qr.partialResults?

          No, I don't think so: qr.partialResults is used to indicate that partial results were returned because the timeAllowed limit was exceeded. In comparison qr.segmentTerminatedEarly indicates that full results were returned and we made use of the EarlyTerminatingSortingCollector which early terminated the collecting of at least one segment. Had we not early-terminated for any of the segments then the results would still have been the same full results. — So why would anyone care about the qr.segmentTerminatedEarly flag in the response? If the flag is TRUE then the returned numFound will be potentially less than the numFound returned by an identical search that didn't segment-terminate-early. — As an aside, "partialResults" in the response can also occur if there is no timeAllowed specified but shards.tolerant=true was specified and there was a problem talking to at least one of the shards.

          Show
          cpoerschke Christine Poerschke added a comment - SolrIndexSearcher: qr.setSegmentTerminatedEarly(earlyTerminatingSortingCollector.terminatedEarly()); – should we also set qr.partialResults? No, I don't think so: qr.partialResults is used to indicate that partial results were returned because the timeAllowed limit was exceeded. In comparison qr.segmentTerminatedEarly indicates that full results were returned and we made use of the EarlyTerminatingSortingCollector which early terminated the collecting of at least one segment. Had we not early-terminated for any of the segments then the results would still have been the same full results. — So why would anyone care about the qr.segmentTerminatedEarly flag in the response? If the flag is TRUE then the returned numFound will be potentially less than the numFound returned by an identical search that didn't segment-terminate-early. — As an aside, "partialResults" in the response can also occur if there is no timeAllowed specified but shards.tolerant=true was specified and there was a problem talking to at least one of the shards.
          Hide
          cpoerschke Christine Poerschke added a comment -

          What's the purpose of enable="${solr.sortingMergePolicyFactory.enable:true}"?

          Good catch. This used to be the enable attribute in the <sortMerges> element alongside the <mergePolicy> element i.e.

          <mergePolicy class="..."/>
          <sortMerges enable="${solr.sortMerges.enable:false}">...</sortMerges>
          

          but now that we have a combined <mergePolicyFactory class="SortingMergePolicy">...</mergePolicyFactory> it doesn't really make sense anymore. I will remove it.

          I kind of feel like the test you added to TestMiniSolrCloudCluster doesn't belong in that class. Perhaps it should be in its own test class, inheriting from this class, or just using MiniSolrCloudCluster?

          I agree that TestMiniSolrCloudCluster has grown to have all sorts of test methods which could be broken up somehow. SOLR-7886 exists to 'factor out a TestMiniSolrCloudClusterBase class', initial discussion was in SOLR-7877, and so for now I'd like to keep the short-ish testSegmentTerminateEarly method in TestMiniSolrCloudCluster but I will try and see if the TestSegmentTerminateEarlyState helper class with the longer logic can be factored out into a class of its own.

          Show
          cpoerschke Christine Poerschke added a comment - What's the purpose of enable="${solr.sortingMergePolicyFactory.enable:true}"? Good catch. This used to be the enable attribute in the <sortMerges> element alongside the <mergePolicy> element i.e. <mergePolicy class= "..." /> <sortMerges enable= "${solr.sortMerges.enable: false }" >...</sortMerges> but now that we have a combined <mergePolicyFactory class="SortingMergePolicy">...</mergePolicyFactory> it doesn't really make sense anymore. I will remove it. I kind of feel like the test you added to TestMiniSolrCloudCluster doesn't belong in that class. Perhaps it should be in its own test class, inheriting from this class, or just using MiniSolrCloudCluster? I agree that TestMiniSolrCloudCluster has grown to have all sorts of test methods which could be broken up somehow. SOLR-7886 exists to 'factor out a TestMiniSolrCloudClusterBase class', initial discussion was in SOLR-7877 , and so for now I'd like to keep the short-ish testSegmentTerminateEarly method in TestMiniSolrCloudCluster but I will try and see if the TestSegmentTerminateEarlyState helper class with the longer logic can be factored out into a class of its own.
          Hide
          cpoerschke Christine Poerschke added a comment -

          Attached revised SOLR-5730-part1and2.patch - new or further reviews and comments welcome as usual. If there are no further comments or concerns then will aim to commit the patch tomorrow/Friday towards end of London day.

          Show
          cpoerschke Christine Poerschke added a comment - Attached revised SOLR-5730 -part1and2.patch - new or further reviews and comments welcome as usual. If there are no further comments or concerns then will aim to commit the patch tomorrow/Friday towards end of London day.
          Hide
          shaie Shai Erera added a comment -

          Patch looks good! One minor comment – in SortingMergePolicyFactory.getMergePolicyInstance() I'd inline to return new Sorting ....

          Show
          shaie Shai Erera added a comment - Patch looks good! One minor comment – in SortingMergePolicyFactory.getMergePolicyInstance() I'd inline to return new Sorting ... .
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit b2e83b6788b5b3c9ee6cf13be1cddec7f21014f4 in lucene-solr's branch refs/heads/branch_5x from Christine Poerschke
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b2e83b6 ]

          SOLR-8621: factor out protected abstract WrapperMergePolicyFactory.getMergePolicyInstance method

          Here in SOLR-8621:

          • UpgradeIndexMergePolicyFactory extends WrapperMergePolicyFactory
          • (WrapperMergePolicyFactoryTest's) DefaultingWrapperMergePolicyFactory extends WrapperMergePolicyFactory

          Elsewhere in SOLR-5730:

          • SortingMergePolicyFactory will extend WrapperMergePolicyFactory
          Show
          jira-bot ASF subversion and git services added a comment - Commit b2e83b6788b5b3c9ee6cf13be1cddec7f21014f4 in lucene-solr's branch refs/heads/branch_5x from Christine Poerschke [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b2e83b6 ] SOLR-8621 : factor out protected abstract WrapperMergePolicyFactory.getMergePolicyInstance method Here in SOLR-8621 : UpgradeIndexMergePolicyFactory extends WrapperMergePolicyFactory (WrapperMergePolicyFactoryTest's) DefaultingWrapperMergePolicyFactory extends WrapperMergePolicyFactory Elsewhere in SOLR-5730 : SortingMergePolicyFactory will extend WrapperMergePolicyFactory
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 6a9d89309595ff668f851c17af4f3f97af7640c1 in lucene-solr's branch refs/heads/branch_5_5 from Christine Poerschke
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6a9d893 ]

          SOLR-8621: factor out protected abstract WrapperMergePolicyFactory.getMergePolicyInstance method

          Here in SOLR-8621:

          • UpgradeIndexMergePolicyFactory extends WrapperMergePolicyFactory
          • (WrapperMergePolicyFactoryTest's) DefaultingWrapperMergePolicyFactory extends WrapperMergePolicyFactory

          Elsewhere in SOLR-5730:

          • SortingMergePolicyFactory will extend WrapperMergePolicyFactory
          Show
          jira-bot ASF subversion and git services added a comment - Commit 6a9d89309595ff668f851c17af4f3f97af7640c1 in lucene-solr's branch refs/heads/branch_5_5 from Christine Poerschke [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6a9d893 ] SOLR-8621 : factor out protected abstract WrapperMergePolicyFactory.getMergePolicyInstance method Here in SOLR-8621 : UpgradeIndexMergePolicyFactory extends WrapperMergePolicyFactory (WrapperMergePolicyFactoryTest's) DefaultingWrapperMergePolicyFactory extends WrapperMergePolicyFactory Elsewhere in SOLR-5730 : SortingMergePolicyFactory will extend WrapperMergePolicyFactory
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 677779086c87db33406f3344736187fa10a30901 in lucene-solr's branch refs/heads/master from Christine Poerschke
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6777790 ]

          SOLR-5730: Make Lucene's SortingMergePolicy and EarlyTerminatingSortingCollector configurable in Solr.

          Show
          jira-bot ASF subversion and git services added a comment - Commit 677779086c87db33406f3344736187fa10a30901 in lucene-solr's branch refs/heads/master from Christine Poerschke [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6777790 ] SOLR-5730 : Make Lucene's SortingMergePolicy and EarlyTerminatingSortingCollector configurable in Solr.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 805eeb932a872205cd6b43ab1a158c75ca939ea7 in lucene-solr's branch refs/heads/branch_5x from Christine Poerschke
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=805eeb9 ]

          SOLR-5730: Make Lucene's SortingMergePolicy and EarlyTerminatingSortingCollector configurable in Solr.

          Cherry-picked commit 6777790 from master and resolved merge conflicts for
          solr/core/src/java/org/apache/solr/search/Query(Command|Result).java and SolrIndexSearcher.java
          (the former had previously been factored out from the latter in master but not branch_5x).

          Show
          jira-bot ASF subversion and git services added a comment - Commit 805eeb932a872205cd6b43ab1a158c75ca939ea7 in lucene-solr's branch refs/heads/branch_5x from Christine Poerschke [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=805eeb9 ] SOLR-5730 : Make Lucene's SortingMergePolicy and EarlyTerminatingSortingCollector configurable in Solr. Cherry-picked commit 6777790 from master and resolved merge conflicts for solr/core/src/java/org/apache/solr/search/Query(Command|Result).java and SolrIndexSearcher.java (the former had previously been factored out from the latter in master but not branch_5x).
          Hide
          cpoerschke Christine Poerschke added a comment -

          Changes committed to master and cherry-picked to branch_5x (but not branch_5_5). Will resolve ticket on Monday assuming no further comments or test issues etc.

          Show
          cpoerschke Christine Poerschke added a comment - Changes committed to master and cherry-picked to branch_5x (but not branch_5_5). Will resolve ticket on Monday assuming no further comments or test issues etc.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit f9df240e5ccbd9bd5687eec1d927d3eb517ce8e4 in lucene-solr's branch refs/heads/master from Christine Poerschke
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f9df240 ]

          SOLR-5730: rename TestSegmentTerminateEarlyState to SegmentTerminateEarlyTestState

          Show
          jira-bot ASF subversion and git services added a comment - Commit f9df240e5ccbd9bd5687eec1d927d3eb517ce8e4 in lucene-solr's branch refs/heads/master from Christine Poerschke [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f9df240 ] SOLR-5730 : rename TestSegmentTerminateEarlyState to SegmentTerminateEarlyTestState
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit a48d85fdc7d62011c2514748b963accc3c102f1c in lucene-solr's branch refs/heads/branch_5x from Christine Poerschke
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a48d85f ]

          SOLR-5730: rename TestSegmentTerminateEarlyState to SegmentTerminateEarlyTestState

          Show
          jira-bot ASF subversion and git services added a comment - Commit a48d85fdc7d62011c2514748b963accc3c102f1c in lucene-solr's branch refs/heads/branch_5x from Christine Poerschke [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a48d85f ] SOLR-5730 : rename TestSegmentTerminateEarlyState to SegmentTerminateEarlyTestState
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 779120c6a48daf25e46a00f7e6981f8afcd0f3e8 in lucene-solr's branch refs/heads/master from Uwe Schindler
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=779120c ]

          SOLR-5730: Workaround non-working javadocs link (cannot refer to classes from packages in other modules that already exist in lucene-core.jar)

          Show
          jira-bot ASF subversion and git services added a comment - Commit 779120c6a48daf25e46a00f7e6981f8afcd0f3e8 in lucene-solr's branch refs/heads/master from Uwe Schindler [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=779120c ] SOLR-5730 : Workaround non-working javadocs link (cannot refer to classes from packages in other modules that already exist in lucene-core.jar)
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit b4811f63bc2f1af174886b379a628391fbcf2012 in lucene-solr's branch refs/heads/branch_5x from Uwe Schindler
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b4811f6 ]

          SOLR-5730: Workaround non-working javadocs link (cannot refer to classes from packages in other modules that already exist in lucene-core.jar)

          Show
          jira-bot ASF subversion and git services added a comment - Commit b4811f63bc2f1af174886b379a628391fbcf2012 in lucene-solr's branch refs/heads/branch_5x from Uwe Schindler [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b4811f6 ] SOLR-5730 : Workaround non-working javadocs link (cannot refer to classes from packages in other modules that already exist in lucene-core.jar)

            People

            • Assignee:
              cpoerschke Christine Poerschke
              Reporter:
              cpoerschke Christine Poerschke
            • Votes:
              0 Vote for this issue
              Watchers:
              11 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development