Solr
  1. Solr
  2. SOLR-4941

useCompoundFile default has changed, simple config option no longer seems to work

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.4, 6.0
    • Component/s: None
    • Labels:
      None

      Description

      Spin off of SOLR-4934. We should updated tests to ensure that the various ways of specifying useCompoundFile as well as the expected default are working properly after LUCENE-5038

      1. infostream.txt
        57 kB
        Hoss Man
      2. SOLR-4941.patch
        15 kB
        Hoss Man
      3. SOLR-4941.patch
        13 kB
        Hoss Man

        Issue Links

          Activity

          Hide
          Hoss Man added a comment -

          I understand what happend now..

          when simon asked on the mailing list for help reviewing the solr changes affected by LUCENE-5038 i didn't fully understand the scope of the change, and only focused on how it affected the existing MergePolicy settings (SOLR-4934) – but i only noticed that "setUseCompoundFile" had been removed from the merge policies in facvor us only using the ratio – i didn't realize that setUseCompoundFile was actaully moved to IndexWriterConfig.

          i'll work up a patch to make the existing solr settings apply to the IndexWriterConfig.

          Show
          Hoss Man added a comment - I understand what happend now.. when simon asked on the mailing list for help reviewing the solr changes affected by LUCENE-5038 i didn't fully understand the scope of the change, and only focused on how it affected the existing MergePolicy settings ( SOLR-4934 ) – but i only noticed that "setUseCompoundFile" had been removed from the merge policies in facvor us only using the ratio – i didn't realize that setUseCompoundFile was actaully moved to IndexWriterConfig. i'll work up a patch to make the existing solr settings apply to the IndexWriterConfig.
          Hide
          Hoss Man added a comment -

          Patch that improves the tests and updates the logic added in SOLR-4934 so that if there is explicit useCompoundFile configuration as an init arg for a (known) MergePolicy we pass that to the IndexWriterConfig's setUseCompoundFile method and log a warning instead of just ignoring it.

          patch also removes the warnings about the simple legacy <useCompoundFile> syntax since that actually makes sense now that it's a setting on IWC.

          I've also updated the tests to inspect the useCompoundFile on the IWC as well as checking the results of adding some segments.

          there is still a failure in testTieredMergePolicyConfig where (as i understand it from talking to mike on IRC) the merged segment after the optimize command should not be in CFS format because of the noCFSRatio setting – but the merged segment is still in CFS. i've attached the infostream log from running "ant test -Dtestcase=TestMergePolicyConfig -Dtests.method=testTieredMergePolicyConfig" to see if it helps illuminate the problem ... i suspect it's either a test bug because i still missunderstand something about how the MergePolicy settings come into play, or a genuine bug in the lower level TieredMP code – i don't see how it could be specific to the solr config parsing logic since the IWC and TMP getters say they got the expected settings.

          (NOTE: the patch includes a nocommit in solrconfig-mergepolicy.xml to turn off the infostream before committing)

          Show
          Hoss Man added a comment - Patch that improves the tests and updates the logic added in SOLR-4934 so that if there is explicit useCompoundFile configuration as an init arg for a (known) MergePolicy we pass that to the IndexWriterConfig's setUseCompoundFile method and log a warning instead of just ignoring it. patch also removes the warnings about the simple legacy <useCompoundFile> syntax since that actually makes sense now that it's a setting on IWC. I've also updated the tests to inspect the useCompoundFile on the IWC as well as checking the results of adding some segments. there is still a failure in testTieredMergePolicyConfig where (as i understand it from talking to mike on IRC) the merged segment after the optimize command should not be in CFS format because of the noCFSRatio setting – but the merged segment is still in CFS. i've attached the infostream log from running "ant test -Dtestcase=TestMergePolicyConfig -Dtests.method=testTieredMergePolicyConfig" to see if it helps illuminate the problem ... i suspect it's either a test bug because i still missunderstand something about how the MergePolicy settings come into play, or a genuine bug in the lower level TieredMP code – i don't see how it could be specific to the solr config parsing logic since the IWC and TMP getters say they got the expected settings. (NOTE: the patch includes a nocommit in solrconfig-mergepolicy.xml to turn off the infostream before committing)
          Hide
          Michael McCandless added a comment -

          Indeed I can see that TMP has noCFSRatio=0.6, and two segments are flushed & turned into CFS, then those two segments are merged, and then the merged segment is turned into a CFS.

          I think this means that the merged segment's files (pre-CFS) are < 0.6 the size of the two flushed CFS segments ... e.g. maybe the CFS headers of the first 2 segments are tipping the scale? Try indexing more docs for each segment maybe?

          Show
          Michael McCandless added a comment - Indeed I can see that TMP has noCFSRatio=0.6, and two segments are flushed & turned into CFS, then those two segments are merged, and then the merged segment is turned into a CFS. I think this means that the merged segment's files (pre-CFS) are < 0.6 the size of the two flushed CFS segments ... e.g. maybe the CFS headers of the first 2 segments are tipping the scale? Try indexing more docs for each segment maybe?
          Hide
          Hoss Man added a comment -

          maybe the CFS headers of the first 2 segments are tipping the scale? Try indexing more docs for each segment maybe?

          yeah .. i guess i was just naive in considering 0.6 a low enough threshold.

          i increased the size of the docs and the number of docs per segment – and when that still didn't work i also decreased the ratio to 0.1 and that seemed to do the trick.

          updated patch fixes the test, removes the nocommit, and updates the "upgrading" instructions in CHANGES.txt (still need an explicit "Bug Fix" entry though)

          still running more test iters, but i think this is pretty good.

          Show
          Hoss Man added a comment - maybe the CFS headers of the first 2 segments are tipping the scale? Try indexing more docs for each segment maybe? yeah .. i guess i was just naive in considering 0.6 a low enough threshold. i increased the size of the docs and the number of docs per segment – and when that still didn't work i also decreased the ratio to 0.1 and that seemed to do the trick. updated patch fixes the test, removes the nocommit, and updates the "upgrading" instructions in CHANGES.txt (still need an explicit "Bug Fix" entry though) still running more test iters, but i think this is pretty good.
          Hide
          Hoss Man added a comment -

          Committed revision 1494837.
          Committed revision 1494839.

          Show
          Hoss Man added a comment - Committed revision 1494837. Committed revision 1494839.
          Hide
          Steve Rowe added a comment -

          Bulk close resolved 4.4 issues

          Show
          Steve Rowe added a comment - Bulk close resolved 4.4 issues
          Hide
          David Smiley added a comment -

          Before I go file a new issue, I at least want to comment here to see what you think Hoss Man and of course anyone else who may notice ( Christine Poerschke touched this stuff too I think?). I claim that "useCompoundFile" in solrconfig.xml (directly within <indexConfig>) is named in a way that is misleading, and Solr chooses a default that is not ideal.

          <useCompoundFile> is only about flushed segments from newly indexed documents, yet its setting name doesn't reflect it's limited to just those new segments. We don't even say this in comments or the ref guide (I'll happily fix this). I can understand how this came to be, it's mimicking a like-setting in Lucene IndexWriter which arguably is also misleading for the same reason. But I don't think we need to mimick the name Lucene chose. I could propose alternative names but I just want to throw this statement out there to see if it resonates or not.

          Secondly, note that Solr defaults to useCompoundFile=false; Lucene does true. Given a merge policy with a noCFSRatio of > 0 (by default we use TieredMergePolicy which uses 0.1), merged segments with fewer than 10% of the index will become CFS. This is a schizophrenic configuration! Segments start off non-CFS, then get merged to CFS (assuming it's still small enough), and then eventually become non-CFS again. It flipped back then forth. That's weird. Instead, I propose that only if noCFSRatio is 0, then default useCompoundFile to false since it'll never flip-flop. Otherwise, default useCompoundFile=true (which given default MP settings, this will generally be the case).

          Make sense?

          Show
          David Smiley added a comment - Before I go file a new issue, I at least want to comment here to see what you think Hoss Man and of course anyone else who may notice ( Christine Poerschke touched this stuff too I think?). I claim that "useCompoundFile" in solrconfig.xml (directly within <indexConfig>) is named in a way that is misleading, and Solr chooses a default that is not ideal. <useCompoundFile> is only about flushed segments from newly indexed documents, yet its setting name doesn't reflect it's limited to just those new segments. We don't even say this in comments or the ref guide (I'll happily fix this). I can understand how this came to be, it's mimicking a like-setting in Lucene IndexWriter which arguably is also misleading for the same reason. But I don't think we need to mimick the name Lucene chose. I could propose alternative names but I just want to throw this statement out there to see if it resonates or not. Secondly, note that Solr defaults to useCompoundFile=false; Lucene does true. Given a merge policy with a noCFSRatio of > 0 (by default we use TieredMergePolicy which uses 0.1), merged segments with fewer than 10% of the index will become CFS. This is a schizophrenic configuration! Segments start off non-CFS, then get merged to CFS (assuming it's still small enough), and then eventually become non-CFS again. It flipped back then forth. That's weird. Instead, I propose that only if noCFSRatio is 0, then default useCompoundFile to false since it'll never flip-flop. Otherwise, default useCompoundFile=true (which given default MP settings, this will generally be the case). Make sense?
          Hide
          Hoss Man added a comment -

          I honestly don't even remember this issue, let alone what the (expected) impact of the changes were or what cascading sets of decisions led to the situation we were in. More then likeley i was just trying to keep existing configs working as closely as possible to how they worked before new stuff was added to Lucene.

          I would absolutely open a new issue proposing whatever new configs/defaults you think are best, and I would absolutely rope Christine Poerschke in – i personally would defer to her judgement on what makes the most sense for these types of things before i would trust my own judgement.

          Show
          Hoss Man added a comment - I honestly don't even remember this issue, let alone what the (expected) impact of the changes were or what cascading sets of decisions led to the situation we were in. More then likeley i was just trying to keep existing configs working as closely as possible to how they worked before new stuff was added to Lucene. I would absolutely open a new issue proposing whatever new configs/defaults you think are best, and I would absolutely rope Christine Poerschke in – i personally would defer to her judgement on what makes the most sense for these types of things before i would trust my own judgement.
          Hide
          Christine Poerschke added a comment -

          David Smiley and Hoss Man - thanks for including me on this. In SOLR-8621 myself and Shai Erera changed MergePolicy related config in SolrIndexConfig.java where <useCompoundFile> is configured also.

          The <useCompoundFile> setting name not reflecting exact use i.e. new segments vs. segments created through merge, I agree that's not ideal and that the ref guide at least mentioning it would be good. Though unless we can come up with a very much clearer alternative name, would there be value in continuing to mimic/match the Lucene setting name?

          I agree, the 'initially uncompounded, then compounded-via-merge, then uncompound-via-merge-since-too-large' lifecycle seems unusual as a default.

          Making the default useCompoundFile value depend on the merge policy's noCFSRatio value, would that not be confusing for users? (The XML paths in solrconfig.xml are indexConfig/useCompoundFile and indexConfig/mergePolicyFactory/int(noCFSRatio) respectively.)

          How about changing the overall default (to useCompoundFile=true) and logging WARNings for unusual combinations? Unusual combinations would be:

          • useCompoundFile==false and noCFSRatio>0.0
          • useCompoundFile==true and noCFSRatio==0.0
          Show
          Christine Poerschke added a comment - David Smiley and Hoss Man - thanks for including me on this. In SOLR-8621 myself and Shai Erera changed MergePolicy related config in SolrIndexConfig.java where <useCompoundFile> is configured also. The <useCompoundFile> setting name not reflecting exact use i.e. new segments vs. segments created through merge, I agree that's not ideal and that the ref guide at least mentioning it would be good. Though unless we can come up with a very much clearer alternative name, would there be value in continuing to mimic/match the Lucene setting name? I agree, the 'initially uncompounded, then compounded-via-merge, then uncompound-via-merge-since-too-large' lifecycle seems unusual as a default. Making the default useCompoundFile value depend on the merge policy's noCFSRatio value, would that not be confusing for users? (The XML paths in solrconfig.xml are indexConfig/useCompoundFile and indexConfig/mergePolicyFactory/int(noCFSRatio) respectively.) How about changing the overall default (to useCompoundFile=true) and logging WARNings for unusual combinations? Unusual combinations would be: useCompoundFile==false and noCFSRatio>0.0 useCompoundFile==true and noCFSRatio==0.0
          Hide
          David Smiley added a comment -

          Sounds great Christine Poerschke. I'll file an issue referencing this discussion and we'll continue from there.

          Show
          David Smiley added a comment - Sounds great Christine Poerschke . I'll file an issue referencing this discussion and we'll continue from there.

            People

            • Assignee:
              Hoss Man
              Reporter:
              Hoss Man
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development