Solr / SOLR-7512

SolrOutputFormat creates an invalid solr.xml in the solr home zip for MapReduceIndexerTool

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 5.1
    • Fix Version/s: 5.2.1, 5.3, 6.0
    • Component/s: contrib - MapReduce
    • Labels:
      None

      Description

      Sometime after Solr 4.9, the `MapReduceIndexerTool` got busted because invalid `solr.xml` contents were being written to the solr home dir zip. My guess is that a 5.0 change made the invalid file start to matter.

      The error manifests as:

      Error: java.lang.IllegalStateException: Failed to initialize record writer for org.apache.solr.hadoop.MapReduceIndexerTool/SolrMapper, attempt_1430953999892_0012_r_000001_1
              at org.apache.solr.hadoop.SolrRecordWriter.<init>(SolrRecordWriter.java:126)
              at org.apache.solr.hadoop.SolrOutputFormat.getRecordWriter(SolrOutputFormat.java:163)
              at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:569)
              at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:643)
              at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:394)
              at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:415)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
              at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:170)
      Caused by: org.apache.solr.common.SolrException: org.xml.sax.SAXParseException; Premature end of file.
              at org.apache.solr.core.Config.<init>(Config.java:156)
              at org.apache.solr.core.SolrXmlConfig.fromInputStream(SolrXmlConfig.java:127)
              at org.apache.solr.core.SolrXmlConfig.fromFile(SolrXmlConfig.java:110)
              at org.apache.solr.core.SolrXmlConfig.fromSolrHome(SolrXmlConfig.java:138)
              at org.apache.solr.core.CoreContainer.<init>(CoreContainer.java:142)
              at org.apache.solr.hadoop.SolrRecordWriter.createEmbeddedSolrServer(SolrRecordWriter.java:162)
              at org.apache.solr.hadoop.SolrRecordWriter.<init>(SolrRecordWriter.java:119)
              ... 9 more
      Caused by: org.xml.sax.SAXParseException; Premature end of file.
              at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
              at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
              at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
              at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
              at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
              at org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
              at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
              at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
              at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
              at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
              at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
              at org.apache.solr.core.Config.<init>(Config.java:145)
              ... 15 more
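
The innermost `SAXParseException` ("Premature end of file", raised while the parser is still detecting the document version) is what a JAXP parser typically reports for input with no content at all. The sketch below reproduces that failure mode outside Solr. It assumes the bad `solr.xml` was effectively empty, which the trace suggests but does not prove; the class name `EmptySolrXmlDemo` and the sample `<solr></solr>` string are illustrative only and say nothing about what Solr requires beyond well-formedness.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXParseException;

// Hypothetical demo class, not part of Solr.
public class EmptySolrXmlDemo {

    /** Returns true if the given text is well-formed XML, false if the parser rejects it. */
    static boolean parses(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try {
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXParseException e) {
            // Empty input fails here before any element is read.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: an empty file stands in for the invalid solr.xml.
        System.out.println("empty file parses: " + parses(""));
        // A minimal well-formed document, used only for contrast.
        System.out.println("minimal document parses: " + parses("<solr></solr>"));
    }
}
```

A check of this kind against the `solr.xml` inside the generated solr home zip would be one quick way to confirm whether a given build is affected.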
      
      

      The last version with which I successfully used `MapReduceIndexerTool` was 4.9, and I verified that this patch resolves the issue for me (testing on 5.1). I spent a couple of hours trying to write a simple test case that exhibits the error, but I haven't quite figured out how to deal with the

      java.security.AccessControlException: java.io.FilePermission ...

      errors.

      Pull request for bugfix here

      1. SOLR-7512.patch
        66 kB
        Mark Miller
      2. SOLR-7512.patch
        61 kB
        Mark Miller
      3. SOLR-7512.patch
        11 kB
        Adam McElwee
      4. SOLR-7512.patch
        12 kB
        Adam McElwee

        Activity

        Adam McElwee added a comment -

        I've updated the PR to include the test case, but I'm still wrestling with the security manager, and have to disable it to successfully execute the test.

        Uwe Schindler added a comment -

        There are some problems with the patch:

        -        Path targetFile = destDir.resolve(entry.getName());
        -        
        +        Path targetFile = new File(destDir.toFile(), entry.getName()).toPath();
        +
        

        This is a no-go with Lucene/Solr: java.io.File is not allowed to be used anywhere in Lucene code.
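
For illustration, unpacking a zip while staying entirely within `java.nio.file` could look like the sketch below. It is not the code from the patch or from `TestUtil`; the class name `NioUnzip` is hypothetical, and the check that rejects entries escaping the destination directory is an extra precaution added here, not something discussed in this issue.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of Lucene/Solr.
public class NioUnzip {

    /** Unpacks zipFile into destDir using Path.resolve instead of java.io.File. */
    public static void unzip(Path zipFile, Path destDir) throws IOException {
        Path root = destDir.toAbsolutePath().normalize();
        try (ZipInputStream zin = new ZipInputStream(Files.newInputStream(zipFile))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                // Resolve the entry name against the destination, as in the original line.
                Path target = root.resolve(entry.getName()).normalize();
                if (!target.startsWith(root)) {
                    throw new IOException("Entry escapes destination: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                } else {
                    Files.createDirectories(target.getParent());
                    // Copies only the current entry; the zip stream stays open.
                    Files.copy(zin, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}
```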

        Mark Miller added a comment -

        but I haven't quite figured out how to deal with the

        java.security.AccessControlException: java.io.FilePermission ...
        errors.

        That's a known current problem - a couple tests have to be run via IDE or without a security manager because a Hadoop piece tries to write in an illegal location for tests.

        Mark Miller added a comment -

        Though if you are writing a new test, perhaps it's just your new code as Uwe points out.

        There are two tests that are currently generally skipped because of what I mention above - they are likely the tests that would catch this.

        Adam McElwee added a comment -

        Patch updated to remove usage of `java.io.File`.

        For some reason that method in `TestUtil` wasn't correctly unpacking the zip and using relative paths. Maybe that's another issue, in itself. I switched to the hadoop fs `FileUtil.unZip`.

        Uwe Schindler added a comment -

        For some reason that method in `TestUtil` wasn't correctly unpacking the zip and using relative paths. Maybe that's another issue, in itself. I switched to the hadoop fs `FileUtil.unZip`.

        The reason could be an "incorrectly packed ZIP file". There are some ZIP files out there that use backslashes as separators. Maybe the one you used had this problem.

        Adam McElwee added a comment -

        Hmm, possibly. The zip in question is the one created as part of the existing MRIndexerTool in `SolrOutputFormat`. A quick look at it shows that it's simply doing substring manipulation for creating the zip entries. Seems a bit questionable. At any rate, the hadoop `FileUtil.unZip` unpacks it w/ no issues.
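
As a point of comparison for the substring approach, entry names can be derived from relativized paths and joined with forward slashes, which avoids platform separators ending up in the zip. This is only a sketch of that alternative, not what `SolrOutputFormat` does or what the patch changes; `SolrHomeZipSketch`, `entryName`, and `zipDirectory` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of Solr.
public class SolrHomeZipSketch {

    /** Builds a zip entry name from the path of file relative to root, always using '/'. */
    static String entryName(Path root, Path file) {
        StringBuilder name = new StringBuilder();
        for (Path part : root.relativize(file)) {
            if (name.length() > 0) {
                name.append('/');
            }
            name.append(part.toString());
        }
        return name.toString();
    }

    /** Zips every regular file under root into zipFile. */
    public static void zipDirectory(Path root, Path zipFile) throws IOException {
        List<Path> files;
        // Collect the files first so the walk is closed before the zip is written.
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(zipFile))) {
            for (Path file : files) {
                zos.putNextEntry(new ZipEntry(entryName(root, file)));
                Files.copy(file, zos);
                zos.closeEntry();
            }
        }
    }
}
```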

        Mark Miller added a comment -

        I've been working on getting this in. First I'm dealing with a bunch of issues around the tests that would have already picked this up but can't run with a security manager - it seems they couldn't run without additional dependencies and fixes due to recent hadoop upgrades. Once I finish getting that straightened out, I'll pull in the new test.

        Adam McElwee added a comment -

        Mark Miller, if there's something that you think a newb solr contributor can help out w/, just let me know.

        Mark Miller added a comment -

        Okay, this patch gets things back up to speed. I'll look at integrating the new test next.

        Mark Miller added a comment -

        I'm having some trouble with the new test, so I'll commit the standard cleanup first - I want to make sure that gets into 5.2.1.

        Recent hadoop lib updates allow us to properly run a couple tests that required no security manager to run - that should prevent this from popping up again.

        We do actually have to write the solr.xml file out - 6x requires a solr.xml file.

        ASF subversion and git services added a comment -

        Commit 1684494 from Mark Miller in branch 'dev/trunk'
        [ https://svn.apache.org/r1684494 ]

        SOLR-7512: SolrOutputFormat creates an invalid solr.xml in the solr home zip for MapReduceIndexerTool.

        ASF subversion and git services added a comment -

        Commit 1684495 from Mark Miller in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1684495 ]

        SOLR-7512: SolrOutputFormat creates an invalid solr.xml in the solr home zip for MapReduceIndexerTool.

        ASF subversion and git services added a comment -

        Commit 1684509 from Mark Miller in branch 'dev/branches/lucene_solr_5_2'
        [ https://svn.apache.org/r1684509 ]

        SOLR-7512: SolrOutputFormat creates an invalid solr.xml in the solr home zip for MapReduceIndexerTool.


          People

          • Assignee:
            Mark Miller
            Reporter:
            Adam McElwee
          • Votes:
            0
            Watchers:
            4
