SOLR-7512

SolrOutputFormat creates an invalid solr.xml in the solr home zip for MapReduceIndexerTool

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 5.1
    • Fix Version/s: 5.2.1, 5.3, 6.0
    • Component/s: contrib - MapReduce
    • Labels:
      None

      Description

      Sometime after Solr 4.9, `MapReduceIndexerTool` broke because invalid `solr.xml` contents were being written to the solr home dir zip. My guess is that a 5.0 change made the invalid file start to matter.

      The error manifests as:

      Error: java.lang.IllegalStateException: Failed to initialize record writer for org.apache.solr.hadoop.MapReduceIndexerTool/SolrMapper, attempt_1430953999892_0012_r_000001_1
              at org.apache.solr.hadoop.SolrRecordWriter.<init>(SolrRecordWriter.java:126)
              at org.apache.solr.hadoop.SolrOutputFormat.getRecordWriter(SolrOutputFormat.java:163)
              at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:569)
              at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:643)
              at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:394)
              at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:415)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
              at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:170)
      Caused by: org.apache.solr.common.SolrException: org.xml.sax.SAXParseException; Premature end of file.
              at org.apache.solr.core.Config.<init>(Config.java:156)
              at org.apache.solr.core.SolrXmlConfig.fromInputStream(SolrXmlConfig.java:127)
              at org.apache.solr.core.SolrXmlConfig.fromFile(SolrXmlConfig.java:110)
              at org.apache.solr.core.SolrXmlConfig.fromSolrHome(SolrXmlConfig.java:138)
              at org.apache.solr.core.CoreContainer.<init>(CoreContainer.java:142)
              at org.apache.solr.hadoop.SolrRecordWriter.createEmbeddedSolrServer(SolrRecordWriter.java:162)
              at org.apache.solr.hadoop.SolrRecordWriter.<init>(SolrRecordWriter.java:119)
              ... 9 more
      Caused by: org.xml.sax.SAXParseException; Premature end of file.
              at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
              at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
              at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
              at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
              at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
              at org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
              at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
              at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
              at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
              at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
              at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
              at org.apache.solr.core.Config.<init>(Config.java:145)
              ... 15 more
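
The "Premature end of file" message at the bottom of the trace is what an XML parser typically reports when its input has no content at all. The following is a minimal sketch, not Solr code: it assumes the `solr.xml` in the zip was effectively empty and uses the JDK's built-in parser rather than Solr's `Config` class. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class EmptySolrXmlDemo {

    /** Returns true if the bytes parse as XML, false if the parser rejects them. */
    static boolean parses(byte[] xml) throws IOException, ParserConfigurationException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new ByteArrayInputStream(xml));
            return true;
        } catch (SAXException e) {
            // For empty input this is a SAXParseException, as in the trace above.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Zero-length input stands in for an empty solr.xml (an assumption).
        System.out.println(parses(new byte[0])); // prints "false"
        System.out.println(parses("<solr></solr>".getBytes(StandardCharsets.UTF_8))); // prints "true"
    }
}
```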
      
      

      The last version with which I successfully used `MapReduceIndexerTool` was 4.9, and I verified that this patch resolves the issue for me (testing on 5.1). I spent a couple of hours trying to write a simple test case to exhibit the error, but I haven't quite figured out how to deal with the

      java.security.AccessControlException: java.io.FilePermission ...

      errors.

      Pull request for bugfix here

      1. SOLR-7512.patch
        66 kB
        Mark Miller
      2. SOLR-7512.patch
        61 kB
        Mark Miller
      3. SOLR-7512.patch
        11 kB
        Adam McElwee
      4. SOLR-7512.patch
        12 kB
        Adam McElwee

        Activity

        jira-bot ASF subversion and git services added a comment -

        Commit 1684509 from Mark Miller in branch 'dev/branches/lucene_solr_5_2'
        [ https://svn.apache.org/r1684509 ]

        SOLR-7512: SolrOutputFormat creates an invalid solr.xml in the solr home zip for MapReduceIndexerTool.

        jira-bot ASF subversion and git services added a comment -

        Commit 1684495 from Mark Miller in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1684495 ]

        SOLR-7512: SolrOutputFormat creates an invalid solr.xml in the solr home zip for MapReduceIndexerTool.

        jira-bot ASF subversion and git services added a comment -

        Commit 1684494 from Mark Miller in branch 'dev/trunk'
        [ https://svn.apache.org/r1684494 ]

        SOLR-7512: SolrOutputFormat creates an invalid solr.xml in the solr home zip for MapReduceIndexerTool.

        markrmiller@gmail.com Mark Miller added a comment -

        I'm having some trouble with the new test, so I'll commit the standard cleanup first - I want to make sure that gets into 5.2.1.

        Recent hadoop lib updates allow us to properly run a couple tests that required no security manager to run - that should prevent this from popping up again.

        We do actually have to write the solr.xml file out - 6x requires a solr.xml file.
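
As an illustration of writing a well-formed `solr.xml` into a zip, here is a minimal sketch using `java.util.zip`. It is not the committed patch: the class name, the in-memory zip, and the `<solr></solr>` contents are assumptions chosen for the example, and the real `SolrOutputFormat` code may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class SolrHomeZipSketch {

    // Assumed minimal contents; the real file may need more configuration.
    static final String MINIMAL_SOLR_XML = "<solr></solr>\n";

    /** Builds an in-memory zip containing a single, non-empty solr.xml entry. */
    static byte[] zipWithSolrXml() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.putNextEntry(new ZipEntry("solr.xml"));
            zos.write(MINIMAL_SOLR_XML.getBytes(StandardCharsets.UTF_8));
            zos.closeEntry(); // finish the entry before the stream is closed
        }
        return bytes.toByteArray();
    }

    /** Reads one entry back as a UTF-8 string, or returns null if it is absent. */
    static String readEntry(byte[] zip, String name) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (entry.getName().equals(name)) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buf = new byte[4096];
                    int n;
                    while ((n = zis.read(buf)) != -1) {
                        out.write(buf, 0, n);
                    }
                    return new String(out.toByteArray(), StandardCharsets.UTF_8);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        System.out.print(readEntry(zipWithSolrXml(), "solr.xml"));
    }
}
```

Reading the entry back, as `readEntry` does, is a cheap way to check that the file in the archive is not empty before the job ships it to the reducers.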

        markrmiller@gmail.com Mark Miller added a comment -

        Okay, this patch gets things back up to speed. I'll look at integrating the new test next.

        amcelwee Adam McElwee added a comment -

        Mark Miller, if there's something that you think a newb solr contributor can help out w/, just let me know.

        markrmiller@gmail.com Mark Miller added a comment -

        I've been working on getting this in. First I'm dealing with a bunch of issues around the tests that would have already picked this up but can't run with a security manager - it seems they couldn't run without additional dependencies and fixes due to recent hadoop upgrades. Once I finish getting that straightened out, I'll pull in the new test.

        amcelwee Adam McElwee added a comment -

        Hmm, possibly. The zip in question is the one created as part of the existing MRIndexerTool in `SolrOutputFormat`. A quick look at it shows that it's simply doing substring manipulation for creating the zip entries. Seems a bit questionable. At any rate, the hadoop `FileUtil.unZip` unpacks it w/ no issues.

        thetaphi Uwe Schindler added a comment -

        For some reason that method in `TestUtil` wasn't correctly unpacking the zip and using relative paths. Maybe that's another issue, in itself. I switched to the hadoop fs `FileUtil.unZip`.

        The reason could be an "incorrectly packed ZIP file". There are some ZIP files out there that use backslashes as the separator. Maybe the one you used had this problem.
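
The ZIP format specifies forward slashes in entry names, so a backslash in a name is a hint that the archive was packed incorrectly. A small sketch of how such names could be detected and normalized follows; the class and method names are hypothetical and this is not code from `TestUtil` or the patch.

```java
public class ZipEntryNameCheck {

    /** True if the entry name contains a backslash, which ZIP does not define as a separator. */
    static boolean usesBackslash(String entryName) {
        return entryName.indexOf('\\') >= 0;
    }

    /** Rewrites backslashes to forward slashes so the name splits into path segments. */
    static String toForwardSlashes(String entryName) {
        return entryName.replace('\\', '/');
    }

    public static void main(String[] args) {
        String name = "conf\\schema.xml";
        System.out.println(usesBackslash(name));    // prints "true"
        System.out.println(toForwardSlashes(name)); // prints "conf/schema.xml"
    }
}
```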

        amcelwee Adam McElwee added a comment -

        Patch updated to remove usage of `java.io.File`.

        For some reason that method in `TestUtil` wasn't correctly unpacking the zip and using relative paths. Maybe that's another issue, in itself. I switched to the hadoop fs `FileUtil.unZip`.

        markrmiller@gmail.com Mark Miller added a comment -

        Though if you are writing a new test, perhaps it's just your new code as Uwe points out.

        There are two tests that are currently generally skipped because of what I mention above - they are likely the tests that would catch this.

        markrmiller@gmail.com Mark Miller added a comment -

        but I haven't quite figured out how to deal with the

        java.security.AccessControlException: java.io.FilePermission ...
        errors.

        That's a known current problem - a couple tests have to be run via IDE or without a security manager because a Hadoop piece tries to write in an illegal location for tests.

        thetaphi Uwe Schindler added a comment -

        There are some problems with the patch:

        -        Path targetFile = destDir.resolve(entry.getName());
        -        
        +        Path targetFile = new File(destDir.toFile(), entry.getName()).toPath();
        +
        

        This is a no-go with Lucene/Solr: java.io.File is not allowed to be used anywhere in Lucene code.
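
For reference, unpacking a zip without touching `java.io.File` can be sketched with `java.nio.file` alone. This is an illustrative example under assumptions, not the `TestUtil` method or Hadoop's `FileUtil.unZip`: the class name is hypothetical, and the check that rejects entries escaping the destination directory is an addition made for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class NioUnzipSketch {

    /** Unpacks every entry of the zip stream below destDir using only java.nio.file. */
    static void unzip(InputStream zipStream, Path destDir) throws IOException {
        Path root = destDir.toAbsolutePath().normalize();
        try (ZipInputStream zis = new ZipInputStream(zipStream)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                // Path.resolve takes the place of new File(parent, child).
                Path target = root.resolve(entry.getName()).normalize();
                if (!target.startsWith(root)) {
                    throw new IOException("Entry escapes destination: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                } else {
                    Files.createDirectories(target.getParent());
                    // Copies the current entry only; the zip stream stays open.
                    Files.copy(zis, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}
```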

        amcelwee Adam McElwee added a comment -

        I've updated the PR to include the test case, but I'm still wrestling with the security manager, and have to disable it to successfully execute the test.


          People

          • Assignee: Mark Miller (markrmiller@gmail.com)
          • Reporter: Adam McElwee (amcelwee)
          • Votes: 0
          • Watchers: 5
