  1. Hadoop Common
  2. HADOOP-7614

Reloading configuration when using inputstream resources results in org.xml.sax.SAXParseException

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Cannot Reproduce
    • Affects Version/s: 0.21.0
    • Fix Version/s: 2.7.0
    • Component/s: conf
    • Labels: None

    Description

      When using an inputstream as a resource for configuration, reloading this configuration will throw the following exception:

      Exception in thread "main" java.lang.RuntimeException: org.xml.sax.SAXParseException: Premature end of file.
      at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1576)
      at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1445)
      at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1381)
      at org.apache.hadoop.conf.Configuration.get(Configuration.java:569)
      ...
      Caused by: org.xml.sax.SAXParseException: Premature end of file.
      at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:249)
      at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
      at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:124)
      at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1504)
      ... 4 more

      To reproduce, see the following test code:
      // needs: java.io.ByteArrayInputStream, org.apache.hadoop.conf.Configuration
      Configuration conf = new Configuration();
      ByteArrayInputStream bais = new ByteArrayInputStream("<configuration></configuration>".getBytes());
      conf.addResource(bais);
      System.out.println(conf.get("blah")); // first load reads the stream to its end
      conf.addResource("core-site.xml"); // just add a named resource, doesn't matter which one
      System.out.println(conf.get("blah")); // reload re-reads the exhausted stream and throws
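The underlying failure can be shown without Hadoop. The following is a minimal sketch using only the JDK's XML parser; the class and method names are made up for illustration. It assumes the reload simply hands the same, already consumed stream to the parser a second time, which is what the stack trace above suggests.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXParseException;

// Hypothetical demo class, not part of Hadoop.
public class ExhaustedStreamDemo {

    /** Parses the same stream twice; returns true if the second parse fails. */
    static boolean secondParseFails() throws Exception {
        InputStream in = new ByteArrayInputStream(
                "<configuration></configuration>".getBytes(StandardCharsets.UTF_8));

        // First parse succeeds and reads the stream to its end.
        DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);

        try {
            // Second parse sees an empty input, so there is no root element.
            DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
            return false;
        } catch (SAXParseException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("second parse fails: " + secondParseFails());
    }
}
```

The parser may also print a fatal-error line to stderr on the second parse; that comes from its default error handler.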

      Allowing inputstream resources is flexible, but in cases such as this it can lead to difficult-to-debug problems.

      What do you think is the best solution? We could:
      A) reset the inputstream after it is read instead of closing it (but what to do when the stream does not support marking?)
      B) leave it up to the client (for example, make sure you implement close() so that it resets the stream)
      C) when reading the inputstream for the first time, cache or wrap the contents somehow so that it can be read multiple times (let's at least document it)
      D) remove the inputstream method altogether
      E) something else?

      For now I have attached a patch for solution A.
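To make options A and C concrete, here is a rough JDK-only sketch. It is not the attached patch and not Hadoop code; `ReplayableResource`, `rewind`, and `readAll` are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of Hadoop.
public class ReplayableResource {
    private final byte[] cached; // option C: contents copied once, up front

    ReplayableResource(InputStream in) throws IOException {
        this.cached = readAll(in);
    }

    /** Option C: every (re)load gets a fresh stream over the cached bytes. */
    InputStream open() {
        return new ByteArrayInputStream(cached);
    }

    /**
     * Option A: rewind instead of closing. Returns false when the stream
     * cannot be rewound, which is the open question raised for this option.
     * For streams other than ByteArrayInputStream, mark() generally has to
     * be called before the first read for reset() to succeed.
     */
    static boolean rewind(InputStream in) {
        if (!in.markSupported()) {
            return false;
        }
        try {
            in.reset();
            return true;
        } catch (IOException e) {
            return false; // no mark set, or the mark was invalidated
        }
    }

    /** Reads a stream to its end and returns the bytes. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}
```

Option C trades memory for predictability: the contents are held for as long as the resource is registered, but a reload no longer depends on what kind of stream the client passed in.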

      Attachments

        1. HADOOP-7614-v2.patch
          0.9 kB
          Ferdy
        2. HADOOP-7614-v1.patch
          0.8 kB
          Ferdy

          People

            Assignee: Unassigned
            Reporter: Ferdy (ferdy.g)
            Votes: 0
            Watchers: 5
