  1. Hadoop Common
  2. HADOOP-7542

Change XML format to 1.1 to add support for serializing additional characters

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.20.2
    • Fix Version/s: None
    • Component/s: conf
    • Labels: None
    • Hadoop Flags: Incompatible change

    Description

      The feature added by this Jira has a problem when a property is set to a character that is invalid in XML, e.g. ctrl-A: mapred.textoutputformat.separator = "\u0001"

      e.g.
      String delim = "\u0001";
      conf.set("mapred.textoutputformat.separator", delim);

      The job client serializes the JobConf with mapred.textoutputformat.separator set to "\u0001" (ctrl-A). The problem occurs when the JobTracker deserializes it (reads it back) and encounters the invalid XML character.
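
The failure described above, and the effect of the XML 1.1 change proposed in the title, can be reproduced with the JDK's built-in parser alone. The following is a minimal sketch, not Hadoop code: the class name CtrlAXmlVersionDemo and its helper methods are hypothetical, and the document is a hand-written stand-in for a serialized JobConf that assumes ctrl-A is written as the character reference &#1;.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.SAXParseException;

// Hypothetical demo class; it does not use or mirror Hadoop's Configuration code.
public class CtrlAXmlVersionDemo {

    // Builds a minimal configuration-style document declaring the given XML version.
    // The ctrl-A value is written as the numeric character reference &#1;.
    static String confXml(String xmlVersion) {
        return "<?xml version=\"" + xmlVersion + "\" encoding=\"UTF-8\"?>"
            + "<configuration><property>"
            + "<name>mapred.textoutputformat.separator</name>"
            + "<value>&#1;</value>"
            + "</property></configuration>";
    }

    // Parses the document and returns the text of the first <value> element.
    static String readValue(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getElementsByTagName("value").item(0).getTextContent();
    }

    // Returns true if the parser rejects the document with a SAXParseException.
    static boolean isRejected(String xml) throws Exception {
        try {
            readValue(xml);
            return false;
        } catch (SAXParseException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        // An XML 1.0 declaration should be rejected; an XML 1.1 declaration should parse.
        System.out.println("XML 1.0 rejected: " + isRejected(confXml("1.0")));
        System.out.println("XML 1.1 value is ctrl-A: "
            + "\u0001".equals(readValue(confXml("1.1"))));
    }
}
```

XML 1.0 excludes U+0001 from its legal character range even when written as a character reference, while XML 1.1 permits it in that form, which is why only the version declaration differs between the two documents.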

      The test for this feature, testFormatWithCustomSeparator(), does not serialize the JobConf after setting the separator to ctrl-A, and hence does not detect this specific problem.

      Here is an exception:

      08/12/06 01:40:50 INFO mapred.FileInputFormat: Total input paths to process : 1
      org.apache.hadoop.ipc.RemoteException: java.io.IOException:
      java.lang.RuntimeException: org.xml.sax.SAXParseException: Character reference "&#1" is an invalid XML character.
      at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:961)
      at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:864)
      at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:832)
      at org.apache.hadoop.conf.Configuration.get(Configuration.java:291)
      at org.apache.hadoop.mapred.JobConf.getJobPriority(JobConf.java:1163)
      at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:179)
      at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:1783)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)

      at org.apache.hadoop.ipc.Client.call(Client.java:715)
      at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
      at org.apache.hadoop.mapred.$Proxy1.submitJob(Unknown Source)
      at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:788)
      at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1026)
      at

      Attachments

        1. HADOOP-7542-v1.patch
          3 kB
          Michael Katzenellenbogen
        2. MAPREDUCE-109.patch
          2 kB
          Christopher Egner
        3. MAPREDUCE-109-v2.patch
          2 kB
          Michael Katzenellenbogen
        4. MAPREDUCE-109-v3.patch
          2 kB
          Michael Katzenellenbogen
        5. MAPREDUCE-109-v4.patch
          2 kB
          Michael Katzenellenbogen

            People

              Assignee: michaelk Michael Katzenellenbogen
              Reporter: vgogate Suhas
              Votes: 1
              Watchers: 11
