Apache Tez / TEZ-3330

Propagate additional config parameters when running MR jobs via Tez.


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 0.9.0, 0.8.5

    Description

      I tried running MapredColorCount, the simple Avro M/R job from the examples shipped with Avro release 1.7.7.
      It failed with the following trace:

      errorMessage=Shuffle Runner Failed:org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: Error while doing final merge
              at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:378)
              at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:337)
              at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
              at java.util.concurrent.FutureTask.run(FutureTask.java:262)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
              at java.lang.Thread.run(Thread.java:744)
      Caused by: java.lang.NullPointerException
              at java.io.StringReader.<init>(StringReader.java:50)
              at org.apache.avro.Schema$Parser.parse(Schema.java:917)
              at org.apache.avro.Schema.parse(Schema.java:966)
              at org.apache.avro.mapred.AvroJob.getMapOutputSchema(AvroJob.java:78)
              at org.apache.avro.mapred.AvroKeyComparator.setConf(AvroKeyComparator.java:39)
              at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
              at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
              at org.apache.tez.runtime.library.common.ConfigUtils.getIntermediateInputKeyComparator(ConfigUtils.java:133)
              at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.finalMerge(MergeManager.java:915)
              at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.close(MergeManager.java:540)
              at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:376)
              ... 6 more
      

      After some digging, I saw that during the shuffle Tez cannot access some of the job's configuration properties. In this example the missing property is avro.output.schema.
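
      The failure mode described above can be modelled without Hadoop, Avro, or Tez on the classpath. The sketch below is only an illustration: plain maps stand in for the job configuration, and filterByPrefix is a hypothetical simplification of "only some properties reach the runtime component", not Tez's actual propagation logic. It shows that once avro.output.schema is dropped, wrapping the looked-up value in a StringReader throws the NullPointerException seen at the bottom of the trace.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Stand-in model only: plain maps replace Hadoop's Configuration, and the
// prefix filter is a hypothetical simplification, not Tez's real code.
class MissingConfigDemo {

    // Copy only the entries whose key starts with one of the allowed prefixes.
    static Map<String, String> filterByPrefix(Map<String, String> jobConf,
                                              String... allowedPrefixes) {
        Map<String, String> runtimeConf = new HashMap<>();
        for (Map.Entry<String, String> entry : jobConf.entrySet()) {
            for (String prefix : allowedPrefixes) {
                if (entry.getKey().startsWith(prefix)) {
                    runtimeConf.put(entry.getKey(), entry.getValue());
                    break;
                }
            }
        }
        return runtimeConf;
    }

    // Mimics a parser that wraps the schema text in a StringReader
    // without checking for null first.
    static StringReader openSchema(Map<String, String> conf) {
        // Throws NullPointerException when the key is absent.
        return new StringReader(conf.get("avro.output.schema"));
    }

    public static void main(String[] args) {
        Map<String, String> jobConf = new HashMap<>();
        jobConf.put("tez.runtime.io.sort.mb", "100");
        jobConf.put("avro.output.schema", "\"string\"");

        // Only "tez.runtime." keys survive in this model.
        Map<String, String> runtimeConf = filterByPrefix(jobConf, "tez.runtime.");
        System.out.println("schema visible at runtime: "
                + runtimeConf.containsKey("avro.output.schema"));

        try {
            openSchema(runtimeConf);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException from StringReader");
        }
    }
}
```

      In this model, passing "avro." as an additional allowed prefix is enough to make openSchema succeed, which matches the direction of the issue title: propagate the extra parameters the job depends on.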

      With some more complicated code I could get one step further, but a similar failure occurred while the ValuesIterator for the reducer was being built:

      java.lang.NullPointerException
      at java.io.StringReader.<init>(StringReader.java:50)
      at org.apache.avro.Schema$Parser.parse(Schema.java:917)
      at org.apache.avro.Schema.parse(Schema.java:966)
      at org.apache.avro.mapred.AvroJob.getMapOutputSchema(AvroJob.java:78)
      at org.apache.avro.mapred.AvroSerialization.getDeserializer(AvroSerialization.java:53)
      at org.apache.hadoop.io.serializer.SerializationFactory.getDeserializer(SerializationFactory.java:90)
      at org.apache.tez.runtime.library.common.ValuesIterator.<init>(ValuesIterator.java:80)
      at org.apache.tez.runtime.library.input.OrderedGroupedKVInput.createValuesIterator(OrderedGroupedKVInput.java:287)
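
      Both traces pass through AvroJob.getMapOutputSchema, once from the key comparator and once from the deserializer, so any component that reads the schema from the configuration fails the same way. As a diagnostic aid, a lookup that names the missing keys is easier to debug than a bare NullPointerException. The sketch below is hypothetical and not part of Avro or Tez: it uses a plain map in place of the Hadoop Configuration, and the key avro.map.output.schema together with its fallback to avro.output.schema is an assumption about how Avro 1.7.x resolves the map output schema.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical diagnostic helper; a plain map stands in for the job configuration.
class SchemaLookup {
    // Assumed key name for the map output schema in Avro 1.7.x.
    static final String MAP_OUTPUT_SCHEMA = "avro.map.output.schema";
    // Key named in the report.
    static final String OUTPUT_SCHEMA = "avro.output.schema";

    // Returns the map output schema text, falling back to the job output schema.
    // Throws IllegalStateException naming the missing keys instead of letting
    // a null value reach the parser.
    static String requireMapOutputSchema(Map<String, String> conf) {
        String schema = conf.get(MAP_OUTPUT_SCHEMA);
        if (schema == null) {
            schema = conf.get(OUTPUT_SCHEMA);
        }
        if (schema == null) {
            throw new IllegalStateException("Neither " + MAP_OUTPUT_SCHEMA
                    + " nor " + OUTPUT_SCHEMA + " is set in the configuration");
        }
        return schema;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        try {
            requireMapOutputSchema(conf);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        conf.put(OUTPUT_SCHEMA, "\"string\"");
        System.out.println(requireMapOutputSchema(conf));
    }
}
```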
      

      I am using HDP 2.4, Tez 0.7.0, and Avro 1.7.4.

      Attachments

        1. TEZ-3330.temp.patch
          11 kB
          Siddharth Seth
        2. TEZ-3330.temp.2.patch
          13 kB
          Siddharth Seth
        3. TEZ-3330.01.patch
          29 kB
          Siddharth Seth


          People

            Assignee: Siddharth Seth (sseth)
            Reporter: Manuel Godbert (manuel.godbert)
            Votes: 0
            Watchers: 3
