Pig / PIG-5056

Fix AvroStorage writing enums

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.16.0
    • Fix Version/s: 0.17.0
    • None
    • Reviewed

    Description

      The issue is observable with the latest Avro release (1.8.1), which adds an extra check for enum types that the currently used 1.7.5 does not perform (see https://github.com/apache/avro/blob/release-1.8.1/lang/java/avro/src/main/java/org/apache/avro/generic/GenericDatumWriter.java#L163).

      As a result, TestAvroStorage#testLoadRecordsWithEnums fails. Pig reads an Avro file whose schema contains (string,int,enum), represents the record in Pig as (chararray,int,chararray), and then writes it back to an Avro file with the given schema (string,int,enum). On write, the value of the enum field (here GOOD) is rejected by the new check with "Not an enum":

      java.lang.Exception: java.io.IOException: org.apache.avro.file.DataFileWriter$AppendWriteException: org.apache.avro.AvroTypeException: Not an enum: GOOD
      	at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
      	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
      Caused by: java.io.IOException: org.apache.avro.file.DataFileWriter$AppendWriteException: org.apache.avro.AvroTypeException: Not an enum: GOOD
      	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.StoreFuncDecorator.putNext(StoreFuncDecorator.java:83)
      	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:144)
      	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:97)
      	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:655)
      	at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
      	at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
      	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
      	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:282)
      	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:275)
      	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:65)
      	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
      	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
      	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
      	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: org.apache.avro.file.DataFileWriter$AppendWriteException: org.apache.avro.AvroTypeException: Not an enum: GOOD
      	at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:308)
      	at org.apache.pig.impl.util.avro.AvroRecordWriter.write(AvroRecordWriter.java:115)
      	at org.apache.pig.impl.util.avro.AvroRecordWriter.write(AvroRecordWriter.java:51)
      	at org.apache.pig.builtin.AvroStorage.putNext(AvroStorage.java:520)
      	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.StoreFuncDecorator.putNext(StoreFuncDecorator.java:75)
      	... 18 more
      Caused by: org.apache.avro.AvroTypeException: Not an enum: GOOD
      	at org.apache.avro.generic.GenericDatumWriter.writeEnum(GenericDatumWriter.java:164)
      	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:106)
      	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:73)
      	at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:153)
      	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:143)
      	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:105)
      	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:73)
      	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:60)
      	at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:302)
      	... 22 more
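
      The failure mode described above can be modelled without Avro on the classpath. The sketch below is only an illustration under stated assumptions: EnumSchema, EnumSymbol, writeEnumLenient, writeEnumStrict and toEnumDatum are hypothetical stand-ins, not Avro or Pig classes, and the second symbol BAD is invented for the example. In Avro itself the enum datum type is GenericData.EnumSymbol. The attached patch is not reproduced here; wrapping the chararray before it reaches the writer is one plausible shape for such a fix.

```java
import java.util.Arrays;
import java.util.List;

public class EnumWriteSketch {

    /** Stand-in for an Avro enum schema: just the ordered list of allowed symbols. */
    static final class EnumSchema {
        final List<String> symbols;
        EnumSchema(String... symbols) { this.symbols = Arrays.asList(symbols); }
    }

    /** Stand-in for an enum-typed datum (Avro's counterpart is GenericData.EnumSymbol). */
    static final class EnumSymbol {
        final EnumSchema schema;
        final String symbol;
        EnumSymbol(EnumSchema schema, String symbol) {
            this.schema = schema;
            this.symbol = symbol;
        }
        @Override public String toString() { return symbol; }
    }

    /** Lenient writer: resolves the ordinal from the datum's string form, whatever its type. */
    static int writeEnumLenient(EnumSchema schema, Object datum) {
        return ordinalOf(schema, datum.toString());
    }

    /** Strict writer: first rejects any datum that is not an enum symbol. */
    static int writeEnumStrict(EnumSchema schema, Object datum) {
        if (!(datum instanceof EnumSymbol)) {
            throw new IllegalArgumentException("Not an enum: " + datum);
        }
        return ordinalOf(schema, datum.toString());
    }

    private static int ordinalOf(EnumSchema schema, String symbol) {
        int index = schema.symbols.indexOf(symbol);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown symbol: " + symbol);
        }
        return index;
    }

    /** Hypothetical conversion step: wrap a chararray value so the strict writer accepts it. */
    static Object toEnumDatum(EnumSchema schema, Object pigValue) {
        if (pigValue instanceof EnumSymbol) {
            return pigValue;
        }
        return new EnumSymbol(schema, pigValue.toString());
    }

    public static void main(String[] args) {
        EnumSchema schema = new EnumSchema("GOOD", "BAD"); // "BAD" is an assumed second symbol

        // A plain string passes the lenient writer.
        System.out.println(writeEnumLenient(schema, "GOOD")); // prints 0

        // The same string is rejected by the strict writer.
        try {
            writeEnumStrict(schema, "GOOD");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints "Not an enum: GOOD"
        }

        // Wrapping the value first satisfies the strict check.
        System.out.println(writeEnumStrict(schema, toEnumDatum(schema, "GOOD"))); // prints 0
    }
}
```

      The point of the model is that the data did not change between the two Avro versions, only the type check in front of the ordinal lookup, so converting the chararray to an enum symbol on the store path is enough to satisfy both writers.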
      

      Attachments

        1. PIG-5056.patch
          0.8 kB
          Ádám Szita

            People

              Assignee: Ádám Szita (szita)
              Reporter: Ádám Szita (szita)
              Votes: 0
              Watchers: 3
