Parquet / PARQUET-387

TwoLevelListWriter does not handle null values in array

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.8.0, 1.8.1
    • Fix Version/s: 1.9.0, 1.8.2
    • Component/s: None
    • Labels: None

    Description

      parquet-mr cannot write records that use the following Avro schema:

      {"type": "record",
       "namespace": "com.cloudera.impala",
       "name": "table_3",
       "fields": [
         {"name": "field_6", "type":
           {"type": "array", "items": ["null",
             {"type": "map", "values": ["null", "string"]}]}}]}
      

      If a map element in the array is null, the write fails with the following exception:

      java.lang.reflect.InvocationTargetException
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      	at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:293)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.NullPointerException
      	at parquet.avro.AvroWriteSupport.writeMap(AvroWriteSupport.java:185)
      	at parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:277)
      	at parquet.avro.AvroWriteSupport.access$400(AvroWriteSupport.java:48)
      	at parquet.avro.AvroWriteSupport$TwoLevelListWriter.writeCollection(AvroWriteSupport.java:473)
      	at parquet.avro.AvroWriteSupport$ListWriter.writeList(AvroWriteSupport.java:322)
      	at parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:275)
      	at parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:169)
      	at parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:144)
      	at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:116)
      	at parquet.hadoop.ParquetWriter.write(ParquetWriter.java:324)
      	at com.cloudera.impala.datagenerator.RandomNestedDataGenerator.writeFile(RandomNestedDataGenerator.java:69)
      	at com.cloudera.impala.datagenerator.RandomNestedDataGenerator.main(RandomNestedDataGenerator.java:284)
      

      The likely cause is that TwoLevelListWriter does not check whether an element is null before writing it, so a null value in the array is passed on to writeMap: https://github.com/apache/parquet-mr/blob/master/parquet-avro/src/main/java/org/apache/parquet/avro/AvroWriteSupport.java#L456
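
      To illustrate the kind of check the description says is missing, here is a minimal standalone sketch. It is not the parquet-mr code or the actual fix: the class NullElementCheck and the methods firstNullIndex and writeElements are hypothetical names invented for this example, and writeElements only counts elements instead of writing Parquet data. It shows one way a list writer could detect a null element up front and report its position instead of failing with a bare NullPointerException further down the call stack.

      ```java
      import java.util.Arrays;
      import java.util.Collection;
      import java.util.Collections;
      import java.util.HashMap;
      import java.util.List;
      import java.util.Map;

      // Hypothetical illustration only; none of these names exist in parquet-mr.
      class NullElementCheck {

          // Returns the index of the first null element, or -1 if there is none.
          public static int firstNullIndex(Collection<?> array) {
              int i = 0;
              for (Object elt : array) {
                  if (elt == null) {
                      return i;
                  }
                  i += 1;
              }
              return -1;
          }

          // Stand-in for a list writer: rejects null elements with a descriptive
          // message, otherwise returns the number of elements it would write.
          public static int writeElements(Collection<?> array) {
              int idx = firstNullIndex(array);
              if (idx >= 0) {
                  throw new IllegalArgumentException(
                      "Array contains a null element at index " + idx);
              }
              return array.size();
          }

          public static void main(String[] args) {
              // A map with a null value mirrors the "values": ["null", "string"] part
              // of the schema; the map itself is non-null, so this list is accepted.
              Map<String, String> map = new HashMap<>();
              map.put("key", null);
              List<Map<String, String>> ok = Collections.singletonList(map);
              System.out.println(writeElements(ok));

              // A null map inside the array is the case described in this issue.
              List<Map<String, String>> bad = Arrays.asList(map, null);
              try {
                  writeElements(bad);
              } catch (IllegalArgumentException e) {
                  System.out.println(e.getMessage());
              }
          }
      }
      ```

      Whether a writer should reject the null element, as in this sketch, or encode it depends on whether the target list layout can represent null elements at all; the sketch only covers detection and error reporting.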

            People

              Assignee: rdblue Ryan Blue
              Reporter: tarasbob Taras Bobrovytsky
              Votes: 0
              Watchers: 4
