Hive / HIVE-8419

Hive doesn't properly write NULL values in Parquet files when the type is struct<...>.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.13.1
    • Fix Version/s: None
    • Component/s: File Formats
    • Labels: None

      Description

      Hive doesn't seem to be able to write NULL values in a column of type "struct" when the target is a Parquet file. Instead, it replaces them with empty objects (that is, non-NULL objects whose fields are all NULL).

      Here is a short example demonstrating the issue. We start with a small Avro table "avro_table".

       SELECT * FROM avro_table
      mycol
      struct<field1:string,field2:double>
      {"field1":"blabla","field2":1.0}
      {"field1":"blabla","field2":2.0}
      NULL
      {"field1":"blabla","field2":4.0}
      {"field1":"blabla","field2":5.0}

      As you can see here, the third row contains a NULL cell. Then, let's copy it using Hive (INSERT OVERWRITE ...) into a Parquet table named "parquet_table".
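
      The copy step can be sketched as follows. This is a minimal reproduction sketch, not the reporter's exact statements: the original INSERT OVERWRITE statement is elided in the report, and the CREATE TABLE statement is an assumption. It also assumes "avro_table" already exists with the single column shown above.

```sql
-- Assumed target table: same column type as avro_table, stored as Parquet.
CREATE TABLE parquet_table (
  mycol STRUCT<field1:STRING, field2:DOUBLE>
)
STORED AS PARQUET;

-- Copy every row, including the row whose struct is NULL.
INSERT OVERWRITE TABLE parquet_table
SELECT mycol FROM avro_table;
```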

      Finally, when you try to display it:

       SELECT * FROM parquet_table
      mycol
      struct<field1:string,field2:double>
      {"field1":"blabla","field2":1.0}
      {"field1":"blabla","field2":2.0}
      {"field1":null,"field2":null}
      {"field1":"blabla","field2":4.0}
      {"field1":"blabla","field2":5.0}
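
      One way to tell the two cases apart without reading the output by eye is to count them separately. The queries below are a sketch using the table and field names from this report; the aliases "null_rows" and "empty_struct_rows" are made up for illustration. If NULL structs were written properly, the first count should match the number of NULL cells in the source table. Judging from the output above, an affected version would instead report that row in the second count.

```sql
-- Rows where the struct itself is NULL.
SELECT COUNT(*) AS null_rows
FROM parquet_table
WHERE mycol IS NULL;

-- Rows where the struct exists but every field is NULL.
SELECT COUNT(*) AS empty_struct_rows
FROM parquet_table
WHERE mycol IS NOT NULL
  AND mycol.field1 IS NULL
  AND mycol.field2 IS NULL;
```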

      I tried to generate a (correct) Parquet file using our software (Dataiku), and Hive had no problem reading null values, even when the column type was "struct".

      Consequently, I suspect the bug is in Hive's Parquet writer code rather than in the reader.

      This bug also propagates recursively to nested types. For instance, a NULL cell of type

       struct<field1:struct<field3:string>,field2:double> 

      will become

       {"field1":{"field3":null},"field2":null} 

      when written in a Parquet file.

            People

            • Assignee: Sergio Peña (spena)
            • Reporter: Frédéric TERRAZZONI (Akryus)
            • Votes: 0
            • Watchers: 3