Hive / HIVE-9303

Parquet files are written with incorrect definition levels

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.13.1
    • Fix Version/s: 1.2.0
    • Component/s: None
    • Labels: None

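    Description

      The definition level records how many of the optional levels of nesting on a column's path are non-NULL, and therefore at which level a value becomes NULL. Hive appears to always write it as n or n-1, where n is the maximum definition level. As a result, only the innermost level of nesting can be NULL: a NULL at any outer level is read back as a non-NULL struct whose innermost field is NULL. This affects only tables stored as Parquet. For example: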

      CREATE TABLE text_tbl (a STRUCT<b:STRUCT<c:INT>>)
      STORED AS TEXTFILE;
      
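      -- tbl is any existing, non-empty table; it only supplies a single row.
      -- IF(false, ..., NULL) yields a NULL typed as STRUCT<b:STRUCT<c:INT>>.
      INSERT OVERWRITE TABLE text_tbl
      SELECT IF(false, named_struct("b", named_struct("c", 1)), NULL)
      FROM tbl LIMIT 1;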
      
      CREATE TABLE parq_tbl
      STORED AS PARQUET
      AS SELECT * FROM text_tbl;
      
      SELECT * FROM text_tbl;
      => NULL                -- correct

      SELECT * FROM parq_tbl;
      => {"b":{"c":null}}    -- incorrect, should also be NULL
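The mechanism can be illustrated with a minimal Python sketch (not Hive's writer code). It assumes, as in the example, that a, b and c are all optional, so n = 3: a NULL at a should be encoded with definition level 0, a NULL at a.b with 1, a NULL at a.b.c with 2, and a present value with 3. The helpers definition_level, buggy_definition_level and reconstruct are hypothetical; the buggy variant simply models the reported behavior of writing only n or n-1.

```python
from typing import Optional

# Simplified model of the Parquet column a.b.c from the example above.
# Assumption: a, b and c are all optional, so the maximum definition level n is 3.
# These helpers are hypothetical illustrations, not Hive or parquet-mr APIs.
MAX_DEF_LEVEL = 3


def definition_level(a: Optional[dict]) -> int:
    """Correct encoding: count the non-NULL optional levels on the path a.b.c."""
    if a is None:
        return 0  # the whole struct a is NULL
    b = a.get("b")
    if b is None:
        return 1  # a is present, a.b is NULL
    if b.get("c") is None:
        return 2  # a.b is present, a.b.c is NULL
    return MAX_DEF_LEVEL  # a.b.c holds a value


def buggy_definition_level(a: Optional[dict]) -> int:
    """Models the reported bug: every NULL is written as level n - 1."""
    level = definition_level(a)
    return level if level == MAX_DEF_LEVEL else MAX_DEF_LEVEL - 1


def reconstruct(level: int, value: Optional[int] = None) -> Optional[dict]:
    """Decode a definition level back into a row (simplified reader logic)."""
    if level == 0:
        return None
    if level == 1:
        return {"b": None}
    if level == 2:
        return {"b": {"c": None}}
    return {"b": {"c": value}}


row = None  # the NULL value inserted into text_tbl
print(reconstruct(definition_level(row)))        # None, matching text_tbl
print(reconstruct(buggy_definition_level(row)))  # {'b': {'c': None}}, matching parq_tbl
```

Because the reader trusts the stored level, a row written with level 2 instead of 0 comes back as {"b":{"c":null}}, which is exactly the difference between the two SELECT results above.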
      

        Attachments

        1. HIVE-9303.1.patch
          5 kB
          Sergio Peña
        2. HIVE-9303.1.patch
          5 kB
          Brock Noland

    People

    • Assignee: Sergio Peña (spena)
    • Reporter: Skye Wanderman-Milne (skye)
    • Votes: 0
    • Watchers: 6
