Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-10086

Hive throws error when accessing Parquet file schema using field name match

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.2.0
    • Component/s: None
    • Labels:
      None

      Description

      When Hive table schema contains a portion of the schema of a Parquet file, then the access to the values should work if the field names match the schema. This does not work when a struct<> data type is in the schema, and the Hive schema contains just a portion of the struct elements. Hive throws an error instead.

      This is the example and how to reproduce:

      First, create a parquet table, and add some values on it:

      CREATE TABLE test1 (id int, name string, address struct<number:int,street:string,zip:string>) STORED AS PARQUET;
      
      INSERT INTO TABLE test1 SELECT 1, 'Roger', named_struct('number',8600,'street','Congress Ave.','zip','87366') FROM srcpart LIMIT 1;
      

      Note: srcpart could be any table. It is just used to leverage the INSERT statement.

      The above table example generates the following Parquet file schema:

      message hive_schema {
        optional int32 id;
        optional binary name (UTF8);
        optional group address {
          optional int32 number;
          optional binary street (UTF8);
          optional binary zip (UTF8);
        }
      }
      

      Afterwards, I create a table that contains just a portion of the schema, and load the Parquet file generated above, a query will fail on that table:

      CREATE TABLE test1 (name string, address struct<street:string>) STORED AS PARQUET;
      
      LOAD DATA LOCAL INPATH '/tmp/HiveGroup.parquet' OVERWRITE INTO TABLE test1;
      
      hive> SELECT name FROM test1;
      OK
      Roger
      Time taken: 0.071 seconds, Fetched: 1 row(s)
      
      hive> SELECT address FROM test1;
      OK
      Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.io.IntWritable
      Time taken: 0.085 seconds
      

      I would expect that Parquet can access the matched names, but Hive throws an error instead.

        Attachments

        1. HIVE-10086.5.patch
          26 kB
          Sergio Peña
        2. HiveGroup.parquet
          0.7 kB
          Sergio Peña

          Issue Links

            Activity

              People

              • Assignee:
                spena Sergio Peña
                Reporter:
                spena Sergio Peña
              • Votes:
                1 Vote for this issue
                Watchers:
                9 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: