PARQUET-83: Hive query fails when the data type is array<string> with Parquet files


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.0
    • Fix Version/s: None
    • Component/s: parquet-mr
    • Labels:

      Description

      • Created a Parquet file from an Avro file that has one array field; the rest of the fields are primitive types. The Avro schema of the array field is, e.g. (a writer sketch follows the exception trace below):
        { "name" : "action", "type" : [ { "type" : "array", "items" : "string" }, "null" ] }
        
      • Created an external Hive table with the array type as below:
        create external table paraArray (action Array<string>)
          partitioned by (partitionid int)
          row format serde 'parquet.hive.serde.ParquetHiveSerDe'
          stored as
            inputformat 'parquet.hive.MapredParquetInputFormat'
            outputformat 'parquet.hive.MapredParquetOutputFormat'
          location '/testPara';
        alter table paraArray add partition(partitionid=1) location '/testPara';
        
      • Ran the following query (select action from paraArray limit 10); the MapReduce jobs fail with the following exception.
        Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ClassCastException: parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to org.apache.hadoop.io.ArrayWritable
        at parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125)
        at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315)
        at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
        at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
        at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665)
        at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126)
        at org.apache.hadoop.mapred.Child.main(Child.java:264)
        ]
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671)
        at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
        ... 8 more
        
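      For context, the sketch below (not part of the original report) shows one way such a file can be written with the parquet-avro API from parquet-mr 1.6.0. The record name Event, the sample values, and the output file name are assumptions made only for illustration:

        import java.util.Arrays;

        import org.apache.avro.Schema;
        import org.apache.avro.generic.GenericData;
        import org.apache.avro.generic.GenericRecord;
        import org.apache.hadoop.fs.Path;

        import parquet.avro.AvroParquetWriter;

        public class WriteActionArray {
          // Record schema wrapping the "action" field from the report;
          // the record name "Event" is invented for this sketch.
          private static final String SCHEMA_JSON =
              "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
            + "{\"name\":\"action\",\"type\":[{\"type\":\"array\",\"items\":\"string\"},\"null\"]}"
            + "]}";

          public static void main(String[] args) throws Exception {
            Schema schema = new Schema.Parser().parse(SCHEMA_JSON);

            // One record with a two-element string array in the "action" field.
            GenericRecord record = new GenericData.Record(schema);
            record.put("action", Arrays.asList("click", "view"));

            // Write the record under the table/partition location from the DDL
            // above; the file name itself is arbitrary.
            AvroParquetWriter<GenericRecord> writer = new AvroParquetWriter<GenericRecord>(
                new Path("/testPara/part-00000.parquet"), schema);
            try {
              writer.write(record);
            } finally {
              writer.close();
            }
          }
        }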

      This issue was posted on the Parquet issues list a long time ago. Since it is related to the Parquet Hive SerDe, I have created the Hive issue here. The details and history are available at https://github.com/Parquet/parquet-mr/issues/281.
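
      The ClassCastException in the trace above comes from ParquetHiveArrayInspector.getList downcasting the column value to ArrayWritable while the reader hands it a binary writable. Below is a small stand-alone illustration of that failure mode, using org.apache.hadoop.io.BytesWritable as a stand-in for the internal parquet.hive.writable.BinaryWritable class; it is not the actual parquet-hive code:

        import org.apache.hadoop.io.ArrayWritable;
        import org.apache.hadoop.io.BytesWritable;
        import org.apache.hadoop.io.Writable;

        public class CastFailureDemo {
          public static void main(String[] args) {
            // The reader hands the inspector a binary writable for the column
            // value; BytesWritable stands in here for
            // parquet.hive.writable.BinaryWritable$DicBinaryWritable.
            Writable value = new BytesWritable("click".getBytes());

            // getList() downcasts the value to ArrayWritable
            // (ParquetHiveArrayInspector.java:125), which fails the same way:
            ArrayWritable elements = (ArrayWritable) value; // ClassCastException
            System.out.println(elements.get().length);
          }
        }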

        Attachments

        1. HIVE-7850.1.patch (6 kB, Sathish)
        2. HIVE-7850.2.patch (11 kB, Sathish)
        3. HIVE-7850.patch (6 kB, Sathish)

              People

              • Assignee:
                rdblue Ryan Blue
                Reporter:
                vallurisathish Sathish
              • Votes:
                1
                Watchers:
                10
