Parquet / PARQUET-26

Parquet doesn't recognize the nested Array type in MAP as ArrayWritable.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      When trying to insert Hive data of type MAP<string, array<int>> into Parquet, it throws the following error:

      Caused by: parquet.io.ParquetEncodingException: This should be an ArrayWritable or MapWritable: org.apache.hadoop.hive.ql.io.parquet.writable.BinaryWritable@c644ef1c
      at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeData(DataWritableWriter.java:86)
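
      The failure pattern suggests that the writer checks only the top-level container type when descending into a group (MAP/LIST) schema node, so a nested array that arrives wrapped as a BinaryWritable is rejected. The following is a minimal, self-contained sketch of that dispatch; the Writable stand-ins here are hypothetical simplifications, not the actual Hadoop/Hive classes:

      ```java
      // Hypothetical stand-ins for Hadoop's Writable hierarchy (illustration only).
      interface Writable {}

      class BinaryWritable implements Writable {
          final String value;
          BinaryWritable(String value) { this.value = value; }
          @Override public String toString() { return "BinaryWritable(" + value + ")"; }
      }

      class ArrayWritable implements Writable {
          final Writable[] values;
          ArrayWritable(Writable[] values) { this.values = values; }
      }

      public class DataWritableWriterSketch {
          // Mirrors the kind of check that fails in DataWritableWriter.writeData:
          // a group (MAP/LIST) node is expected to arrive as ArrayWritable;
          // any other Writable is rejected with the error seen above.
          static void writeGroup(Writable value) {
              if (value instanceof ArrayWritable) {
                  // recurse into the group's elements... (omitted)
              } else {
                  throw new IllegalStateException(
                      "This should be an ArrayWritable or MapWritable: " + value);
              }
          }

          public static void main(String[] args) {
              // A MAP<string,array<int>> value whose inner array is handed over
              // as a single BinaryWritable triggers the same failure:
              try {
                  writeGroup(new BinaryWritable("1,2,3"));
              } catch (IllegalStateException e) {
                  System.out.println(e.getMessage());
              }
          }
      }
      ```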

      The problem is reproducible with the following steps; the relevant test data is attached.

      1.
      CREATE TABLE test_hive (
      node string,
      stime string,
      stimeutc string,
      swver string,
      moid MAP <string,string>,
      pdfs MAP <string,array<int>>,
      utcdate string,
      motype string)
      ROW FORMAT DELIMITED
      FIELDS TERMINATED BY '|'
      COLLECTION ITEMS TERMINATED BY ','
      MAP KEYS TERMINATED BY '=';

      2.
      LOAD DATA LOCAL INPATH '/root/38388/test.dat' INTO TABLE test_hive;

      3.

      CREATE TABLE test_parquet(
      pdfs MAP <string,array<int>>
      )
      STORED AS PARQUET;

      4.

      INSERT INTO TABLE test_parquet SELECT pdfs FROM test_hive;

        Attachments

        1. test.dat
          0.2 kB
          Mala Chikka Kempanna

              People

               • Assignee: rdblue (Ryan Blue)
               • Reporter: mkempanna (Mala Chikka Kempanna)
               • Votes: 0
               • Watchers: 2
