Parquet / PARQUET-26

Parquet doesn't recognize a nested array type inside a MAP as an ArrayWritable


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed

    Description

      When inserting Hive data of type MAP<string, array<int>> into a Parquet table, the write fails with the following error:

      Caused by: parquet.io.ParquetEncodingException: This should be an ArrayWritable or MapWritable: org.apache.hadoop.hive.ql.io.parquet.writable.BinaryWritable@c644ef1c
      at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeData(DataWritableWriter.java:86)
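The exception message suggests that, at the failing point, the writer was handling a group type (a list or a map) but received a BinaryWritable instead of a container value. The Java sketch below is a simplified, self-contained model of that kind of schema-driven check; it is not Hive's DataWritableWriter. All names in it (WriterDispatchSketch, Value, Schema, write, tryWrite, MAP_OF_INT_ARRAYS) are hypothetical stand-ins, and the "bad" input assumes, for illustration only, that the nested array<int> reached the writer as a flat binary value.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

public class WriterDispatchSketch {
    // Hypothetical stand-ins for Hive's Writable classes; NOT the real Hive types.
    interface Value {}
    record IntValue(int v) implements Value {}
    record BinaryValue(byte[] bytes) implements Value {}
    record ArrayValue(List<Value> elements) implements Value {}
    record MapValue(Map<String, Value> entries) implements Value {}

    // Hypothetical schema nodes: a primitive leaf, or a group (list or map).
    interface Schema {}
    record Primitive() implements Schema {}
    record ListOf(Schema element) implements Schema {}
    record MapOf(Schema valueType) implements Schema {}

    // MAP<string, array<int>> expressed in this hypothetical model.
    static final Schema MAP_OF_INT_ARRAYS = new MapOf(new ListOf(new Primitive()));

    // Walks the schema and the value together, recursing into nested groups.
    static void write(Schema schema, Value value) {
        if (schema instanceof Primitive) {
            if (value instanceof ArrayValue || value instanceof MapValue) {
                throw new IllegalArgumentException("Expected a primitive: " + value);
            }
            return; // a real writer would emit the primitive value here
        }
        // Every group type requires a container value; this mirrors the reported check.
        if (schema instanceof ListOf list && value instanceof ArrayValue array) {
            for (Value element : array.elements()) {
                write(list.element(), element);
            }
        } else if (schema instanceof MapOf map && value instanceof MapValue m) {
            for (Value v : m.entries().values()) {
                write(map.valueType(), v);
            }
        } else {
            throw new IllegalArgumentException(
                "This should be an ArrayWritable or MapWritable: " + value);
        }
    }

    // Returns true if the write succeeds, false if the type check rejects the value.
    static boolean tryWrite(Schema schema, Value value) {
        try {
            write(schema, value);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Correct shape: the nested array arrives as a container value.
        Value good = new MapValue(Map.of("k1",
            new ArrayValue(List.of(new IntValue(1), new IntValue(2)))));
        // Hypothetical bad shape: the nested array was flattened into a binary value.
        Value bad = new MapValue(Map.of("k1",
            new BinaryValue("1,2".getBytes(StandardCharsets.UTF_8))));
        System.out.println(tryWrite(MAP_OF_INT_ARRAYS, good)); // true
        System.out.println(tryWrite(MAP_OF_INT_ARRAYS, bad));  // false
    }
}
```

Because the writer descends through the schema and the value in lockstep, a mismatch between the declared nesting (a MAP whose values are arrays) and the runtime object only surfaces once the writer reaches the inner group, which is consistent with the failure being raised from writeData rather than at the top-level map.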

      The problem can be reproduced with the following steps (the relevant test data is attached):

      1.
      CREATE TABLE test_hive (
      node string,
      stime string,
      stimeutc string,
      swver string,
      moid MAP <string,string>,
      pdfs MAP <string,array<int>>,
      utcdate string,
      motype string)
      ROW FORMAT DELIMITED
      FIELDS TERMINATED BY '|'
      COLLECTION ITEMS TERMINATED BY ','
      MAP KEYS TERMINATED BY '=';
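The three delimiters in step 1 apply at successive nesting levels: '|' separates columns, ',' separates collection items (here, map entries), and '=' separates a map key from its value. The Java sketch below illustrates that splitting for a MAP<string,string> column such as moid; the class, its methods, and the sample row are hypothetical and not taken from the attached test.dat. The DDL declares no delimiter for the array elements nested inside the pdfs map values, so that level presumably falls back to Hive's default separators; this sketch does not model it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class DelimitedRowSketch {
    // Delimiters declared by the ROW FORMAT DELIMITED clause in step 1.
    static final String FIELD_DELIM = "|"; // FIELDS TERMINATED BY '|'
    static final String ITEM_DELIM = ",";  // COLLECTION ITEMS TERMINATED BY ','
    static final String KEY_DELIM = "=";   // MAP KEYS TERMINATED BY '='

    // Splits one text row into its top-level columns, keeping empty trailing columns.
    static String[] columns(String row) {
        return row.split(Pattern.quote(FIELD_DELIM), -1);
    }

    // Parses a MAP<string,string> column such as moid.
    // An entry without '=' is mapped to a null value in this sketch.
    static Map<String, String> parseStringMap(String column) {
        Map<String, String> result = new LinkedHashMap<>();
        if (column.isEmpty()) {
            return result;
        }
        for (String entry : column.split(Pattern.quote(ITEM_DELIM), -1)) {
            String[] kv = entry.split(Pattern.quote(KEY_DELIM), 2);
            result.put(kv[0], kv.length > 1 ? kv[1] : null);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up row with the eight columns of test_hive; NOT taken from test.dat.
        String row = "n1|t1|t1utc|v1|a=x,b=y|p=1|d1|m1";
        String[] cols = columns(row);
        System.out.println(cols.length);             // 8
        System.out.println(parseStringMap(cols[4])); // {a=x, b=y}
    }
}
```

Passing a negative limit to split keeps empty trailing strings, so a row whose last columns are empty still yields all eight columns.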

      2.
      LOAD DATA LOCAL INPATH '/root/38388/test.dat' INTO TABLE test_hive;

      3.
      CREATE TABLE test_parquet (
      pdfs MAP <string,array<int>>
      )
      STORED AS PARQUET;

      4.
      INSERT INTO TABLE test_parquet SELECT pdfs FROM test_hive;

    Attachments

      1. test.dat (0.2 kB), attached by Mala Chikka Kempanna

    People

      Ryan Blue (rdblue)
      Mala Chikka Kempanna (mkempanna)
      Votes: 0
      Watchers: 3
