HIVE-14306 (project: Hive)

Hive fails to read Parquet files generated by SparkSQL


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.2.1
    • Fix Version/s: None
    • Component/s: CLI
    • Labels: None

    Description

      I'm trying to implement the following process:

      1. Create a Hive Parquet table A using the Hive CLI.
      2. Create an external table B with the same schema as A, pointing to an existing HDFS folder that contains one CSV file.
      3. Execute `insert into A select * from B` using SparkSQL.
      4. Query table A.
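      The four steps above can be sketched in HiveQL as follows. This is a minimal sketch, not the reporter's actual DDL: the table names A and B follow the report, but the columns `id` and `name` and the HDFS path `/tmp/table_b_csv` are hypothetical placeholders, and the CSV file is assumed to be comma-delimited with no header row. The report does not list the real table's column types, so this toy schema may or may not trigger the same error.

      ```sql
      -- Step 1 (Hive CLI): Parquet-backed table A.
      -- Hypothetical columns; substitute the real schema.
      CREATE TABLE a (
        id   INT,
        name STRING
      )
      STORED AS PARQUET;

      -- Step 2 (Hive CLI): external table B with the same schema,
      -- reading an existing HDFS folder that holds one CSV file.
      -- The LOCATION path is a hypothetical placeholder.
      CREATE EXTERNAL TABLE b (
        id   INT,
        name STRING
      )
      ROW FORMAT DELIMITED
      FIELDS TERMINATED BY ','
      STORED AS TEXTFILE
      LOCATION '/tmp/table_b_csv';

      -- Step 3 (SparkSQL 1.6.2): copy the CSV rows into the Parquet table.
      INSERT INTO TABLE a SELECT * FROM b;

      -- Step 4 (Hive CLI): query the Parquet table.
      SELECT * FROM a;
      ```

      Statements 1, 2, and 4 run in the Hive CLI; statement 3 runs in SparkSQL 1.6.2 (for example, in the `spark-sql` shell). Running statement 3 in the Hive CLI instead is the case the report describes as working.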

      Something weird happens in steps 3 and 4.

      If the `insert into` statement is executed by SparkSQL (1.6.2), the Hive CLI throws an exception when querying table A:
      ```
      Failed with exception java.io.IOException:parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file hdfs://NEOInciteDataNode-1:8020/user/hive/warehouse/call_center/part-r-00000-b9b6962d-cbab-452b-835b-c10c6221b8fa.gz.parquet
      ```

      However, SparkSQL can query table A without any trouble.

      If the `insert` statement is executed by the Hive CLI instead, querying table A in the Hive CLI works fine.

      So am I doing something wrong, or is this a bug?


          People

            Assignee: Unassigned
            Reporter: Teng Yutong (tenggyut)
            Votes: 0
            Watchers: 1
