Parquet / PARQUET-647

Null Pointer Exception in Hive upon reading Parquet


    Details

      Description

      When I write Parquet files from a Spark job and try to read them in Hive as an external table, I get a NullPointerException. On further analysis, I found I had some NULL values in my transformation (using the Dataset and DataFrame APIs) before saving to Parquet. The two fields that contain the NULLs are of float data type. When I removed these two columns from the Parquet dataset, I was able to read it in Hive. In contrast, with the same all-NULL columns present, I was able to read the data in Hive when the job wrote ORC instead.
      In short: a column of any data type other than String that is completely empty (all NULL) when written to Parquet cannot be read by Hive and throws a NullPointerException.
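      The scenario above can be sketched as a minimal reproduction (all table, column, and path names here are hypothetical; the Spark write is shown via Spark SQL CTAS, and the Hive side as an external table over the same directory):

```sql
-- Spark SQL side: write a Parquet dataset whose float column is entirely NULL.
-- (Hypothetical names; the path and schema are illustrative only.)
CREATE TABLE spark_out
USING parquet
LOCATION '/tmp/parquet_npe_repro'
AS SELECT id, CAST(NULL AS FLOAT) AS score
FROM range(10);

-- Hive side: external table over the same directory.
CREATE EXTERNAL TABLE hive_in (
  id BIGINT,
  score FLOAT
)
STORED AS PARQUET
LOCATION '/tmp/parquet_npe_repro';

-- Per the report, selecting the all-NULL float column throws a
-- NullPointerException; dropping `score` from the data, or writing
-- ORC instead of Parquet, reads fine.
SELECT id, score FROM hive_in;
```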

        Attachments

        1. Screen Shot 2016-06-24 at 11.03.56 AM.png
          136 kB
          Mahadevan Sudarsanan
        2. Screen Shot 2016-06-24 at 11.02.50 AM.png
          71 kB
          Mahadevan Sudarsanan
        3. Screen Shot 2016-06-24 at 11.01.55 AM.png
          22 kB
          Mahadevan Sudarsanan
        4. Screen Shot 2016-06-24 at 11.01.46 AM.png
          903 kB
          Mahadevan Sudarsanan


            People

            • Assignee: Unassigned
            • Reporter: msudarsanan (Mahadevan Sudarsanan)
            • Votes: 0
            • Watchers: 2

              Dates

              • Created:
              • Updated:
              • Resolved: