Details
Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Not A Problem
Affects Version/s: 1.6.0
Fix Version/s: None
Environment: Hadoop 2.6, Hive 0.14, Parquet 1.6, Spark 1.6.1, Scala 2.11
Description
When I write Parquet files from a Spark job and then try to read them in Hive as an external table, I get a NullPointerException. After further analysis, I found that my transformation (which uses the Dataset and DataFrame APIs) produced some NULL values before saving to Parquet. The two fields containing the NULLs are of float data type. When I removed these two columns from the Parquet dataset, I was able to read it in Hive. By contrast, with all of the NULL columns present, I was able to read the data in Hive when the job wrote ORC instead.
In short: any column of a data type other than String that is completely empty (all NULL) and written to Parquet cannot be read by Hive and throws a NullPointerException.
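For reference, a minimal sketch of the kind of Spark 1.6 job that reproduces this for me. The path, object name, and column names here are illustrative placeholders, not the actual job:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.{FloatType, IntegerType, StructField, StructType}

object NullFloatParquetRepro {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("null-float-parquet-repro"))
    val sqlContext = new SQLContext(sc)

    // Schema with a nullable FloatType column that ends up entirely NULL.
    val schema = StructType(Seq(
      StructField("id", IntegerType, nullable = false),
      StructField("score", FloatType, nullable = true)))

    // Every value of the float column is null, mirroring the failing dataset.
    val rows = sc.parallelize(Seq(Row(1, null), Row(2, null)))
    val df = sqlContext.createDataFrame(rows, schema)

    // Hypothetical HDFS path used by the external Hive table below.
    df.write.parquet("/tmp/null_float_parquet")

    // Hive side (run separately in the Hive 0.14 CLI):
    //   CREATE EXTERNAL TABLE null_float_parquet (id INT, score FLOAT)
    //   STORED AS PARQUET
    //   LOCATION '/tmp/null_float_parquet';
    //   SELECT * FROM null_float_parquet;  -- fails with NullPointerException
  }
}
```

Writing the same DataFrame as ORC and pointing an ORC external table at it reads back fine, which is what makes the Parquet case stand out.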