Date values are altered when data is loaded from one Hive table to another through Spark. This happens when Hive runs on a remote machine whose timezone differs from that of the machine running Spark, and only when the source table's SerDe is 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'.
Steps to reproduce the issue:
1. Create two tables as below in a Hive instance whose timezone is, say, EST.
2. Copy hive-site.xml to the spark-2.2.1-bin-hadoop2.7/conf folder, so that the sqlContext you create for Hive connects to your remote Hive server.
3. Start spark-shell on another machine whose timezone differs from Hive's, say, PDT.
4. Execute the code below:
5. Now navigate to hive and check the contents of the TARGET table (t_tgt). The dob field will have incorrect values.
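To illustrate the kind of shift I suspect, here is a minimal standalone sketch in plain Python (not Spark or Hive code). It assumes the date is stored as the instant of midnight in the writer's timezone and then rendered in the reader's timezone; that mechanism is my assumption, not something confirmed from the Spark or Hive source. The function name shifted_date, the sample date 1990-01-15, and the zone names are hypothetical and chosen only for illustration.

```python
from datetime import date, datetime
from zoneinfo import ZoneInfo  # standard library since Python 3.9


def shifted_date(d: date, writer_tz: str, reader_tz: str) -> date:
    """Return the calendar date a reader would see if midnight of `d`
    in the writer's zone were re-rendered in the reader's zone."""
    # Midnight of the date in the writer's zone, as an absolute instant.
    instant = datetime(d.year, d.month, d.day, tzinfo=ZoneInfo(writer_tz))
    # The same instant converted to the reader's zone, truncated to a date.
    return instant.astimezone(ZoneInfo(reader_tz)).date()


# Hypothetical sample value; the zones stand in for EST and PDT machines.
original = date(1990, 1, 15)
observed = shifted_date(original, "America/New_York", "America/Los_Angeles")
print(original, "->", observed)
```

Under this assumption, a value written on the Eastern machine comes back one day earlier on the Pacific machine, which would match an off-by-one-day symptom in dob. With the zones swapped, the date is unchanged, so the direction of the offset matters.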
Is this a known issue? Is there a workaround for it? Can it be fixed?
Thanks & regards,