Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.4.3
Fix Version/s: None
Description
I have two Hadoop HA clusters, h1 and h2, and I want to read from h1's HDFS and write to h2's HDFS using Spark. Because both HDFS instances run in HA mode, the Spark Hadoop configuration has to be set with the HDFS RPC address details:
spark.sparkContext().hadoopConfiguration().set(<HADOOP_RPC_ADDRESS_AND_DETAILS>)
With a single SparkSession, one set of Hadoop configuration values gets overwritten by the write-side details, so the read also resolves against that configuration and fails with a file/path-not-found error.
A similar problem occurs when writing from HDFS to an external Hive table (I am writing to an external Hive table whose data lives on the other cluster's HDFS), but I am more interested in a solution to the problem above.
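As a possible workaround (not verified against this exact setup), both HA nameservices can be registered side by side in the same Hadoop configuration under distinct logical names, so the read and write settings no longer overwrite each other. The sketch below is in Java and uses only the standard HDFS HA client keys (dfs.nameservices, dfs.ha.namenodes.*, dfs.namenode.rpc-address.*, dfs.client.failover.proxy.provider.*); the nameservice IDs h1/h2, the namenode hostnames, and the input/output paths are placeholders, not values from this report.

import org.apache.hadoop.conf.Configuration;
import org.apache.spark.sql.SparkSession;

public class CrossClusterCopy {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("read-h1-write-h2")
        .getOrCreate();

    Configuration conf = spark.sparkContext().hadoopConfiguration();

    // Register both HA nameservices in one configuration so neither
    // overwrites the other (the overwrite is what causes path-not-found).
    conf.set("dfs.nameservices", "h1,h2");

    // Source cluster h1 (hostnames are placeholders)
    conf.set("dfs.ha.namenodes.h1", "nn1,nn2");
    conf.set("dfs.namenode.rpc-address.h1.nn1", "h1-nn1.example.com:8020");
    conf.set("dfs.namenode.rpc-address.h1.nn2", "h1-nn2.example.com:8020");
    conf.set("dfs.client.failover.proxy.provider.h1",
        "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

    // Target cluster h2 (hostnames are placeholders)
    conf.set("dfs.ha.namenodes.h2", "nn1,nn2");
    conf.set("dfs.namenode.rpc-address.h2.nn1", "h2-nn1.example.com:8020");
    conf.set("dfs.namenode.rpc-address.h2.nn2", "h2-nn2.example.com:8020");
    conf.set("dfs.client.failover.proxy.provider.h2",
        "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

    // Address each cluster through its own nameservice in fully qualified paths.
    spark.read().parquet("hdfs://h1/path/to/input")
         .write().parquet("hdfs://h2/path/to/output");

    spark.stop();
  }
}

The key point is that the two clusters are configured under different nameservice IDs, so their RPC address keys never collide and a single SparkSession can resolve both.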