SPARK-35816: Spark read/write with multiple Hadoop HA clusters limitation


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.4.3
    • Fix Version/s: None
    • Component/s: Spark Submit

Description

I have two Hadoop HA clusters, h1 and h2, and want to read from h1's HDFS and write to h2's HDFS using Spark. Because both clusters run HDFS in HA mode, the Spark Hadoop configuration has to carry each cluster's HA details (nameservice, NameNode RPC addresses, failover proxy provider):

    spark.sparkContext().hadoopConfiguration().set(<HA_PROPERTY_KEY>, <HA_PROPERTY_VALUE>)

A SparkSession has only one Hadoop Configuration, though, so setting the write cluster's HA properties overwrites the read cluster's. The read then resolves against the write cluster's configuration and fails with a file/path-not-found error.
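
For illustration, a minimal Java sketch of the setup described above; the nameservice IDs (h1, h2) and NameNode hosts are hypothetical, and the property keys are the standard HDFS HA client settings:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.spark.sql.SparkSession;

    public class TwoHaClustersRepro {
      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("two-ha-clusters")
            .getOrCreate();
        Configuration conf = spark.sparkContext().hadoopConfiguration();

        // HA client settings for the read cluster h1 (hosts are hypothetical)
        conf.set("dfs.nameservices", "h1");
        conf.set("dfs.ha.namenodes.h1", "nn1,nn2");
        conf.set("dfs.namenode.rpc-address.h1.nn1", "h1-nn1.example.com:8020");
        conf.set("dfs.namenode.rpc-address.h1.nn2", "h1-nn2.example.com:8020");
        conf.set("dfs.client.failover.proxy.provider.h1",
            "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

        // Setting the write cluster h2 the same way replaces dfs.nameservices,
        // so the nameservice h1 can no longer be resolved afterwards
        conf.set("dfs.nameservices", "h2");
        // ... h2 NameNode addresses and failover proxy provider ...

        spark.read().parquet("hdfs://h1/data/in")      // fails: h1 is unknown now
            .write().parquet("hdfs://h2/data/out");
      }
    }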

The same thing happens when writing from HDFS to an external Hive table (the table's files live on the other cluster's HDFS), but I am more keen on a solution to the problem above.
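
The Hive case follows the same pattern; a minimal sketch, assuming Hive support is enabled on the session and a hypothetical external table h2_db.events whose LOCATION points at h2's HDFS:

    // Read from the h1 cluster, append into an external table stored on h2;
    // fails the same way once h2's HA settings have replaced h1's
    spark.read().parquet("hdfs://h1/data/in")
        .write().mode("append").insertInto("h2_db.events");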

          People

            Unassigned Unassigned
            respondanupam Anupam Jain
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue
