Spark / SPARK-35816

Spark read/write limitation with multiple Hadoop HA clusters


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.4.3
    • Fix Version/s: None
    • Component/s: Spark Submit

      Description

      I have two Hadoop HA clusters, h1 and h2, and I want to read from h1's HDFS and write to h2's HDFS using Spark. Since both HDFS deployments run in HA mode, the Spark Hadoop configuration has to be set with each cluster's HDFS details:

      spark.sparkContext().hadoopConfiguration().set(<HADOOP_RPC_ADDRESS_AND_DETAILS>)

      Within a single SparkSession, however, one cluster's Hadoop configuration gets overwritten by the write-side details, so the read is attempted against that same configuration and fails with a file/path-not-found error.
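
      One way to avoid the overwrite is to declare both HA nameservices in the same Hadoop Configuration, so paths on either cluster resolve side by side instead of replacing each other. Below is a minimal sketch, not a confirmed fix; the nameservice names (h1, h2), NameNode ids (nn1, nn2), hostnames, and ports are placeholders that must match each cluster's hdfs-site.xml:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.spark.sql.SparkSession;

      public class TwoClusterHaConfig {
        public static void main(String[] args) {
          SparkSession spark = SparkSession.builder()
              .appName("read-h1-write-h2")
              .getOrCreate();

          Configuration conf = spark.sparkContext().hadoopConfiguration();

          // Declare BOTH nameservices instead of letting one overwrite the other.
          conf.set("dfs.nameservices", "h1,h2");

          // HA NameNodes for cluster h1 (hosts/ports are placeholders).
          conf.set("dfs.ha.namenodes.h1", "nn1,nn2");
          conf.set("dfs.namenode.rpc-address.h1.nn1", "h1-nn1.example.com:8020");
          conf.set("dfs.namenode.rpc-address.h1.nn2", "h1-nn2.example.com:8020");
          conf.set("dfs.client.failover.proxy.provider.h1",
              "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

          // HA NameNodes for cluster h2 (hosts/ports are placeholders).
          conf.set("dfs.ha.namenodes.h2", "nn1,nn2");
          conf.set("dfs.namenode.rpc-address.h2.nn1", "h2-nn1.example.com:8020");
          conf.set("dfs.namenode.rpc-address.h2.nn2", "h2-nn2.example.com:8020");
          conf.set("dfs.client.failover.proxy.provider.h2",
              "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
        }
      }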

      The same issue occurs when writing from HDFS into an external Hive table (I am writing into the HDFS location backing an external Hive table), but I am more interested in a solution to the problem above.
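
      Continuing the sketch above, with both nameservices registered each side can be addressed by a fully qualified hdfs:// URI, so neither configuration has to be replaced mid-job. The paths below are placeholders:

      // Read from cluster h1 and write to cluster h2 via the registered nameservices.
      spark.read().parquet("hdfs://h1/data/input")
           .write().parquet("hdfs://h2/data/output");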

            People

            • Assignee: Unassigned
            • Reporter: Anupam Jain
            • Votes: 0
            • Watchers: 2
