  1. Apache Hudi
  2. HUDI-5289

WriteStatus RDD is recalculated in cluster

Details

    Description

      Steps to reproduce:

      spark-submit \
      --class org.apache.hudi.utilities.HoodieClusteringJob \
      --conf spark.driver.memory=40G \
      --conf spark.executor.instances=20 \
      --conf spark.executor.memory=40G \
      --conf spark.executor.cores=4 \
      hudi-utilities-bundle_2.11-0.12.0.jar \
      --props clusteringjob.properties \
      --mode scheduleAndExecute \
      --base-path xxx \
      --table-name xxx \
      --spark-memory 40g 

      The job ran the following two stages. Both compute the WriteStatus RDD, but some tasks in stage 96 were recomputed, which took more than ten minutes.

      Stage 65 (see attached screenshot)

      Stage 96 (see attached screenshot)
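
      A possible explanation, not confirmed in this report: if the WriteStatus RDD is not persisted (or its cached partitions are lost) between two actions, Spark re-evaluates the lineage for the affected partitions. The sketch below models that behaviour in plain Python with no Spark dependency. `LazyDataset` and `write_files` are hypothetical stand-ins for an RDD and the clustering write, not Hudi or Spark APIs.

```python
class LazyDataset:
    """Hypothetical stand-in for a Spark RDD: evaluated lazily on every action."""

    def __init__(self, compute):
        self._compute = compute   # function that produces the records
        self._persist = False     # whether results should be kept after an action
        self._cached = None       # populated by the first action after persist()

    def persist(self):
        self._persist = True
        return self

    def collect(self):
        # Reuse the cached result if one exists; otherwise recompute from scratch.
        if self._cached is not None:
            return self._cached
        result = self._compute()
        if self._persist:
            self._cached = result
        return result


calls = {"n": 0}


def write_files():
    # Stands in for the expensive write that yields the write statuses.
    calls["n"] += 1
    return ["status-a", "status-b"]


# Without persist: two actions trigger two full computations.
statuses = LazyDataset(write_files)
statuses.collect()
statuses.collect()
uncached_runs = calls["n"]

# With persist: the second action reuses the first result.
calls["n"] = 0
statuses = LazyDataset(write_files).persist()
statuses.collect()
statuses.collect()
cached_runs = calls["n"]

print(uncached_runs, cached_runs)
```

      In real Spark, a persisted RDD can still be partially recomputed when cached blocks are evicted or an executor is lost, which would match only some tasks in stage 96 being slow. Whether that is what happened here is an assumption.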

      Attachments

        1. image-2022-11-29-10-24-08-853.png
          37 kB
          Xinyu Zou
        2. image-2022-11-29-10-25-29-546.png
          141 kB
          Xinyu Zou
        3. image-2022-11-29-10-26-22-050.png
          125 kB
          Xinyu Zou

          People

            zouxxyy Xinyu Zou