  1. Spark
  2. SPARK-48309

Stop AM retry for errors where a retry cannot succeed


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 4.0.0
    • Fix Version/s: 4.0.0
    • Component/s: YARN

    Description

      In YARN cluster mode, spark.yarn.maxAppAttempts is configured. In our production environment it is set to 2, so if the first attempt fails, the AM retries. However, in some scenarios the second attempt is also expected to fail.

      For example:

      org.apache.spark.sql.AnalysisException: Table or view not found: test.testxxxx_xxxxx; line 1 pos 14;
      Project
      +- UnresolvedRelation [bigdata_qa, testxxxxx_xxxxx], [], false

       

      Another example:
      Caused by: org.apache.hadoop.hdfs.protocol.NSQuotaExceededException: The NameSpace quota (directories and files) of directory /tmp/xxx_file/xxxx is exceeded: quota=1000000 file count=1000001

      Would it be more appropriate to catch these exceptions and stop retrying?
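
      One way the proposal could look, as a minimal sketch rather than existing Spark code: walk the cause chain of the failure and skip the next attempt when any exception in it belongs to a list of known non-retryable classes. The class NonRetryableFailures, its methods, and the hard-coded list are hypothetical names introduced for illustration only. The two class names are taken from the stack traces above, and the sketch assumes the AM has access to the Throwable that failed the attempt.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

/** Hypothetical helper, not part of Spark: decides whether another AM attempt is worthwhile. */
final class NonRetryableFailures {

    // Class names from the two stack traces in this issue. Assumption: a real
    // implementation would probably make this list configurable.
    static final Set<String> DEFAULT_CLASS_NAMES = Set.of(
        "org.apache.spark.sql.AnalysisException",
        "org.apache.hadoop.hdfs.protocol.NSQuotaExceededException");

    private NonRetryableFailures() {}

    /**
     * Returns true if the error, or any exception in its cause chain, is an
     * instance of one of the named classes (subclasses included).
     * Matching is done by name so this sketch compiles without Spark or Hadoop.
     */
    static boolean isNonRetryable(Throwable error, Set<String> classNames) {
        // Identity-based set guards against cyclic cause chains.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable t = error; t != null && seen.add(t); t = t.getCause()) {
            // Walk up the superclass chain so subclasses of a listed class also match.
            for (Class<?> c = t.getClass(); c != null; c = c.getSuperclass()) {
                if (classNames.contains(c.getName())) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Retry only if attempts remain and the failure is not known to be permanent. */
    static boolean shouldRetry(Throwable error, int attempt, int maxAttempts) {
        return attempt < maxAttempts && !isNonRetryable(error, DEFAULT_CLASS_NAMES);
    }
}
```

      With maxAttempts set to 2 as in our environment, shouldRetry(error, 1, 2) would return false for either of the two failures above and true for an unrelated error such as a plain IOException. How the AM would then report the application as finally failed is outside the scope of this sketch.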


            People

              Assignee: Unassigned
              Reporter: guihuawen
              Votes: 0
              Watchers: 1
