SPARK-47678

Fetch failed exception when a new executor reuses the IP address of a previously killed executor

Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 3.4.0, 3.4.1, 3.4.2, 3.5.0, 3.5.1
    • Fix Version/s: None
    • Component/s: Shuffle
    • Environment: This only happens on Kubernetes, where the same IP address can be reused for a new executor pod.

    Description

      This is an edge case in which Spark on Kubernetes gets a fetch failed exception when a new executor reuses the IP address of a previously killed executor.

      The new executor compares the IP address recorded for a shuffle block with its own host address. If the two addresses match, the new executor assumes the block is on its own local disk and tries to read it locally. The read fails because the block was actually written by the previously killed executor, which happened to have the same IP address.
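
      The comparison described above can be sketched as follows. This is a minimal illustration, not Spark's actual code: `BlockLocation`, `is_local_by_host`, and `is_local_by_executor` are hypothetical names, and the IP address and executor ids are made up. It shows why an address-only comparison misclassifies a block written by a killed executor, and how also comparing an executor identity would tell the two pods apart. The stricter check is one possible approach, not a fix confirmed by this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockLocation:
    """Hypothetical stand-in for the executor and host that own a shuffle block."""
    executor_id: str
    host: str


def is_local_by_host(block: BlockLocation, me: BlockLocation) -> bool:
    # The check described in the report: IP address equality alone.
    return block.host == me.host


def is_local_by_executor(block: BlockLocation, me: BlockLocation) -> bool:
    # A stricter check (assumption): the executor identity must match as well.
    return block.host == me.host and block.executor_id == me.executor_id


# Illustrative values: the killed executor and its replacement share one IP.
killed = BlockLocation(executor_id="1", host="10.0.0.5")
replacement = BlockLocation(executor_id="2", host="10.0.0.5")

# Address-only comparison treats the killed executor's block as local.
print(is_local_by_host(killed, replacement))      # True

# Comparing executor identity as well recognises the block as not local.
print(is_local_by_executor(killed, replacement))  # False
```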

            People

              Assignee: Unassigned
              Reporter: BoYang (bobyangbo)