  1. Spark
  2. SPARK-37473

BypassMergeSortShuffleWriter may lose data when the disk is missing but the directory is present


Details

    • Type: Bug
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.4.0, 2.4.1, 2.4.8, 3.0.0, 3.2.0
    • Fix Version/s: None
    • Component/s: Shuffle
    • Labels: None

    Description

      When `BypassMergeSortShuffleWriter` merges the segment files it has produced, it assumes that a partition has no data if its segment file does not exist.

      However, `file.exists()` may return `false` when the disk that the segment file is on is missing while the root directory still exists. The missing disk only causes `file.exists()` to return `false`; no exception is thrown. The task therefore runs to completion without error, with the current segment file never written.

      The segment's data is silently ignored, which leads to shuffle data loss.
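
      The failure mode described above can be illustrated with a minimal sketch. This is not Spark's actual `BypassMergeSortShuffleWriter` code: the class `SegmentMergeSketch`, the methods `mergeLenient` and `mergeStrict`, and the `bytesWritten` bookkeeping are all hypothetical names introduced for illustration, and deleting a file stands in for a missing disk. The strict variant shows one possible way to detect the problem, assuming the writer tracks how many bytes it wrote per partition; it is not necessarily the fix adopted upstream.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch; not Spark's actual BypassMergeSortShuffleWriter code.
final class SegmentMergeSketch {

    // Lenient merge: a segment for which exists() returns false is treated
    // as an empty partition, so its bytes never reach the merged output.
    static long[] mergeLenient(File[] segments, File output) throws IOException {
        long[] lengths = new long[segments.length];
        Files.write(output.toPath(), new byte[0]); // create or truncate the output
        for (int i = 0; i < segments.length; i++) {
            if (segments[i].exists()) {
                byte[] data = Files.readAllBytes(segments[i].toPath());
                Files.write(output.toPath(), data, StandardOpenOption.APPEND);
                lengths[i] = data.length;
            }
            // Otherwise lengths[i] stays 0 and no error is reported.
        }
        return lengths;
    }

    // Strict merge: bytesWritten[i] is the number of bytes the writer believes
    // it wrote for partition i (assumed bookkeeping). A missing segment that
    // should contain data is reported instead of being skipped.
    static long[] mergeStrict(File[] segments, long[] bytesWritten, File output)
            throws IOException {
        for (int i = 0; i < segments.length; i++) {
            if (bytesWritten[i] > 0 && !segments[i].exists()) {
                throw new IOException("Segment " + i + " is missing although "
                        + bytesWritten[i] + " bytes were written: " + segments[i]);
            }
        }
        return mergeLenient(segments, output);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("segments");
        File[] segments = { dir.resolve("seg0").toFile(), dir.resolve("seg1").toFile() };
        long[] bytesWritten = new long[segments.length];
        for (int i = 0; i < segments.length; i++) {
            byte[] data = ("partition-" + i).getBytes(StandardCharsets.UTF_8);
            Files.write(segments[i].toPath(), data);
            bytesWritten[i] = data.length;
        }
        // Stand-in for the missing disk: the file vanishes and exists() returns false.
        Files.delete(segments[1].toPath());

        File output = dir.resolve("merged").toFile();
        long[] lengths = mergeLenient(segments, output);
        System.out.println("lenient length of partition 1: " + lengths[1]);
        try {
            mergeStrict(segments, bytesWritten, output);
        } catch (IOException e) {
            System.out.println("strict merge failed: " + e.getMessage());
        }
    }
}
```

      In the lenient variant, the partition whose segment file has disappeared is recorded with length 0 and the merge succeeds, which mirrors the silent data loss reported here. The strict variant fails the merge instead, so the task could be retried.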

          People

            Assignee: Unassigned
            Reporter: yuhaiyang (haiyangyu)