SPARK-33424

Questions about the use of the "DiskBlockObjectWriter#revertPartialWritesAndClose" method in Spark code


Details

    • Type: Question
    • Status: Resolved
    • Priority: Minor
    • Resolution: Invalid
    • Affects Version/s: 3.1.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None

    Description

      Although there is a similar discussion in SPARK-17562, I still have some questions.

      The "DiskBlockObjectWriter#revertPartialWritesAndClose" method is called in 5 places in the Spark code.

      Two of the call points are in the "ExternalAppendOnlyMap#spillMemoryIteratorToDisk" method, two similar call points are in the "ExternalSorter#spillMemoryIteratorToDisk" method, and the last one is in the "BypassMergeSortShuffleWriter#stop" method.

      Let's take the use of "ExternalAppendOnlyMap#spillMemoryIteratorToDisk" as an example:

       

      var success = false
      try {
        while (inMemoryIterator.hasNext) {
          ...
      
          if (objectsWritten == serializerBatchSize) {
            flush()
          }
        }
        if (objectsWritten > 0) {
          flush()
          writer.close()
        } else {
          writer.revertPartialWritesAndClose() // The first call point 
        }
        success = true
      } finally {
        if (!success) {
          writer.revertPartialWritesAndClose() // The second call point
          if (file.exists()) {
            if (!file.delete()) {
              logWarning(s"Error deleting ${file}")
            }
          }
        }
      }
      
      

       

      There are two questions about the above code:

      1. Can the first call to "writer.revertPartialWritesAndClose()" be replaced by "writer.close()"?

      I think there are two ways to reach this branch:

      • All data has already been flushed. In this case I think we can call "writer.close()" directly, because the "committedPosition" of DiskBlockObjectWriter should equal file.length.
      • inMemoryIterator is empty. In this case file.length is 0 whether we call "revertPartialWritesAndClose()" or "close()"; the test "commit() and close() without ever opening or writing" in DiskBlockObjectWriterSuite demonstrates this.

      I also tried replacing "writer.revertPartialWritesAndClose()" with "writer.close()", and all UTs in the core module passed. So in which specific scenario is calling "revertPartialWritesAndClose()" required?
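To make the reasoning in question 1 concrete, here is a minimal sketch using a toy writer. ToyWriter and ToyWriterDemo are hypothetical names and are not Spark classes; the sketch only assumes that the writer tracks a committed position and that reverting truncates the file back to it. Under that assumption the two ways of closing differ only when uncommitted bytes remain.

```scala
import java.io.{File, FileOutputStream}
import java.nio.file.Files

// Toy stand-in for DiskBlockObjectWriter (hypothetical, NOT the Spark class).
// It models only a committed position plus truncate-on-revert.
final class ToyWriter(file: File) {
  private val out = new FileOutputStream(file, true) // append mode, unbuffered
  private var committedPosition: Long = file.length()

  def write(bytes: Array[Byte]): Unit = out.write(bytes)

  // Marks everything written so far as committed.
  def commit(): Unit = {
    out.flush()
    committedPosition = file.length()
  }

  // Closes without touching the file contents.
  def close(): Unit = out.close()

  // Closes, then truncates the file back to the last committed position.
  def revertPartialWritesAndClose(): Unit = {
    out.close()
    val truncateStream = new FileOutputStream(file, true)
    try {
      truncateStream.getChannel.truncate(committedPosition)
      ()
    } finally truncateStream.close()
  }
}

object ToyWriterDemo {
  // Writes 4 committed bytes and 2 more bytes, then closes one way or the other.
  // Returns the final file length.
  def finalLength(commitAll: Boolean, useRevert: Boolean): Long = {
    val file = Files.createTempFile("toy-spill", ".bin").toFile
    try {
      val w = new ToyWriter(file)
      w.write(Array[Byte](1, 2, 3, 4))
      w.commit()                  // first 4 bytes are committed
      w.write(Array[Byte](5, 6))  // 2 more bytes
      if (commitAll) w.commit()   // either committed or left as a partial write
      if (useRevert) w.revertPartialWritesAndClose() else w.close()
      file.length()
    } finally file.delete()
  }

  def main(args: Array[String]): Unit = {
    println(finalLength(commitAll = true, useRevert = false))  // 6
    println(finalLength(commitAll = true, useRevert = true))   // 6
    println(finalLength(commitAll = false, useRevert = false)) // 6
    println(finalLength(commitAll = false, useRevert = true))  // 4
  }
}
```

In this toy model, when every write has been committed the file length is the same after either call, which matches the first bullet above. Whether the real writer has other differences between the two calls, for example in metrics handling, is exactly what the question asks.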

      2. For the second call point, is the main goal to roll back the writeMetrics in DiskBlockObjectWriter?
      If the file is going to be deleted anyway, is the truncate operation in "revertPartialWritesAndClose()" really necessary? In this scenario, could we just roll back writeMetrics without truncating the file, saving one disk operation?
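The idea behind question 2 can be sketched as follows. ToyMetrics and CleanupSketch are hypothetical names, and the metric rollback shown here is an assumption about what the revert is meant to achieve, not a description of Spark's implementation. The sketch compares a failure path that truncates before deleting with one that only rolls back the metric and deletes.

```scala
import java.io.FileOutputStream
import java.nio.file.Files

// Hypothetical metrics holder; Spark's real write metrics are richer than this.
final class ToyMetrics { var bytesWritten: Long = 0L }

object CleanupSketch {
  // Writes 4 "committed" bytes and 2 "partial" bytes, then cleans up the way a
  // failure path might. Returns (file still exists, bytesWritten after rollback).
  def failAndCleanUp(truncateFirst: Boolean): (Boolean, Long) = {
    val file = Files.createTempFile("toy-spill", ".bin").toFile
    val metrics = new ToyMetrics
    val out = new FileOutputStream(file, true)
    out.write(new Array[Byte](4)); metrics.bytesWritten += 4 // committed part
    val committedPosition = 4L
    out.write(new Array[Byte](2)); metrics.bytesWritten += 2 // partial part
    out.close()

    // Roll the metric back to the committed amount in both variants.
    metrics.bytesWritten -= (file.length() - committedPosition)

    if (truncateFirst) {
      // Variant A: truncate back to the committed position before deleting.
      val channel = new FileOutputStream(file, true).getChannel
      try channel.truncate(committedPosition) finally channel.close()
    }
    // Both variants delete the file afterwards.
    file.delete()
    (file.exists(), metrics.bytesWritten)
  }

  def main(args: Array[String]): Unit = {
    println(failAndCleanUp(truncateFirst = true))
    println(failAndCleanUp(truncateFirst = false))
  }
}
```

In this toy model both variants end in the same state: the file is gone and the metric reflects only the committed bytes. That does not show the truncate is unnecessary in Spark itself, since the real method may have effects this sketch does not model.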
       


          People

            Assignee: Unassigned
            Reporter: Yang Jie (LuciferYang)