Details
Type: Question
Status: Resolved
Priority: Minor
Resolution: Invalid
Version: 3.1.0
Description
SPARK-17562 contains some similar discussion, but I still have some questions.
I found that the "DiskBlockObjectWriter#revertPartialWritesAndClose" method is called in 5 places in the Spark code.
Two of the call points are in the "ExternalAppendOnlyMap#spillMemoryIteratorToDisk" method, two similar call points are in the "ExternalSorter#spillMemoryIteratorToDisk" method, and the last is in the "BypassMergeSortShuffleWriter#stop" method.
Let's take "ExternalAppendOnlyMap#spillMemoryIteratorToDisk" as an example:
var success = false
try {
  while (inMemoryIterator.hasNext) {
    ...
    if (objectsWritten == serializerBatchSize) {
      flush()
    }
  }
  if (objectsWritten > 0) {
    flush()
    writer.close()
  } else {
    writer.revertPartialWritesAndClose() // The first call point
  }
  success = true
} finally {
  if (!success) {
    writer.revertPartialWritesAndClose() // The second call point
    if (file.exists()) {
      if (!file.delete()) {
        logWarning(s"Error deleting ${file}")
      }
    }
  }
}
I have two questions about the code above:
1. Can the first call to "writer.revertPartialWritesAndClose()" be replaced by "writer.close()"?
I think there are two ways to get into this branch:
- One possibility is that flush() has already been called for all of the data. In that case I think we can call "writer.close()" directly, because everything has been flushed and the "committedPosition" of DiskBlockObjectWriter should equal file.length.
- The other is that inMemoryIterator is empty. In this scenario file.length is 0 whether we call "revertPartialWritesAndClose()" or "close()"; the test "commit() and close() without ever opening or writing" in DiskBlockObjectWriterSuite demonstrates this.
I also tried replacing "writer.revertPartialWritesAndClose()" with "writer.close()", and all UTs in the core module passed. So what is the specific scenario in which "revertPartialWritesAndClose()" must be called?
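To make the two cases in question 1 concrete, here is a minimal sketch that models only the idea of a committed position. ToyWriter, RevertVsClose and lengthAfter are hypothetical names invented for this illustration; this is not Spark's DiskBlockObjectWriter, which also buffers, serializes and tracks metrics. The assumption is that reverting means truncating the file back to the last committed position.

```scala
import java.io.{File, FileOutputStream}

// Hypothetical toy model, NOT Spark's DiskBlockObjectWriter.
// Assumption: reverting truncates the file to the last committed position.
final class ToyWriter(file: File) {
  private val out = new FileOutputStream(file) // unbuffered, starts at offset 0
  private var written: Long = 0L               // bytes written so far
  private var committedPosition: Long = 0L     // bytes covered by a commit

  def write(bytes: Array[Byte]): Unit = {
    out.write(bytes)
    written += bytes.length
  }

  // Models flush() in the snippet above: everything written is now committed.
  def commit(): Unit = {
    committedPosition = written
  }

  // Keeps whatever is in the file, committed or not.
  def close(): Unit = out.close()

  // Drops any bytes written after the last commit, then closes.
  def revertPartialWritesAndClose(): Unit = {
    try out.getChannel.truncate(committedPosition)
    finally out.close()
  }
}

object RevertVsClose {
  // Writes `committedBatches` batches of 4 bytes (each followed by a commit),
  // then `uncommittedBytes` more bytes, finishes the writer with `finish`,
  // and returns the resulting file length.
  def lengthAfter(finish: ToyWriter => Unit,
                  committedBatches: Int,
                  uncommittedBytes: Int): Long = {
    val file = File.createTempFile("toy-spill", ".tmp")
    try {
      val writer = new ToyWriter(file)
      for (_ <- 0 until committedBatches) {
        writer.write(new Array[Byte](4))
        writer.commit()
      }
      if (uncommittedBytes > 0) writer.write(new Array[Byte](uncommittedBytes))
      finish(writer)
      file.length()
    } finally {
      file.delete()
    }
  }
}
```

In this toy model, close() and revertPartialWritesAndClose() give the same file length when everything was committed (3 batches, 0 uncommitted bytes) and when nothing was written at all. They differ only when uncommitted bytes remain: with 1 committed batch and 3 uncommitted bytes, close() leaves 7 bytes and the revert leaves 4. Whether the real writer can reach the first call point with uncommitted bytes is exactly what the question asks.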
2. For the second call point, is the main goal to roll back writeMetrics in DiskBlockObjectWriter?
If we are going to delete the file anyway, is the truncate operation in "revertPartialWritesAndClose()" really necessary? In this scenario, could we just roll back writeMetrics without truncating the file, saving one disk operation?
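The idea in question 2 can be sketched the same way. Everything below is hypothetical: ToyMetrics, MetricsToyWriter, CleanupPaths and in particular revertMetricsAndClose() do not exist in Spark. They only illustrate the proposal under the assumption that the failure path always deletes the file afterwards.

```scala
import java.io.{File, FileOutputStream}

// Hypothetical stand-in for the write metrics; not Spark's metrics class.
final class ToyMetrics { var bytesWritten: Long = 0L }

// Hypothetical toy writer, NOT Spark's DiskBlockObjectWriter.
final class MetricsToyWriter(file: File, metrics: ToyMetrics) {
  private val out = new FileOutputStream(file)
  private var written = 0L
  private var committed = 0L

  def write(bytes: Array[Byte]): Unit = {
    out.write(bytes)
    written += bytes.length
    metrics.bytesWritten += bytes.length
  }

  def commit(): Unit = {
    committed = written
  }

  // Removes the uncommitted bytes from the metrics.
  private def rollBackMetrics(): Unit = {
    metrics.bytesWritten -= (written - committed)
    written = committed
  }

  // Models the existing behaviour: roll back metrics AND truncate the file.
  def revertPartialWritesAndClose(): Unit = {
    rollBackMetrics()
    try out.getChannel.truncate(committed)
    finally out.close()
  }

  // Hypothetical variant from question 2: roll back metrics, skip the truncate.
  def revertMetricsAndClose(): Unit = {
    rollBackMetrics()
    out.close()
  }
}

object CleanupPaths {
  // Simulates a failed spill: 8 committed bytes, 5 uncommitted bytes,
  // then the cleanup path followed by file deletion.
  // Returns (file still exists, bytesWritten metric).
  def failAndCleanUp(truncate: Boolean): (Boolean, Long) = {
    val file = File.createTempFile("toy-spill", ".tmp")
    val metrics = new ToyMetrics
    val writer = new MetricsToyWriter(file, metrics)
    writer.write(new Array[Byte](8))
    writer.commit()
    writer.write(new Array[Byte](5)) // never committed
    if (truncate) writer.revertPartialWritesAndClose()
    else writer.revertMetricsAndClose()
    file.delete()
    (file.exists(), metrics.bytesWritten)
  }
}
```

In this sketch both cleanup paths end in the same observable state: the file is gone and the metric reflects only the 8 committed bytes. That is the basis for asking whether the truncate is needed when the file is deleted right afterwards. The sketch does not cover the case where file.delete() fails, in which the two paths would leave files of different lengths behind.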