Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
While profiling a Spark job that spills a large amount of intermediate data, we found that a significant portion of the time is spent in the DiskBlockObjectWriter.updateBytesWritten function. Looking at the code (https://github.com/sitalkedia/spark/blob/master/core/src/main/scala/org/apache/spark/storage/DiskBlockObjectWriter.scala#L206), we see that this function is called too frequently to update the number of bytes written to disk. We should reduce the update frequency to avoid this overhead.
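A minimal sketch of the kind of batching this suggests, assuming the metric update reads the file channel position. The class name, the stub metrics type, and the UpdateInterval constant below are illustrative, not the actual Spark internals or the committed fix:

{code:scala}
import java.nio.channels.FileChannel

// Minimal stand-in so the sketch compiles; the real metrics class lives in
// org.apache.spark.executor.
class ShuffleWriteMetricsStub {
  var bytesWritten = 0L
  var recordsWritten = 0L
  def incBytesWritten(n: Long): Unit = bytesWritten += n
  def incRecordsWritten(n: Long): Unit = recordsWritten += n
}

// Hypothetical simplification of DiskBlockObjectWriter showing the batching idea.
class BatchedMetricsWriter(channel: FileChannel, writeMetrics: ShuffleWriteMetricsStub) {
  private val UpdateInterval = 16384 // illustrative batching constant
  private var numRecordsWritten = 0L
  private var reportedPosition = 0L

  // Called once per record appended to the spill file.
  def recordWritten(): Unit = {
    numRecordsWritten += 1
    writeMetrics.incRecordsWritten(1)
    // Refresh the bytes-written metric only every UpdateInterval records,
    // instead of querying the channel position on every call.
    if (numRecordsWritten % UpdateInterval == 0) {
      updateBytesWritten()
    }
  }

  // Queries the underlying channel position: the hot spot seen in profiling.
  private def updateBytesWritten(): Unit = {
    val pos = channel.position()
    writeMetrics.incBytesWritten(pos - reportedPosition)
    reportedPosition = pos
  }
}
{code}

Batching trades metric freshness for throughput; a final updateBytesWritten call on commit/close would still be needed so the reported totals stay exact.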