Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 2.3.2
- Fix Version/s: None
- Component/s: None
Description
From EventLoggingListener.scala:

    private def logEvent(event: SparkListenerEvent, flushLogger: Boolean = false) {
      val eventJson = JsonProtocol.sparkEventToJson(event)
      // scalastyle:off println
      writer.foreach(_.println(compact(render(eventJson))))
      // scalastyle:on println
      if (flushLogger) {
        writer.foreach(_.flush())
        hadoopDataStream.foreach(ds => ds.getWrappedStream match {
          case wrapped: DFSOutputStream => wrapped.hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH))
          case _ => ds.hflush()
        })
      }
    }
Some events arrive with flushLogger=true and therefore trigger a flush/hsync/hflush of the underlying stream. I tried running apps with the flush/hsync/hflush disabled for all events and saw a significant improvement in app completion time, with no event drops. I am posting more details in the comments section.
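The idea above can be sketched as flush coalescing: rather than honoring every per-event flush request, a writer can rate-limit the expensive sync so that buffered writes are synced at most once per interval. The sketch below is illustrative only, in Python rather than Spark's Scala, and the class and parameter names (BatchedEventWriter, flush_interval_s) are invented for this example, not Spark APIs.

```python
import io
import time

class BatchedEventWriter:
    """Illustrative sketch, not Spark's implementation: coalesce flush
    requests so the expensive sync runs at most once per interval."""

    def __init__(self, stream, flush_interval_s=1.0):
        self.stream = stream
        self.flush_interval_s = flush_interval_s
        self._last_flush = time.monotonic()
        self.flush_count = 0  # number of real flushes actually issued

    def log_event(self, event_json, flush_requested=False):
        self.stream.write(event_json + "\n")
        # Honor the flush request only if enough time has passed since
        # the last real flush; otherwise the write stays buffered.
        now = time.monotonic()
        if flush_requested and now - self._last_flush >= self.flush_interval_s:
            self.stream.flush()
            self.flush_count += 1
            self._last_flush = now

w = BatchedEventWriter(io.StringIO(), flush_interval_s=60.0)
for _ in range(1000):
    w.log_event('{"Event":"SparkListenerTaskEnd"}', flush_requested=True)
print(w.flush_count)  # 0 real flushes despite 1000 requests
```

With a 60-second interval the loop completes before any real flush is due, so all 1000 events stay buffered in memory; the trade-off, as with disabling hsync in the listener, is that a crash can lose events that were written but not yet synced.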
Attachments
Issue Links
- is duplicated by SPARK-24787: Events being dropped at an alarming rate due to hsync being slow for eventLogging (Resolved)