[SPARK-25645] Add provision to disable EventLoggingListener default flush/hsync/hflush for all events


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.3.2
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None

    Description

      EventLoggingListener.scala
      private def logEvent(event: SparkListenerEvent, flushLogger: Boolean = false) {
        val eventJson = JsonProtocol.sparkEventToJson(event)
        // scalastyle:off println
        writer.foreach(_.println(compact(render(eventJson))))
        // scalastyle:on println
        if (flushLogger) {
          writer.foreach(_.flush())
          hadoopDataStream.foreach(ds => ds.getWrappedStream match {
            case wrapped: DFSOutputStream => wrapped.hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH))
            case _ => ds.hflush()
          })
        }
      }

      Some events are logged with flushLogger=true, which forces a flush/hsync/hflush on the underlying stream. I tried running applications with the flush/hsync/hflush disabled for all events and observed a significant improvement in application completion time, with no event drops. I am posting more details in the comments section.
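The provision being proposed could be sketched as a boolean switch that gates the flush path in logEvent. The sketch below is a simplified stand-in, not the actual Spark change: the flushEnabled flag (and any config name backing it, e.g. something like "spark.eventLog.flush.enabled") is a hypothetical assumption, and a counter replaces the real flush/hsync/hflush calls so the behavior is observable.

```scala
import java.io.{PrintWriter, StringWriter, Writer}

// Simplified stand-in for EventLoggingListener. `flushEnabled` is a
// hypothetical switch modeling the proposed provision; it is NOT an
// actual Spark configuration.
class SketchEventLogger(out: Writer, flushEnabled: Boolean) {
  private val writer = new PrintWriter(out)
  var flushCount = 0 // stands in for the flush/hsync/hflush calls

  // Mirrors logEvent: always write the serialized event, but only hit
  // the expensive flush path when the caller asks for it AND the
  // global switch allows it.
  def logEvent(eventJson: String, flushLogger: Boolean = false): Unit = {
    writer.println(eventJson)
    if (flushLogger && flushEnabled) {
      writer.flush()
      flushCount += 1
    }
  }
}

val disabled = new SketchEventLogger(new StringWriter(), flushEnabled = false)
disabled.logEvent("""{"Event":"SparkListenerJobEnd"}""", flushLogger = true)
// With the provision disabled, the per-event flush is skipped entirely.

val enabled = new SketchEventLogger(new StringWriter(), flushEnabled = true)
enabled.logEvent("""{"Event":"SparkListenerJobEnd"}""", flushLogger = true)
// With it enabled, behavior matches today's flush-on-request path.
```

Skipping the flush trades durability (events buffered at failure time may be lost) for throughput, which matches the timing improvement described above.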


      People

        Assignee: Unassigned
        Reporter: Devaraj Kavali
        Votes: 0
        Watchers: 2
