Beam / BEAM-13981

Job event log empty on Spark History Server

Details

    • Type: Bug
    • Status: Resolved
    • Priority: P2
    • Resolution: Fixed
    • Affects Version/s: 2.33.0
    • Fix Version/s: 2.38.0
    • Component/s: runner-spark
    • Labels: None

    Description

      After upgrading from Beam 2.24.0 to 2.33.0, Spark jobs run on YARN show empty data on the Spark History Server after they complete.

      The problem seems to be a race condition: the driver log shows 2 EventLoggingListener instances logging events to the same file:

      22/02/22 10:51:11 INFO EventLoggingListener: Logging events to hdfs:/user/spark/jobhistory/application_1553109013416_12079975_1
      ...
      22/02/22 10:51:41 INFO EventLoggingListener: Logging events to hdfs:/user/spark/jobhistory/application_1553109013416_12079975_1
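One way to confirm the symptom in other driver logs is to look for the same event log target being announced more than once. The following is a minimal sketch, assuming the log line format shown above; the function name `duplicate_event_log_targets` is hypothetical and not part of Beam or Spark.

```python
import re
from collections import Counter

# Matches lines such as:
#   ... INFO EventLoggingListener: Logging events to hdfs:/user/spark/jobhistory/<app>
# The format is taken from the log excerpt in this report.
PATTERN = re.compile(r"EventLoggingListener: Logging events to (\S+)")


def duplicate_event_log_targets(log_lines):
    """Return {target_path: count} for every event log target announced more than once."""
    counts = Counter()
    for line in log_lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return {target: n for target, n in counts.items() if n > 1}


if __name__ == "__main__":
    sample = [
        "22/02/22 10:51:11 INFO EventLoggingListener: Logging events to "
        "hdfs:/user/spark/jobhistory/application_1553109013416_12079975_1",
        "22/02/22 10:51:41 INFO EventLoggingListener: Logging events to "
        "hdfs:/user/spark/jobhistory/application_1553109013416_12079975_1",
    ]
    print(duplicate_event_log_targets(sample))
```

A non-empty result suggests that two listeners were started for the same application attempt, which matches the log excerpt above.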

      At the end, the following failure is logged:

      22/02/22 11:17:57 INFO SchedulerExtensionServices: Stopping SchedulerExtensionServices
      (serviceOption=None,
       services=List(),
       started=false)
      22/02/22 11:17:57 ERROR Utils: Uncaught exception in thread Driver
      java.io.IOException: Target log file already exists (hdfs:/user/spark/jobhistory/application_1553109013416_12079975_1)
      	at org.apache.spark.scheduler.EventLoggingListener.stop(EventLoggingListener.scala:255)
      	at org.apache.spark.SparkContext.$anonfun$stop$13(SparkContext.scala:1960)
      	at org.apache.spark.SparkContext.$anonfun$stop$13$adapted(SparkContext.scala:1960)
      	at scala.Option.foreach(Option.scala:274)
      	at org.apache.spark.SparkContext.$anonfun$stop$12(SparkContext.scala:1960)
      	at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1340)
      	at org.apache.spark.SparkContext.stop(SparkContext.scala:1960)
      	at org.apache.spark.api.java.JavaSparkContext.stop(JavaSparkContext.scala:654)
      	at org.apache.beam.runners.spark.translation.SparkContextFactory.stopSparkContext(SparkContextFactory.java:73)
      	at org.apache.beam.runners.spark.SparkPipelineResult$BatchMode.stop(SparkPipelineResult.java:133)
      	at org.apache.beam.runners.spark.SparkPipelineResult.offerNewState(SparkPipelineResult.java:234)
      	at org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish(SparkPipelineResult.java:99)
      	at org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish(SparkPipelineResult.java:92)
      	at com.sizmek.dp.dsp.pipeline.driver.PipelineDriver.$anonfun$main$1(PipelineDriver.scala:41)
      	at scala.util.Try$.apply(Try.scala:213)
      	at com.sizmek.dp.dsp.pipeline.driver.PipelineDriver.main(PipelineDriver.scala:34)
      	at com.sizmek.dp.dsp.pipeline.driver.PipelineDriver.main$(PipelineDriver.scala:17)
      	at com.zetaglobal.dp.dsp.jobs.dealstats.DealStatsDriver$.main(DealStatsDriver.scala:18)
      	at com.zetaglobal.dp.dsp.jobs.dealstats.DealStatsDriver.main(DealStatsDriver.scala)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:497)
      	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:684)

      This results in an empty event log file in HDFS and empty job details in the History Server.
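The failure mode can be illustrated with a simplified model. This is a sketch under stated assumptions, not Spark's or Beam's actual code: it assumes each listener writes to an in-progress file and, on stop, renames it to the final path unless that path already exists. The class `SketchEventLogger`, the `.inprogress` suffix handling, and the chosen interleaving are illustrative only; the real ordering of events in the affected jobs is not known from this report.

```python
import os
import tempfile


class SketchEventLogger:
    """Simplified stand-in for an event logging listener (hypothetical, not Spark's class)."""

    IN_PROGRESS = ".inprogress"

    def __init__(self, log_path):
        self.log_path = log_path

    def start(self):
        # Create (or truncate) the shared in-progress file.
        open(self.log_path + self.IN_PROGRESS, "w").close()

    def log_event(self, event):
        with open(self.log_path + self.IN_PROGRESS, "a") as f:
            f.write(event + "\n")

    def stop(self):
        # Refuse to finalize if the target already exists.
        if os.path.exists(self.log_path):
            raise IOError("Target log file already exists (%s)" % self.log_path)
        os.rename(self.log_path + self.IN_PROGRESS, self.log_path)


def simulate_duplicate_listeners(directory):
    """Run one possible interleaving of two listeners sharing a log path."""
    log_path = os.path.join(directory, "application_example_1")
    first = SketchEventLogger(log_path)
    second = SketchEventLogger(log_path)

    first.start()
    first.log_event("job started")
    second.start()  # truncates the file the first listener was writing
    second.stop()   # finalizes an empty file under the target name

    error = None
    try:
        first.stop()  # target exists now, so finalizing fails
    except IOError as exc:
        error = str(exc)

    with open(log_path) as f:
        content = f.read()
    return error, content


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(simulate_duplicate_listeners(tmp))
```

In this interleaving the second listener leaves an empty file at the target path and the first listener fails on stop, which is consistent with both the exception and the empty History Server entry described above.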

       

      The problem seems to have been introduced by this change:

      https://github.com/apache/beam/pull/14409

      Why does Beam run an event listener concurrently with the one Spark already runs internally? When I roll back the change for SparkRunner, the problem disappears for me.

      I am running the native SparkRunner with Spark 2.4.4.

       

            People

              iemejia Ismaël Mejía
              jvilcek Jozef Vilcek
              Votes: 0
              Watchers: 3

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 50m