Spark / SPARK-23470

org.apache.spark.ui.jobs.ApiHelper.lastStageNameAndDescription is too slow

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.0
    • Component/s: Web UI
    • Labels: None

      Description

      While testing 2.3.0 RC3, I found that it is easy to hit a "read timeout" when accessing the All Jobs page. The thread dump shows the UI thread running "org.apache.spark.ui.jobs.ApiHelper.lastStageNameAndDescription".

      "SparkUI-59" #59 daemon prio=5 os_prio=0 tid=0x00007fc15b0a3000 nid=0x8dc runnable [0x00007fc0ce9f8000]
         java.lang.Thread.State: RUNNABLE
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
      	at org.apache.spark.util.kvstore.KVTypeInfo$MethodAccessor.get(KVTypeInfo.java:154)
      	at org.apache.spark.util.kvstore.InMemoryStore$InMemoryView.compare(InMemoryStore.java:248)
      	at org.apache.spark.util.kvstore.InMemoryStore$InMemoryView.lambda$iterator$2(InMemoryStore.java:214)
      	at org.apache.spark.util.kvstore.InMemoryStore$InMemoryView$$Lambda$36/1834982692.compare(Unknown Source)
      	at java.util.TimSort.binarySort(TimSort.java:296)
      	at java.util.TimSort.sort(TimSort.java:239)
      	at java.util.Arrays.sort(Arrays.java:1512)
      	at java.util.ArrayList.sort(ArrayList.java:1460)
      	at java.util.stream.SortedOps$RefSortingSink.end(SortedOps.java:387)
      	at java.util.stream.Sink$ChainedReference.end(Sink.java:258)
      	at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:210)
      	at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:161)
      	at java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:300)
      	at java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
      	at org.apache.spark.util.kvstore.InMemoryStore$InMemoryIterator.hasNext(InMemoryStore.java:278)
      	at org.apache.spark.status.AppStatusStore.lastStageAttempt(AppStatusStore.scala:101)
      	at org.apache.spark.ui.jobs.ApiHelper$$anonfun$38.apply(StagePage.scala:1014)
      	at org.apache.spark.ui.jobs.ApiHelper$$anonfun$38.apply(StagePage.scala:1014)
      	at org.apache.spark.status.AppStatusStore.asOption(AppStatusStore.scala:408)
      	at org.apache.spark.ui.jobs.ApiHelper$.lastStageNameAndDescription(StagePage.scala:1014)
      	at org.apache.spark.ui.jobs.JobDataSource.org$apache$spark$ui$jobs$JobDataSource$$jobRow(AllJobsPage.scala:434)
      	at org.apache.spark.ui.jobs.JobDataSource$$anonfun$24.apply(AllJobsPage.scala:412)
      	at org.apache.spark.ui.jobs.JobDataSource$$anonfun$24.apply(AllJobsPage.scala:412)
      	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
      	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
      	at scala.collection.immutable.List.foreach(List.scala:381)
      	at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
      	at scala.collection.mutable.ListBuffer.foreach(ListBuffer.scala:45)
      	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
      	at scala.collection.AbstractTraversable.map(Traversable.scala:104)
      	at org.apache.spark.ui.jobs.JobDataSource.<init>(AllJobsPage.scala:412)
      	at org.apache.spark.ui.jobs.JobPagedTable.<init>(AllJobsPage.scala:504)
      	at org.apache.spark.ui.jobs.AllJobsPage.jobsTable(AllJobsPage.scala:246)
      	at org.apache.spark.ui.jobs.AllJobsPage.render(AllJobsPage.scala:295)
      	at org.apache.spark.ui.WebUI$$anonfun$3.apply(WebUI.scala:98)
      	at org.apache.spark.ui.WebUI$$anonfun$3.apply(WebUI.scala:98)
      	at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90)
      	at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
      	at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
      	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
      	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584)
      	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
      	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
      	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
      	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
      	at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
      	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
      

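      According to the heap dump, there are 954 JobDataWrapper instances and 54690 StageDataWrapper instances. As the stack trace shows, building each job row calls lastStageAttempt, which sorts the stored stage entries before returning one of them. The UI is therefore slow: rendering the page sorts 54690 items once for each of the 954 jobs.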
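
The cost pattern described above can be illustrated with a small, self-contained Java sketch. It is not Spark code and not the fix that was applied: `StageEntry`, `lastAttemptBySorting`, and `indexLastAttempts` are hypothetical names introduced here for illustration only. The first method mimics the shape seen in the stack trace (sort everything, then take one element), so calling it once per job repeats an O(n log n) sort each time; with the numbers from the heap dump that is 954 sorts over 54690 entries, roughly 52 million entries pushed through the sort in total. The second method shows one possible alternative, assuming the latest attempt per stage is all the page needs: a single pass that builds a lookup table.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LastStageAttemptSketch {
    /** Hypothetical stand-in for a stored stage entry; not Spark's StageDataWrapper. */
    static final class StageEntry {
        final int stageId;
        final int attemptId;
        final String name;

        StageEntry(int stageId, int attemptId, String name) {
            this.stageId = stageId;
            this.attemptId = attemptId;
            this.name = name;
        }
    }

    /** Slow shape: sorts every entry on each call, then returns the first match. */
    static StageEntry lastAttemptBySorting(List<StageEntry> all, int stageId) {
        List<StageEntry> sorted = new ArrayList<>(all);
        // Descending by (stageId, attemptId), so the first entry with a
        // matching stageId is that stage's highest attempt.
        sorted.sort(Comparator.comparingInt((StageEntry s) -> s.stageId)
                .thenComparingInt(s -> s.attemptId)
                .reversed());
        for (StageEntry s : sorted) {
            if (s.stageId == stageId) {
                return s;
            }
        }
        return null;
    }

    /** Alternative: one O(n) pass that keeps the highest attempt per stage. */
    static Map<Integer, StageEntry> indexLastAttempts(List<StageEntry> all) {
        Map<Integer, StageEntry> latest = new HashMap<>();
        for (StageEntry s : all) {
            latest.merge(s.stageId, s, (a, b) -> a.attemptId >= b.attemptId ? a : b);
        }
        return latest;
    }

    public static void main(String[] args) {
        List<StageEntry> all = new ArrayList<>();
        all.add(new StageEntry(7, 0, "first attempt"));
        all.add(new StageEntry(7, 1, "retry"));
        all.add(new StageEntry(8, 0, "other stage"));

        StageEntry slow = lastAttemptBySorting(all, 7);
        StageEntry fast = indexLastAttempts(all).get(7);
        // Both lookups should agree on the same entry.
        System.out.println(slow.name + " / " + fast.name);
    }
}
```

In this sketch the index is built once per page render and each job row then does a constant-time map lookup, so the total work no longer grows with the number of jobs multiplied by the number of stages.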

              People

              • Assignee: Marcelo Vanzin (vanzin)
              • Reporter: Shixiong Zhu (zsxwing)