SPARK-19814: Spark History Server Out Of Memory / Extreme GC


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.6.1, 2.0.0, 2.1.0
    • Fix Version/s: None
    • Component/s: Web UI
    • Labels:
      None
    • Environment:

      Spark History Server (we've run it on several different Hadoop distributions)

      Description

      Spark History Server runs out of memory, gets into GC thrash and eventually becomes unresponsive. This seems to happen more quickly with heavy use of the REST API. We've seen this with several versions of Spark.

      Running with the following settings (spark 2.1):
      spark.history.fs.cleaner.enabled true
      spark.history.fs.cleaner.interval 1d
      spark.history.fs.cleaner.maxAge 7d
      spark.history.retainedApplications 500

      We will eventually get errors like:
      17/02/25 05:02:19 WARN ServletHandler:
      javax.servlet.ServletException: scala.MatchError: java.lang.OutOfMemoryError: GC overhead limit exceeded (of class java.lang.OutOfMemoryError)
      at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:489)
      at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427)
      at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
      at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341)
      at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228)
      at org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:812)
      at org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:587)
      at org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
      at org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
      at org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
      at org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
      at org.spark_project.jetty.servlets.gzip.GzipHandler.handle(GzipHandler.java:529)
      at org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
      at org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
      at org.spark_project.jetty.server.Server.handle(Server.java:499)
      at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:311)
      at org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
      at org.spark_project.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
      at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
      at org.spark_project.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
      at java.lang.Thread.run(Thread.java:745)

      Caused by: scala.MatchError: java.lang.OutOfMemoryError: GC overhead limit exceeded (of class java.lang.OutOfMemoryError)
      at org.apache.spark.deploy.history.ApplicationCache.getSparkUI(ApplicationCache.scala:148)
      at org.apache.spark.deploy.history.HistoryServer.getSparkUI(HistoryServer.scala:110)
      at org.apache.spark.status.api.v1.UIRoot$class.withSparkUI(ApiRootResource.scala:244)
      at org.apache.spark.deploy.history.HistoryServer.withSparkUI(HistoryServer.scala:49)
      at org.apache.spark.status.api.v1.ApiRootResource.getJobs(ApiRootResource.scala:66)
      at sun.reflect.GeneratedMethodAccessor102.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:498)
      at org.glassfish.jersey.server.internal.routing.SubResourceLocatorRouter$1.run(SubResourceLocatorRouter.java:158)
      at org.glassfish.jersey.server.internal.routing.SubResourceLocatorRouter.getResource(SubResourceLocatorRouter.java:178)
      at org.glassfish.jersey.server.internal.routing.SubResourceLocatorRouter.apply(SubResourceLocatorRouter.java:109)
      at org.glassfish.jersey.server.internal.routing.RoutingStage._apply(RoutingStage.java:109)
      at org.glassfish.jersey.server.internal.routing.RoutingStage._apply(RoutingStage.java:112)
      at org.glassfish.jersey.server.internal.routing.RoutingStage._apply(RoutingStage.java:112)
      at org.glassfish.jersey.server.internal.routing.RoutingStage._apply(RoutingStage.java:112)
      at org.glassfish.jersey.server.internal.routing.RoutingStage._apply(RoutingStage.java:112)
      at org.glassfish.jersey.server.internal.routing.RoutingStage.apply(RoutingStage.java:92)
      at org.glassfish.jersey.server.internal.routing.RoutingStage.apply(RoutingStage.java:61)
      at org.glassfish.jersey.process.internal.Stages.process(Stages.java:197)
      at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:318)
      at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
      at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
      at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
      at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
      at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
      at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
      at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305)
      at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154)
      at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:473)

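The `Caused by` line is worth unpacking: `scala.MatchError` means a pattern match received a value it had no case for. Here an `OutOfMemoryError` thrown while loading a UI reached a match in `ApplicationCache.getSparkUI` that apparently only expected certain exception types, so the error was re-wrapped instead of handled. A minimal sketch of that mechanism (names and cases are hypothetical, not Spark's actual code):

```scala
// Sketch: a pattern match over Throwable with no catch-all. Anything
// outside the listed cases (e.g. an OutOfMemoryError) is re-thrown as
// scala.MatchError, hiding the original error -- as in the trace above.
object MatchErrorDemo {
  def classify(t: Throwable): String = t match {
    case _: java.util.NoSuchElementException => "application not found"
    case _: IllegalStateException            => "cache in bad state"
    // no `case _ =>` here: an OutOfMemoryError falls through and the
    // compiler-generated match throws scala.MatchError(t)
  }

  def main(args: Array[String]): Unit = {
    try classify(new OutOfMemoryError("GC overhead limit exceeded"))
    catch {
      case e: MatchError => println(s"wrapped as MatchError: ${e.getMessage}")
    }
  }
}
```

One consequence is that the log shows `scala.MatchError: java.lang.OutOfMemoryError ...` rather than the `OutOfMemoryError` itself, which makes the underlying memory problem easy to miss when scanning servlet warnings.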
      In our case, memory usage increases gradually over roughly two days and levels off near the maximum heap size (4 GB for us). Then, usually within another 12-24 hours, GC activity climbs and errors like the stack trace above become increasingly frequent.
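Not part of this report, but a hedged mitigation sketch while the underlying growth is unfixed: give the History Server daemon a larger heap and cache fewer loaded application UIs. Both settings are standard Spark configuration; the values below are illustrative only, not recommendations from this issue.

```
# spark-env.sh: raise the heap for Spark daemons, including the
# History Server (example value; default is 1g)
export SPARK_DAEMON_MEMORY=8g

# spark-defaults.conf: retain fewer application UIs in the in-memory
# cache (the reporter ran with 500)
spark.history.retainedApplications 50
```

Lowering `spark.history.retainedApplications` trades slower repeat access to evicted applications for a smaller steady-state footprint; it does not help if individual application UIs are themselves very large.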

        Attachments

        1. SparkHistoryCPUandRAM.png
          89 kB
          Simon King

              People

              • Assignee: Unassigned
              • Reporter: simonpk Simon King
              • Votes: 0
              • Watchers: 1
