We repeatedly hit an issue with the Spark History Server (SHS) after it has been running for a while and serving REST API calls: SHS suddenly shows an empty home page listing 0 applications. The cause is unexpected data returned from the REST call "api/v1/applications?limit=8000". Instead of the JSON list of application summaries, this call returns the home page HTML. Some other REST calls that request detailed application information also return the home page HTML, while other REST calls still work. We have to restart SHS to recover.
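The symptom can be detected from the client side by checking whether the response body looks like a JSON list or like an HTML page. The following is only a minimal sketch: the class and method names are hypothetical, it inspects a response body that has already been fetched, and it uses a simple prefix heuristic rather than a real JSON parser.

```java
import java.util.Locale;

// Hypothetical helper, not part of Spark: classifies a response body that was
// already fetched from "api/v1/applications?limit=8000".
class ResponseCheckSketch {

    // A healthy response is expected to be a JSON array of application summaries.
    static boolean looksLikeJsonArray(String body) {
        String trimmed = body.trim();
        return trimmed.startsWith("[") && trimmed.endsWith("]");
    }

    // In the failure described above, the body is the home page HTML instead.
    static boolean looksLikeHtml(String body) {
        String trimmed = body.trim().toLowerCase(Locale.ROOT);
        return trimmed.startsWith("<!doctype html") || trimmed.startsWith("<html");
    }

    public static void main(String[] args) {
        String healthy = "[{\"id\":\"app-1\"}]";
        String broken = "<html><body>History Server</body></html>";
        System.out.println(looksLikeJsonArray(healthy)); // true
        System.out.println(looksLikeHtml(broken));       // true
    }
}
```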
We attached a remote debugger to the problematic process and inspected the tree of Jetty handlers attached to the web server. The Jetty handler added by "attachHandler(ApiRootResource.getServletHandler(this))" was missing from the tree, as were some other handlers. Without the root resource servlet handler, SHS cannot correctly serve either UI or REST requests. Because it cannot find a handler for the request, SHS returns the HistoryServerPage HTML directly to the user.
The Spark History Server has to call attachSparkUI in order to serve user requests for an application. An application's SparkUI is attached when the application's detail data is loaded into the Guava Cache. While attaching the SparkUI, SHS adds all of its Jetty handlers to the current web service. When the data is cleared out of the Guava Cache, SHS detaches all of that application's SparkUI Jetty handlers. Because removal in the Guava Cache is asynchronous, clearing entries out of the cache is not synchronized with loading entries into it. The removal that triggers detachSparkUI can therefore be detaching handlers at the same time that attachSparkUI is attaching Jetty handlers.
After adding synchronization between attachSparkUI and detachSparkUI in the history server, we have not seen this issue again.
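The idea behind the fix can be illustrated with a simplified model. This is a sketch under stated assumptions, not the actual Spark code: HandlerRegistrySketch, its methods, and the handler paths are hypothetical stand-ins for the web server's handler collection, and handlers are modeled as plain strings. The point it shows is that attach and detach take the same lock, so one cannot interleave with the other while the shared handler list is being modified.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a shared handler collection; not Spark's or Jetty's API.
class HandlerRegistrySketch {
    private final Object lock = new Object();
    private final List<String> handlers = new ArrayList<>();

    // Models attachSparkUI: adds every handler of one application's UI.
    void attach(String appId, List<String> paths) {
        synchronized (lock) {
            for (String path : paths) {
                handlers.add(appId + path);
            }
        }
    }

    // Models detachSparkUI: removes every handler of one application's UI.
    // It takes the same lock as attach, so the two never run concurrently.
    void detach(String appId, List<String> paths) {
        synchronized (lock) {
            for (String path : paths) {
                handlers.remove(appId + path);
            }
        }
    }

    List<String> snapshot() {
        synchronized (lock) {
            return new ArrayList<>(handlers);
        }
    }

    // Attaches a root handler, then lets several threads attach and detach
    // application UIs concurrently, as cache loads and removals would.
    static List<String> runDemo() {
        HandlerRegistrySketch registry = new HandlerRegistrySketch();
        registry.attach("root", List.of("/api"));
        List<String> uiPaths = List.of("/jobs", "/stages", "/storage");
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            String appId = "app-" + i;
            Thread worker = new Thread(() -> {
                for (int round = 0; round < 1000; round++) {
                    registry.attach(appId, uiPaths);
                    registry.detach(appId, uiPaths);
                }
            });
            threads.add(worker);
            worker.start();
        }
        try {
            for (Thread worker : threads) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return registry.snapshot();
    }

    public static void main(String[] args) {
        // Only the root handler remains once all application UIs are detached.
        System.out.println(runDemo()); // [root/api]
    }
}
```

Without the shared lock, the unsynchronized list in this model could be corrupted by concurrent modification, which is analogous to the root handler going missing from the handler tree.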