Details
-
Bug
-
Status: Resolved
-
Critical
-
Resolution: Fixed
-
2.8.0, 2.7.3
-
None
-
Reviewed
Description
While running ATS stress tests, 15 concurrent ATS reads (python thread which gives ws/v1/time/TEZ_DAG_ID, ws/v1/time/TEZ_VERTEX_DI?primaryFilter=TEZ_DAG_ID:<dag_id> etc) calls.
Note: Summary store for ATSv1.5 is RLD, but as we for each dag/application ATS also creates leveldb cache when vertex/task/taskattempts information is queried from ATS.
Getting following type of excpetion very frequently in ATS logs :-
2016-07-23 00:01:56,089 [1517798697@qtp-1198158701-850] INFO org.apache.hadoop.service.AbstractService: Service LeveldbCache.timelineEntityGroupId_1469090881194_4832_application_1469090881194_4832 failed in state INITED; cause: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: lock /grid/4/yarn_ats/atsv15_rld/timelineEntityGroupId_1469090881194_4832_application_1469090881194_4832-timeline-cache.ldb/LOCK: already held by process
org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: lock /grid/4/yarn_ats/atsv15_rld/timelineEntityGroupId_1469090881194_4832_application_1469090881194_4832-timeline-cache.ldb/LOCK: already held by process
at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
at org.apache.hadoop.yarn.server.timeline.LevelDBCacheTimelineStore.serviceInit(LevelDBCacheTimelineStore.java:108)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.timeline.EntityCacheItem.refreshCache(EntityCacheItem.java:113)
at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getCachedStore(EntityGroupFSTimelineStore.java:1021)
at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getTimelineStoresFromCacheIds(EntityGroupFSTimelineStore.java:936)
at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getTimelineStoresForRead(EntityGroupFSTimelineStore.java:989)
at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getEntities(EntityGroupFSTimelineStore.java:1041)
at org.apache.hadoop.yarn.server.timeline.TimelineDataManager.doGetEntities(TimelineDataManager.java:168)
at org.apache.hadoop.yarn.server.timeline.TimelineDataManager.getEntities(TimelineDataManager.java:138)
at org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices.getEntities(TimelineWebServices.java:117)
at sun.reflect.GeneratedMethodAccessor82.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:886)
at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57)