  1. Hadoop YARN
  2. YARN-5432

Lock already held by another process while LevelDB cache store creation for dag

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.8.0, 2.7.3
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: timelineserver
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      The problem shows up while running ATS stress tests with 15 concurrent ATS readers (Python threads issuing calls such as ws/v1/time/TEZ_DAG_ID, ws/v1/time/TEZ_VERTEX_ID?primaryFilter=TEZ_DAG_ID:<dag_id>, etc.).

      Note: The summary store for ATSv1.5 is RLD, but ATS also creates a LevelDB cache for each DAG/application when vertex/task/task attempt information is queried from it.
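
      For anyone trying to reproduce the load, the read pattern described above could be sketched roughly as follows. This is not the original test script: the function names, the base URL, and the round count are hypothetical, the paths are copied as written in this report, and TEZ_VERTEX_ID is assumed to be the entity type meant by the second call.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def build_urls(base, dag_id):
    # Paths are taken as written in the report; `base` (scheme, host, port)
    # is a placeholder for the timeline server address.
    return [
        f"{base}/ws/v1/time/TEZ_DAG_ID",
        f"{base}/ws/v1/time/TEZ_VERTEX_ID?primaryFilter=TEZ_DAG_ID:{dag_id}",
    ]


def http_status(url):
    # Default fetcher: perform the GET and return the HTTP status code.
    with urlopen(url, timeout=30) as resp:
        return resp.status


def run_readers(urls, fetch=http_status, readers=15, rounds=10):
    """Issue every URL `rounds` times across `readers` concurrent threads."""
    work = [u for _ in range(rounds) for u in urls]
    with ThreadPoolExecutor(max_workers=readers) as pool:
        # pool.map keeps results in submission order and re-raises errors.
        return list(pool.map(fetch, work))
```

      Passing a different `fetch` callable makes it possible to exercise the threading logic without a running timeline server.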

      The following type of exception appears very frequently in the ATS logs:
      2016-07-23 00:01:56,089 [1517798697@qtp-1198158701-850] INFO org.apache.hadoop.service.AbstractService: Service LeveldbCache.timelineEntityGroupId_1469090881194_4832_application_1469090881194_4832 failed in state INITED; cause: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: lock /grid/4/yarn_ats/atsv15_rld/timelineEntityGroupId_1469090881194_4832_application_1469090881194_4832-timeline-cache.ldb/LOCK: already held by process
      org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: lock /grid/4/yarn_ats/atsv15_rld/timelineEntityGroupId_1469090881194_4832_application_1469090881194_4832-timeline-cache.ldb/LOCK: already held by process
      at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
      at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
      at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
      at org.apache.hadoop.yarn.server.timeline.LevelDBCacheTimelineStore.serviceInit(LevelDBCacheTimelineStore.java:108)
      at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
      at org.apache.hadoop.yarn.server.timeline.EntityCacheItem.refreshCache(EntityCacheItem.java:113)
      at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getCachedStore(EntityGroupFSTimelineStore.java:1021)
      at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getTimelineStoresFromCacheIds(EntityGroupFSTimelineStore.java:936)
      at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getTimelineStoresForRead(EntityGroupFSTimelineStore.java:989)
      at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getEntities(EntityGroupFSTimelineStore.java:1041)
      at org.apache.hadoop.yarn.server.timeline.TimelineDataManager.doGetEntities(TimelineDataManager.java:168)
      at org.apache.hadoop.yarn.server.timeline.TimelineDataManager.getEntities(TimelineDataManager.java:138)
      at org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices.getEntities(TimelineWebServices.java:117)
      at sun.reflect.GeneratedMethodAccessor82.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:498)
      at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
      at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
      at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
      at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
      at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
      at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
      at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
      at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
      at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
      at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
      at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
      at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
      at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
      at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
      at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:886)
      at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
      at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
      at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
      at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
      at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
      at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
      at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
      at org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57)
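
      LevelDB allows only one open handle per database directory and enforces this through the LOCK file, so the trace above is what a second open of the same cache directory looks like. The sketch below illustrates that failure mode and one generic way to avoid it, by serializing the check-and-create step. It is an illustration only, not the change made in the attached patches; `ExclusiveStore` and `StoreCache` are hypothetical stand-ins, not Hadoop or leveldbjni classes.

```python
import threading


class ExclusiveStore:
    """Stand-in for a LevelDB handle: one open instance per path at most."""

    _held = set()
    _guard = threading.Lock()

    def __init__(self, path):
        with ExclusiveStore._guard:
            if path in ExclusiveStore._held:
                # Mirrors the error text seen in the ATS log.
                raise OSError(f"lock {path}/LOCK: already held by process")
            ExclusiveStore._held.add(path)
        self.path = path


class StoreCache:
    """Hands out one shared store per path, creating it at most once."""

    def __init__(self):
        self._lock = threading.Lock()
        self._stores = {}

    def get(self, path):
        # Lookup and creation happen under one lock, so two concurrent
        # readers can never both try to open the same path.
        with self._lock:
            store = self._stores.get(path)
            if store is None:
                store = ExclusiveStore(path)
                self._stores[path] = store
            return store
```

      Without the lock in `StoreCache.get`, two readers that miss the cache at the same time would both call `ExclusiveStore(path)`, and the second call would fail in the same way as the trace above.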

      Attachments

        1. YARN-5432-trunk.001.patch
          6 kB
          Li Lu
        2. YARN-5432-trunk.002.patch
          15 kB
          Li Lu
        3. YARN-5432-trunk.003.patch
          16 kB
          Li Lu

          People

            Assignee: Li Lu (gtcarrera9)
            Reporter: Karam Singh (karams)
            Votes: 0
            Watchers: 6
