  1. Hadoop YARN
  2. YARN-6054

TimelineServer fails to start when some LevelDb state files are missing.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-alpha2
    • Fix Version/s: 2.9.0, 3.0.0-alpha2, 2.8.2
    • Component/s: None
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      We encountered an issue recently where the TimelineServer failed to start because some state files went missing.

      2016-11-21 20:46:43,134 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer failed in state INITED; cause: org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 missing files; e.g.: <levelDbStorePath>/timelineserver/leveldb-timeline-store.ldb/127897.sst
      org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 missing files; e.g.: <levelDbStorePath>/timelineserver/leveldb-timeline-store.ldb/127897.sst
      
      2016-11-21 20:46:43,135 FATAL org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: Error starting ApplicationHistoryServer
      org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 missing files; e.g.: <levelDbStorePath>/timelineserver/leveldb-timeline-store.ldb/127897.sst
              at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
              at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
              at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
              at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104)
              at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
              at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:172)
              at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:182)
      Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 missing files; e.g.: <levelDbStorePath>/timelineserver/leveldb-timeline-store.ldb/127897.sst
              at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
              at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
              at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
              at org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:229)
              at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
              ... 5 more
      2016-11-21 20:46:43,136 INFO org.apache.hadoop.util.ExitUtil: Exiting with status -1
      

      Ideally, no state files should ever go missing. However, I'd posit that the TimelineServer should degrade gracefully rather than fail to start at all.
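One way to picture the graceful degradation proposed above is an open-then-repair-then-retry wrapper around the store. The following is only a minimal sketch under stated assumptions, not the contents of the attached patches: `StoreFactory`, `FlakyFactory`, and `openWithRepair` are hypothetical names invented for illustration and are not the leveldbjni or Hadoop API. The only detail taken from the report is that the failure message begins with "Corruption".

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Sketch: open a store and, on a corruption error, try one repair before giving up. */
class ResilientStoreOpener {

    /** Hypothetical stand-in for a LevelDB-style factory (assumption, not a real API). */
    interface StoreFactory {
        String open(String path) throws IOException;  // returns an opaque handle
        void repair(String path) throws IOException;  // salvages whatever is still readable
    }

    /** Opens the store; if the error looks like corruption, repairs once and retries. */
    static String openWithRepair(StoreFactory factory, String path, List<String> log)
            throws IOException {
        try {
            return factory.open(path);
        } catch (IOException e) {
            String msg = e.getMessage();
            // Only corruption is treated as recoverable; anything else is rethrown.
            if (msg == null || !msg.contains("Corruption")) {
                throw e;
            }
            log.add("Store reported corruption, attempting repair: " + msg);
            factory.repair(path);
            // A second failure propagates to the caller, as it does today.
            return factory.open(path);
        }
    }

    /** Hypothetical fake used only to exercise the sketch: fails until repaired. */
    static class FlakyFactory implements StoreFactory {
        final String errorMessage;
        int repairCalls = 0;
        boolean repaired = false;

        FlakyFactory(String errorMessage) {
            this.errorMessage = errorMessage;
        }

        @Override
        public String open(String path) throws IOException {
            if (!repaired) {
                throw new IOException(errorMessage);
            }
            return "handle:" + path;
        }

        @Override
        public void repair(String path) {
            repairCalls++;
            repaired = true;
        }
    }
}
```

The trade-off in a design like this is that a repair can only recover data from files that still exist, so entries stored in the missing files would presumably be lost. The service would start with a partial history instead of exiting with status -1.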

        Attachments

        1. YARN-6054.01.patch
          5 kB
          Ravi Prakash
        2. YARN-6054.02.patch
          6 kB
          Ravi Prakash
        3. YARN-6054.03.patch
          7 kB
          Ravi Prakash

            People

            • Assignee:
              raviprak Ravi Prakash
              Reporter:
              raviprak Ravi Prakash
            • Votes:
              0
              Watchers:
              6
