HBase / HBASE-19681

Online snapshot creation failing with missing store file


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.3.0
    • None
    • None
    • Environment:
      Hadoop 2.7.3
      HBase 1.3.0
      OS: GNU/Linux x86_64
      Cluster: Amazon Elastic MapReduce (EMR)

    Description

      We are having trouble creating an online snapshot of one of our HBase tables. The table holds about 20 TB of data and receives roughly 10,000 writes per second. Snapshot creation fails intermittently with an error saying that an HFile is missing; see the detailed output below. If we locate the region server hosting the affected region and restart it, snapshot creation then succeeds. It appears that the missing HFile was removed by a minor compaction, but the region server still holds a reference to it.
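The suspected failure mode can be illustrated with a toy model. The sketch below is not HBase code: it assumes a region server that caches its list of store files in memory, and a compaction that deletes its input files from the file system without updating that cache (the behavior suspected above, not a confirmed root cause). It also assumes that a restart rebuilds the cached list from the directory contents. `StaleStoreFileDemo`, its methods, and the file names are all hypothetical.

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class StaleStoreFileDemo {
    // Stands in for the HDFS directory of one column family (e.g. .../d/).
    static final Set<String> fileSystem = new HashSet<>();

    // Stands in for the region server's in-memory list of live store files.
    static final List<String> cachedStoreFiles = new ArrayList<>();

    // Hypothetical compaction: writes a merged file and deletes the inputs from
    // the "file system", but (the suspected bug) never refreshes the cached list.
    static void compactWithoutRefreshingCache(String mergedFile) {
        for (String f : cachedStoreFiles) {
            fileSystem.remove(f);
        }
        fileSystem.add(mergedFile);
    }

    // Hypothetical snapshot step: references every file the server believes is live
    // and fails on the first one that no longer exists.
    static List<String> snapshotManifest() throws FileNotFoundException {
        List<String> manifest = new ArrayList<>();
        for (String f : cachedStoreFiles) {
            if (!fileSystem.contains(f)) {
                throw new FileNotFoundException("File does not exist: " + f);
            }
            manifest.add(f);
        }
        return manifest;
    }

    // Assumed restart behavior: the cache is rebuilt from what is actually on disk.
    static void restartRegionServer() {
        cachedStoreFiles.clear();
        cachedStoreFiles.addAll(fileSystem);
    }

    public static void main(String[] args) throws Exception {
        for (String f : List.of("hfile-a", "hfile-b")) {
            fileSystem.add(f);
            cachedStoreFiles.add(f);
        }
        compactWithoutRefreshingCache("hfile-c");
        try {
            snapshotManifest();
            System.out.println("snapshot ok");
        } catch (FileNotFoundException e) {
            System.out.println("snapshot failed: " + e.getMessage());
        }
        restartRegionServer();
        System.out.println("after restart: " + snapshotManifest());
    }
}
```

Running `main` prints `snapshot failed: File does not exist: hfile-a` and then `after restart: [hfile-c]`, which matches the pattern reported here: the snapshot fails until the server's view of its store files is rebuilt.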

      [hadoop@ip-10-0-12-164 ~]$ hbase shell
      HBase Shell; enter 'help<RETURN>' for list of supported commands.
      Type "exit<RETURN>" to leave the HBase Shell
      Version 1.3.0, rUnknown, Fri Feb 17 18:15:07 UTC 2017
       
      hbase(main):001:0> snapshot 'x_table', 'x_snapshot'
       
      ERROR: org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot { ss=x_snapshot table=x_table type=FLUSH } had an error.  Procedure x_snapshot { waiting=[] done=[ip-10-0-9-31.ec2.internal,16020,1508372578254, ip-10-0-0-32.ec2.internal,16020,1508372591059, ip-10-0-14-221.ec2.internal,16020,1508372580873, ip-10-0-15-185.ec2.internal,16020,1508372588507, ip-10-0-9-43.ec2.internal,16020,1508372569107, ip-10-0-10-62.ec2.internal,16020,1512885921693, ip-10-0-8-216.ec2.internal,16020,1508372584133, ip-10-0-1-207.ec2.internal,16020,1508372580144, ip-10-0-0-173.ec2.internal,16020,1508372584969, ip-10-0-4-79.ec2.internal,16020,1508372587161, ip-10-0-3-165.ec2.internal,16020,1508372593566, ip-10-0-14-137.ec2.internal,16020,1508372583225, ip-10-0-6-33.ec2.internal,16020,1508372581587, ip-10-0-15-199.ec2.internal,16020,1508372587478, ip-10-0-5-253.ec2.internal,16020,1508372581243, ip-10-0-1-99.ec2.internal,16020,1508372609684] }

              at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:354)
              at org.apache.hadoop.hbase.master.MasterRpcServices.isSnapshotDone(MasterRpcServices.java:1058)
              at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:61089)
              at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2328)
              at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
              at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
              at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
      Caused by: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable via ip-10-0-3-13.ec2.internal,16020,1508372563772:org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: java.io.FileNotFoundException: File does not exist: hdfs://ip-10-0-12-164.ec2.internal:8020/user/hbase/data/default/x_table/ecbb3aeaf7c5b1f65742deab5812362c/d/f76d8827c29244b99bf9344982956523
              at org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:83)
              at org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:315)
              at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:344)
              ... 6 more
      Caused by: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: java.io.FileNotFoundException: File does not exist: hdfs://ip-10-0-12-164.ec2.internal:8020/user/hbase/data/default/x_table/ecbb3aeaf7c5b1f65742deab5812362c/d/f76d8827c29244b99bf9344982956523
              at org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:347)
              at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:140)
              at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:160)
              at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:187)
              at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:53)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)
       
      Here is some help for this command:
      Take a snapshot of specified table. Examples:
       
        hbase> snapshot 'sourceTable', 'snapshotName'
        hbase> snapshot 'namespace:sourceTable', 'snapshotName', {SKIP_FLUSH => true}
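The error output also shows why a single region server can fail the whole snapshot: the procedure's `waiting` list is empty and 16 servers are listed under `done`, while the FileNotFoundException arrives via ip-10-0-3-13, which is not in that list. The sketch below is a simplified, sequential model of that all-or-nothing barrier. It is not HBase's `Procedure`/`Subprocedure` implementation, which runs members concurrently on separate region servers and propagates errors through `ForeignExceptionDispatcher`. `SnapshotBarrierDemo`, `Result`, and the member names are hypothetical.

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;

class SnapshotBarrierDemo {
    // Coordinator's view of one procedure: who finished, and the first failure (if any).
    record Result(List<String> done, String failedMember, String error) {
        boolean succeeded() {
            return failedMember == null;
        }
    }

    // Runs every member's subtask (sequentially here; concurrently in a real cluster).
    // Members that finish go into `done`; any single failure fails the whole procedure.
    static Result runProcedure(Map<String, Callable<Void>> members) {
        List<String> done = new ArrayList<>();
        String failedMember = null;
        String error = null;
        for (Map.Entry<String, Callable<Void>> member : members.entrySet()) {
            try {
                member.getValue().call();
                done.add(member.getKey());
            } catch (Exception e) {
                if (failedMember == null) {
                    failedMember = member.getKey();
                    error = e.getMessage();
                }
            }
        }
        return new Result(done, failedMember, error);
    }

    public static void main(String[] args) {
        Map<String, Callable<Void>> members = new LinkedHashMap<>();
        members.put("rs-1", () -> null);
        // Hypothetical member whose stale store-file list points at a deleted HFile.
        members.put("rs-2", () -> {
            throw new FileNotFoundException("File does not exist: hfile-x");
        });
        members.put("rs-3", () -> null);

        Result r = runProcedure(members);
        System.out.println("done=" + r.done() + " failed=" + r.failedMember()
                + " succeeded=" + r.succeeded());
    }
}
```

With members rs-1, rs-2 (failing), and rs-3, `main` prints `done=[rs-1, rs-3] failed=rs-2 succeeded=false`, mirroring the output above, where every server except the failing one is listed as done and the snapshot as a whole still fails.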

      Attachments

        1. region-server-missing file-log.doc (35 kB, uploaded by Anirban Roy)
        2. region-server-snapshot-exception-log.doc (39 kB, uploaded by Anirban Roy)


          People

            Assignee: Unassigned
            Reporter: Anirban Roy (r_anirban)
            Votes: 0
            Watchers: 7
