HBase / HBASE-1876

DroppedSnapshotException when flushing memstore after a datanode dies

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.20.0
    • Fix Version/s: 0.90.0
    • Component/s: regionserver
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      A dead datanode in the cluster can lead to multiple HRegionServer failures and corrupted data. The HRegionServer failures can be reproduced consistently on a 7-machine cluster with approximately 2000 regions.

      Steps to reproduce

      The easiest and safest way to reproduce it is with the .META. table, but it works with any table.

      Locate a datanode that stores the .META. files and kill -9 it.
      To generate multiple writes to the .META. table, bring up or shut down a region server; this will eventually cause a memstore flush:

      2009-09-25 09:26:17,775 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Flush requested on .META.,demo__assets,asset_283132172,1252898166036,1253265069920
      2009-09-25 09:26:17,775 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for region .META.,demo__assets,asset_283132172,1252898166036,1253265069920. Current region memstore size 16.3k
      2009-09-25 09:26:17,791 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.72.79.108:50010
      2009-09-25 09:26:17,791 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-8767099282771605606_176852

      The DFSClient retries 3 times, but there's a high chance it will try the same failed datanode again, since it takes around 10 minutes for a dead datanode to be removed from the cluster.
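      The retry problem can be illustrated with a small, self-contained model. This is a hypothetical sketch, not DFSClient code: the class and method names, the node names, and the deterministic target chooser are all assumptions. As a worst case, the chooser keeps returning the dead node first because the namenode still lists it as live; the real namenode placement policy is not modeled. The sketch also assumes that the HDFS-630 fix works by letting the client exclude datanodes that already failed when it asks for the next block.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical model of a client allocating a write pipeline for a new block.
 * None of these names come from HDFS; they only illustrate the retry pattern.
 */
public class PipelineRetrySketch {

    // Datanodes the namenode still reports as live. "dn-dead" was killed, but
    // until it is declared dead (around 10 minutes) it can still be handed out.
    static final List<String> NAMENODE_VIEW =
            List.of("dn-dead", "dn-a", "dn-b", "dn-c", "dn-d", "dn-e", "dn-f");
    static final String DEAD_NODE = "dn-dead";
    static final int PIPELINE_SIZE = 3; // assumed replication factor
    static final int RETRIES = 3;       // assumed: retries after the first attempt

    /** Worst-case deterministic chooser: the first N listed nodes not excluded. */
    static List<String> chooseTargets(Set<String> excluded) {
        List<String> targets = new ArrayList<>();
        for (String node : NAMENODE_VIEW) {
            if (!excluded.contains(node) && targets.size() < PIPELINE_SIZE) {
                targets.add(node);
            }
        }
        return targets;
    }

    /** Returns the first unreachable node in the pipeline, or null if all connect. */
    static String firstBadLink(List<String> pipeline) {
        for (String node : pipeline) {
            if (node.equals(DEAD_NODE)) {
                return node;
            }
        }
        return null;
    }

    /**
     * Tries to set up a pipeline. Returns the number of attempts used on
     * success, or -1 if every attempt failed ("Unable to create new block").
     */
    static int allocateBlock(boolean excludeBadNodes) {
        Set<String> excluded = new HashSet<>();
        for (int attempt = 1; attempt <= 1 + RETRIES; attempt++) {
            List<String> pipeline = chooseTargets(excluded);
            String bad = firstBadLink(pipeline);
            if (bad == null) {
                return attempt;
            }
            // The block is abandoned; only the exclusion variant remembers the bad node.
            if (excludeBadNodes) {
                excluded.add(bad);
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("without exclusion: " + allocateBlock(false));
        System.out.println("with exclusion:    " + allocateBlock(true));
    }
}
```

      In this model, without exclusion every attempt builds a pipeline through dn-dead and the client gives up, which corresponds to the "Unable to create new block" error in the log; with exclusion, the second attempt succeeds.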

      2009-09-25 09:26:41,810 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
      at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2814)
      at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2078)
      at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2264)

      2009-09-25 09:26:41,810 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_5317304716016587434_176852 bad datanode[2] nodes == null
      2009-09-25 09:26:41,810 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "/hbase/.META./225980069/info/5573114819456511457" - Aborting...
      2009-09-25 09:26:41,810 FATAL org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Replay of hlog required. Forcing server shutdown
      org.apache.hadoop.hbase.DroppedSnapshotException: region: .META.,demo__assets,asset_283132172,1252898166036,1253265069920
      at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:942)
      at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:835)
      at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:241)
      at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:149)
      Caused by: java.io.IOException: Bad connect ack with firstBadLink 10.72.79.108:50010
      at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2872)
      at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2795)
      at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2078)
      at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2264)
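      The FATAL line above ("Replay of hlog required. Forcing server shutdown") implies the following control flow: once the memstore snapshot cannot be written to a new store file, the region server shuts down and the edits have to be recovered by replaying the HLog. The sketch below is a hypothetical model of that flow only. SnapshotNotPersistedException stands in for DroppedSnapshotException, and Region, Flusher and StoreFileWriter are illustrative names, not HBase's own classes.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of why a failed memstore flush forces a server shutdown.
 * Names are illustrative only; this is not HBase's implementation.
 */
public class FlushAbortSketch {

    /** Stand-in for DroppedSnapshotException. */
    static class SnapshotNotPersistedException extends IOException {
        SnapshotNotPersistedException(String msg, Throwable cause) {
            super(msg, cause);
        }
    }

    /** Writes a snapshot of edits to a new store file (in HBase, a file in HDFS). */
    interface StoreFileWriter {
        void write(List<String> edits) throws IOException;
    }

    static class Region {
        final String name;
        final List<String> memstore = new ArrayList<>();
        List<String> snapshot = new ArrayList<>();

        Region(String name) {
            this.name = name;
        }

        void flush(StoreFileWriter writer) throws SnapshotNotPersistedException {
            // Move the current edits into a snapshot so new writes can continue.
            snapshot = new ArrayList<>(memstore);
            memstore.clear();
            try {
                writer.write(snapshot);
                snapshot = new ArrayList<>(); // persisted, so the snapshot can be dropped
            } catch (IOException e) {
                // Not persisted: the edits stay in memory in this model, but their
                // only durable copy is the write-ahead log (HLog).
                throw new SnapshotNotPersistedException("region: " + name, e);
            }
        }
    }

    static class Flusher {
        boolean serverAborted = false;

        void flushRegion(Region region, StoreFileWriter writer) {
            try {
                region.flush(writer);
            } catch (SnapshotNotPersistedException e) {
                // Mirrors "Replay of hlog required. Forcing server shutdown".
                serverAborted = true;
            }
        }
    }

    public static void main(String[] args) {
        Region meta = new Region(".META.");
        meta.memstore.add("info:server=rs1");
        Flusher flusher = new Flusher();
        flusher.flushRegion(meta, edits -> {
            throw new IOException("Bad connect ack with firstBadLink");
        });
        System.out.println("aborted=" + flusher.serverAborted
                + " unflushedEdits=" + meta.snapshot.size());
    }
}
```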

      After the HRegionServer shuts itself down, its regions are reassigned; however, you might hit this:

      2009-09-26 08:04:23,646 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: .META.,demo__assets,asset_283132172,1252898166036,1253265069920
      2009-09-26 08:04:23,684 WARN org.apache.hadoop.hbase.regionserver.Store: Skipping hdfs://b0:9000/hbase/.META./225980069/historian/1432202951743803786 because its empty. HBASE-646 DATA LOSS?
      ...
      2009-09-26 08:04:23,776 INFO org.apache.hadoop.hbase.regionserver.HRegion: region .META.,demo__assets,asset_283132172,1252898166036,1253265069920/225980069 available; sequence id is 1331458484

      We ended up with corrupted data in the .META. "info:server" column after the master got confirmation that it had been updated from the HRegionServer that got the DroppedSnapshotException.

      Because info:server will be correct again after a cluster restart, .META. is the safer table to test with. To detect the corruption, scan .META., get the start key of each region, and try to retrieve that row from the corresponding table. If .META. is corrupted, you get a NotServingRegionException.
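      The .META. check described above can be sketched as follows. This is a hypothetical, in-memory model rather than HBase client code: the catalog rows, the map of regions each server actually serves, and NotServingException (a stand-in for NotServingRegionException) are all assumptions. Against a real cluster, the same loop would scan .META. and issue a get for each start key through the HBase client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, in-memory sketch of the .META. consistency check.
 * Nothing here is the HBase client API.
 */
public class MetaCheckSketch {

    /** One catalog row: the region's table, its start key, and its info:server value. */
    static final class RegionEntry {
        final String table;
        final String startKey;
        final String server;

        RegionEntry(String table, String startKey, String server) {
            this.table = table;
            this.startKey = startKey;
            this.server = server;
        }

        @Override
        public String toString() {
            return table + "," + startKey + " @ " + server;
        }
    }

    /** Stand-in for NotServingRegionException. */
    static class NotServingException extends Exception {
        NotServingException(String msg) {
            super(msg);
        }
    }

    /** Stand-in for reading the start-key row from the server that .META. names. */
    static void getStartRow(Map<String, Set<String>> servedRegions, RegionEntry entry)
            throws NotServingException {
        String regionId = entry.table + "," + entry.startKey;
        if (!servedRegions.getOrDefault(entry.server, Set.of()).contains(regionId)) {
            throw new NotServingException(regionId + " is not served by " + entry.server);
        }
    }

    /** Returns every catalog entry whose info:server does not actually serve the region. */
    static List<RegionEntry> findStaleEntries(List<RegionEntry> meta,
                                              Map<String, Set<String>> servedRegions) {
        List<RegionEntry> stale = new ArrayList<>();
        for (RegionEntry entry : meta) {
            try {
                getStartRow(servedRegions, entry);
            } catch (NotServingException e) {
                stale.add(entry);
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        List<RegionEntry> meta = List.of(
                new RegionEntry("demo__assets", "", "rs1"),
                new RegionEntry("demo__assets", "asset_283132172", "rs2"));
        // Hypothetical state: the second region now lives on rs3, but .META. still says rs2.
        Map<String, Set<String>> served = Map.of(
                "rs1", Set.of("demo__assets,"),
                "rs3", Set.of("demo__assets,asset_283132172"));
        System.out.println(findStaleEntries(meta, served));
    }
}
```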

      This issue is related to https://issues.apache.org/jira/browse/HDFS-630

      I attached a patch for HDFS-630 https://issues.apache.org/jira/secure/attachment/12420919/HDFS-630.patch that fixes this problem.


          Activity

          stack added a comment -

          We should make this a recommended patch for hadoop installs running hbase. Let me add it to the 'Getting Started' list.

          Andrew Purtell added a comment -

          Should we roll the HDFS-630 patch into the patched Hadoop jar included in the HBase distrib, alongside the patch for HDFS-127?

          stack added a comment -

          It's not client-side only like the hdfs-127 patch; it needs the namenode patched. I'm adding a note to our Getting Started recommending that you add this patch to your hadoop install. I think that's enough for hbase 0.20.1.

          stack added a comment -

          I added to our 'Getting Started' a recommendation that users apply hdfs-630 to their hadoop cluster on branch and trunk.

          I think the need for hdfs-630 is going to become more apparent as we test the new sync/append, especially on small clusters.

          Moving out of 0.20.1 now....

          Cosmin Lehene added a comment -

          I adapted the previous patch for HDFS-630 to the 0.21.x branch: https://issues.apache.org/jira/secure/attachment/12422242/0001-Fix-HDFS-630-for-0.21.patch

          stack added a comment -

          I did more testing of hdfs-630. It definitely helps with the above situation. To underline how necessary we think this patch is, especially when the cluster is small, I've added the patch to the hadoop-hdfs.jar bundled with hbase.

          Cosmin Lehene added a comment -

          stack: the patched DFSClient is not compatible with unpatched NameNode, so if we're going to include the patch in the hadoop-hdfs.jar we need to explain that it must be used with a patched NameNode as well.

          stack added a comment -

          @Cosmin: That's bad. Thanks. Let me undo.

          stack added a comment -

          I undid bundling a hadoop-hdfs jar patched with hdfs-630 as part of the hbase deploy.

          Cosmin Lehene added a comment -

          Linking to HDFS-630

          stack added a comment -

          hdfs-630 is in 0.21 hadoop and hadoop trunk. I just suggested that it get added to 0.20-append branch. If it goes in, we can resolve this issue against hbase 0.21.

          Andrew Purtell added a comment -

          Stack:

          hdfs-630 is in 0.21 hadoop and hadoop trunk. I just suggested that it get added to 0.20-append branch. If it goes in, we can resolve this issue against hbase 0.21.

          +1

          We have been using a Hadoop patched with 630 internally so HBase is stable on small-ish DFS clusters.

          stack added a comment -

          Dhruba committed hdfs-630, but then I saw that Cosmin commented suggesting that Dhruba use a patch Todd made for the 0.20 branch. I wonder why? Let me ask him.

          stack added a comment -

          oh, nm... it hasn't been committed yet. I misread the issue.

          stack added a comment -

          @Cosmin Can we close this? branch-0.20-append (what we have checked into hbase and what we expect to run on) now has hdfs-630.

          Cosmin Lehene added a comment -

          We're currently running CDH3b2 and it looks good. I think it's safe to close it now with HDFS-630 committed to 0.20-append.

          stack added a comment -

          Closing w/ Cosmin's blessing.


            People

            • Assignee: Cosmin Lehene
            • Reporter: Cosmin Lehene
            • Votes: 1
            • Watchers: 1

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated: 24h
                Remaining: 24h
                Logged: Not Specified
