HBase / HBASE-4078

Silent Data Offlining During HDFS Flakiness

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.89.20100924, 0.90.3, 0.92.0
    • Fix Version/s: 0.92.0, 0.94.0
    • Component/s: io, regionserver
    • Labels:
      None

      Description

      See HBASE-1436. The bug fix for that JIRA is a temporary workaround for improperly moving partially-written files from TMP into the region directory when an FS error occurs. Unfortunately, the fix is to ignore all IO exceptions, which masks off-lining due to FS flakiness. We need to permanently fix the problem that created HBASE-1436 and then at least have the option to not open a region while the FS is flaky.
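
      A minimal sketch of the "validate before moving out of TMP" idea, using plain java.nio rather than the HBase or HDFS APIs. The class name, the trailer marker, and the method names are all hypothetical and are not taken from the actual patch; the point is only that a validation failure propagates as an IOException instead of being swallowed, so a partially written file never reaches the region directory.

      ```java
      import java.io.IOException;
      import java.nio.charset.StandardCharsets;
      import java.nio.file.Files;
      import java.nio.file.Path;
      import java.util.Arrays;

      // Hypothetical illustration only; not the HBase Store implementation.
      class ValidateThenMove {
          // Assumed marker that a writer appends as the last bytes of a complete file.
          static final byte[] TRAILER = "END!".getBytes(StandardCharsets.US_ASCII);

          /** Re-reads the file and throws if the trailer is missing. */
          static void validate(Path file) throws IOException {
              byte[] data = Files.readAllBytes(file);
              int n = TRAILER.length;
              boolean complete = data.length >= n
                  && Arrays.equals(Arrays.copyOfRange(data, data.length - n, data.length), TRAILER);
              if (!complete) {
                  throw new IOException("Incomplete file, refusing to move: " + file);
              }
          }

          /** Moves tmpFile into regionDir only after validation succeeds. */
          static Path commit(Path tmpFile, Path regionDir) throws IOException {
              validate(tmpFile); // an IOException here propagates; nothing is moved
              Files.createDirectories(regionDir);
              return Files.move(tmpFile, regionDir.resolve(tmpFile.getFileName()));
          }
      }
      ```

      With this shape, the caller decides what to do on failure (retry the flush, abort, or keep the region closed) rather than having the error masked.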

          Activity

          Nicolas Spiegelberg added a comment -

          Pritam is working on a fix for this.

          stack added a comment -

          This looks like it's similar to HBASE-3834

          Pritam Damania added a comment -

          This is a patch for HBASE-4078

          Pritam Damania added a comment -

          Here is the review board link for this patch : https://reviews.apache.org/r/1327/diff/

          stack added a comment -

          I added some comments over on reviewboard but then realized that the patch looks like hbase-4078. Is it the same patch? Thanks.

          stack added a comment -

          Pardon my silliness above where I am saying that the patch for this issue is the same as the patch for this issue.

          Pritam Damania added a comment -

          Updated Patch for HBASE-4078

          Lars Hofhansl added a comment -

          When does the corruption actually happen?

          Do any of StoreFile.Writer.{append|appendMetadata|close}(...) silently fail, leaving a corrupt file? If any of these throws an exception, we would skip moving the file anyway.
          If so, wouldn't it be better to fix that?

          Or is this a problem deeper in HDFS?

          Lars Hofhansl added a comment -

          Ah never mind me... HDFS flakiness is what this is all about.

          Nicolas Spiegelberg added a comment -

          added to 89, 92, & 94

          Hudson added a comment -

          Integrated in HBase-0.92 #62 (See https://builds.apache.org/job/HBase-0.92/62/)
          HBASE-4078 Validate store files after flush/compaction

          nspiegelberg :
          Files :

          • /hbase/branches/0.92/CHANGES.txt
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
          • /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java
          Jonathan Gray added a comment -

          This seems to have somehow broken cache-on-write again. I think because the verify does a closeReader() which could trigger the evict-on-close.

          I'm going to need to extend the close API to take evictOnClose as an argument. I think there's actually a JIRA for this already.
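
          A rough sketch of what passing evictOnClose as an argument could look like, assuming a toy block cache keyed by "file:offset". The class, field, and method names are hypothetical and do not reflect the real StoreFile reader API; the sketch only shows how a verification pass could close its reader without evicting freshly cached blocks.

          ```java
          import java.util.Map;

          // Hypothetical illustration only; not the HBase reader implementation.
          class CachingReaderSketch {
              // Assumed stand-in for a shared block cache, keyed by "fileName:offset".
              private final Map<String, byte[]> blockCache;
              private final String fileName;

              CachingReaderSketch(String fileName, Map<String, byte[]> blockCache) {
                  this.fileName = fileName;
                  this.blockCache = blockCache;
              }

              /** Closes the reader; evicts this file's cached blocks only when asked to. */
              void close(boolean evictOnClose) {
                  if (evictOnClose) {
                      blockCache.keySet().removeIf(key -> key.startsWith(fileName + ":"));
                  }
                  // Releasing file handles would happen here in a real reader.
              }
          }
          ```

          Under this assumption, a post-flush validation step would call close(false) so that blocks populated by cache-on-write survive, while a normal region close could still call close(true).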

          Hudson added a comment -

          Integrated in HBase-TRUNK #2325 (See https://builds.apache.org/job/HBase-TRUNK/2325/)
          HBASE-4078 Validate store files after flush/compaction

          nspiegelberg :
          Files :

          • /hbase/trunk/CHANGES.txt
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java

            People

            • Assignee: Pritam Damania
            • Reporter: Nicolas Spiegelberg
            • Votes: 0
            • Watchers: 4
