HBase
  1. HBase
  2. HBASE-4944

Optionally verify bulk loaded HFiles

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 0.90.5, 0.92.0, 0.94.0
    • Fix Version/s: 0.90.5
    • Component/s: regionserver
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      We rely on users to produce properly formatted HFiles for bulk import. Attached patch adds an optional code path, toggled by a configuration property, that verifies the HFile under consideration for import is properly sorted. The default maintains the current behavior, which does not scan the file for correctness.

      Patch is against trunk but can apply against all active branches.

      1. HBASE-4944-v3.patch
        5 kB
        Andrew Purtell
      2. HBASE-4944-v2.patch
        5 kB
        Andrew Purtell

        Activity

        Hide
        Andrew Purtell added a comment -

        From JIRA: "Cannot attach file HBASE-4944.patch: Unknown server error (500)."

        The patch is pretty small, so here it is:

        Index: src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
        ===================================================================
        --- src/main/java/org/apache/hadoop/hbase/regionserver/Store.java	(revision 1210044)
        +++ src/main/java/org/apache/hadoop/hbase/regionserver/Store.java	(working copy)
        @@ -50,6 +50,7 @@
         import org.apache.hadoop.hbase.io.hfile.Compression;
         import org.apache.hadoop.hbase.io.hfile.HFile;
         import org.apache.hadoop.hbase.io.hfile.HFileScanner;
        +import org.apache.hadoop.hbase.io.hfile.InvalidHFileException;
         import org.apache.hadoop.hbase.monitoring.MonitoredTask;
         import org.apache.hadoop.hbase.regionserver.StoreScanner.ScanType;
         import org.apache.hadoop.hbase.regionserver.compactions.CompactionProgress;
        @@ -123,6 +124,7 @@
           private final String storeNameStr;
           private CompactionProgress progress;
           private final int compactionKVMax;
        +  private final boolean verifyBulkLoads;
         
           // not private for testing
           /* package */ScanInfo scanInfo;
        @@ -222,6 +224,9 @@
               = conf.getLong("hbase.hstore.compaction.max.size", Long.MAX_VALUE);
             this.compactionKVMax = conf.getInt("hbase.hstore.compaction.kv.max", 10);
         
        +    this.verifyBulkLoads = conf.getBoolean("hbase.hstore.bulkload.verify",
        +        true);
        +
             if (Store.closeCheckInterval == 0) {
               Store.closeCheckInterval = conf.getInt(
                   "hbase.hstore.close.check.interval", 10*1000*1000 /* 10 MB */);
        @@ -355,8 +360,8 @@
           }
         
           /**
        -   * This throws a WrongRegionException if the bulkHFile does not fit in this
        -   * region.
        +   * This throws a WrongRegionException if the HFile does not fit in this
        +   * region, or an InvalidHFileException if the HFile is not valid.
            *
            */
           void assertBulkLoadHFileOk(Path srcPath) throws IOException {
        @@ -386,6 +391,34 @@
                     "Bulk load file " + srcPath.toString() + " does not fit inside region "
                     + this.region);
               }
        +
        +      if (verifyBulkLoads) {
        +        KeyValue pkv = null;
        +        HFileScanner scanner = reader.getScanner(false, false, false);
        +        scanner.seekTo();
        +        do {
        +          KeyValue kv = scanner.getKeyValue();
        +          if (pkv != null) {
        +            if (Bytes.compareTo(pkv.getBuffer(), pkv.getRowOffset(),
        +                pkv.getRowLength(), kv.getBuffer(), kv.getRowOffset(),
        +                kv.getRowLength()) > 0) {
        +              throw new InvalidHFileException("Previous row is greater then"
        +                  + " current row: path=" + srcPath + " previous="
        +                  + Bytes.toStringBinary(pkv.getKey()) + " current="
        +                  + Bytes.toStringBinary(kv.getKey()));
        +            }
        +            if (Bytes.compareTo(pkv.getBuffer(), pkv.getFamilyOffset(),
        +                pkv.getFamilyLength(), kv.getBuffer(), kv.getFamilyOffset(),
        +                kv.getFamilyLength()) != 0) {
        +              throw new InvalidHFileException("Previous key had different"
        +                  + " family compared to current key: path=" + srcPath
        +                  + " previous=" + Bytes.toStringBinary(pkv.getKey())
        +                  + " current=" + Bytes.toStringBinary(kv.getKey()));
        +            }
        +          }
        +          pkv = kv;
        +        } while (scanner.next());
        +      }
             } finally {
               if (reader != null) reader.close();
             }
        Index: src/main/java/org/apache/hadoop/hbase/io/hfile/InvalidHFileException.java
        ===================================================================
        --- src/main/java/org/apache/hadoop/hbase/io/hfile/InvalidHFileException.java	(revision 0)
        +++ src/main/java/org/apache/hadoop/hbase/io/hfile/InvalidHFileException.java	(revision 0)
        @@ -0,0 +1,40 @@
        +/**
        + * Licensed to the Apache Software Foundation (ASF) under one
        + * or more contributor license agreements.  See the NOTICE file
        + * distributed with this work for additional information
        + * regarding copyright ownership.  The ASF licenses this file
        + * to you under the Apache License, Version 2.0 (the
        + * "License"); you may not use this file except in compliance
        + * with the License.  You may obtain a copy of the License at
        + *
        + *     http://www.apache.org/licenses/LICENSE-2.0
        + *
        + * Unless required by applicable law or agreed to in writing, software
        + * distributed under the License is distributed on an "AS IS" BASIS,
        + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
        + * See the License for the specific language governing permissions and
        + * limitations under the License.
        + */
        +package org.apache.hadoop.hbase.io.hfile;
        +
        +import java.io.IOException;
        +
        +/**
        + * Thrown when an invalid HFile format is detected
        + */
        +public class InvalidHFileException extends IOException {
        +  private static final long serialVersionUID = 4660352028739861249L;
        +
        +  /** constructor */
        +  public InvalidHFileException() {
        +    super();
        +  }
        +
        +  /**
        +   * Constructor
        +   * @param s message
        +   */
        +  public InvalidHFileException(String s) {
        +    super(s);
        +  }
        +}
        \ No newline at end of file
        
        Show
        Andrew Purtell added a comment - From JIRA: "Cannot attach file HBASE-4944 .patch: Unknown server error (500)." The patch is pretty small, so here it is: Index: src/main/java/org/apache/hadoop/hbase/regionserver/Store.java =================================================================== --- src/main/java/org/apache/hadoop/hbase/regionserver/Store.java (revision 1210044) +++ src/main/java/org/apache/hadoop/hbase/regionserver/Store.java (working copy) @@ -50,6 +50,7 @@ import org.apache.hadoop.hbase.io.hfile.Compression; import org.apache.hadoop.hbase.io.hfile.HFile; import org.apache.hadoop.hbase.io.hfile.HFileScanner; + import org.apache.hadoop.hbase.io.hfile.InvalidHFileException; import org.apache.hadoop.hbase.monitoring.MonitoredTask; import org.apache.hadoop.hbase.regionserver.StoreScanner.ScanType; import org.apache.hadoop.hbase.regionserver.compactions.CompactionProgress; @@ -123,6 +124,7 @@ private final String storeNameStr; private CompactionProgress progress; private final int compactionKVMax; + private final boolean verifyBulkLoads; // not private for testing /* package */ScanInfo scanInfo; @@ -222,6 +224,9 @@ = conf.getLong( "hbase.hstore.compaction.max.size" , Long .MAX_VALUE); this .compactionKVMax = conf.getInt( "hbase.hstore.compaction.kv.max" , 10); + this .verifyBulkLoads = conf.getBoolean( "hbase.hstore.bulkload.verify" , + true ); + if (Store.closeCheckInterval == 0) { Store.closeCheckInterval = conf.getInt( "hbase.hstore.close.check.interval" , 10*1000*1000 /* 10 MB */); @@ -355,8 +360,8 @@ } /** - * This throws a WrongRegionException if the bulkHFile does not fit in this - * region. + * This throws a WrongRegionException if the HFile does not fit in this + * region, or an InvalidHFileException if the HFile is not valid. * */ void assertBulkLoadHFileOk(Path srcPath) throws IOException { @@ -386,6 +391,34 @@ "Bulk load file " + srcPath.toString() + " does not fit inside region " + this .region); } + + if (verifyBulkLoads) { + KeyValue pkv = null ; + HFileScanner scanner = reader.getScanner( false , false , false ); + scanner.seekTo(); + do { + KeyValue kv = scanner.getKeyValue(); + if (pkv != null ) { + if (Bytes.compareTo(pkv.getBuffer(), pkv.getRowOffset(), + pkv.getRowLength(), kv.getBuffer(), kv.getRowOffset(), + kv.getRowLength()) > 0) { + throw new InvalidHFileException( "Previous row is greater then" + + " current row: path=" + srcPath + " previous=" + + Bytes.toStringBinary(pkv.getKey()) + " current=" + + Bytes.toStringBinary(kv.getKey())); + } + if (Bytes.compareTo(pkv.getBuffer(), pkv.getFamilyOffset(), + pkv.getFamilyLength(), kv.getBuffer(), kv.getFamilyOffset(), + kv.getFamilyLength()) != 0) { + throw new InvalidHFileException( "Previous key had different" + + " family compared to current key: path=" + srcPath + + " previous=" + Bytes.toStringBinary(pkv.getKey()) + + " current=" + Bytes.toStringBinary(kv.getKey())); + } + } + pkv = kv; + } while (scanner.next()); + } } finally { if (reader != null ) reader.close(); } Index: src/main/java/org/apache/hadoop/hbase/io/hfile/InvalidHFileException.java =================================================================== --- src/main/java/org/apache/hadoop/hbase/io/hfile/InvalidHFileException.java (revision 0) +++ src/main/java/org/apache/hadoop/hbase/io/hfile/InvalidHFileException.java (revision 0) @@ -0,0 +1,40 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License" ); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http: //www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + package org.apache.hadoop.hbase.io.hfile; + + import java.io.IOException; + +/** + * Thrown when an invalid HFile format is detected + */ + public class InvalidHFileException extends IOException { + private static final long serialVersionUID = 4660352028739861249L; + + /** constructor */ + public InvalidHFileException() { + super (); + } + + /** + * Constructor + * @param s message + */ + public InvalidHFileException( String s) { + super (s); + } +} \ No newline at end of file
        Hide
        Ted Yu added a comment -

        Patch from Andy.

        Show
        Ted Yu added a comment - Patch from Andy.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12506016/4944.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/436//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12506016/4944.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/436//console This message is automatically generated.
        Hide
        Ted Yu added a comment -

        Minor comments:

        +        KeyValue pkv = null;
        

        The variable can be named prevKV which is clearer.

        +              throw new InvalidHFileException("Previous row is greater then"
        

        Typo above, should be 'greater than'.

        Show
        Ted Yu added a comment - Minor comments: + KeyValue pkv = null ; The variable can be named prevKV which is clearer. + throw new InvalidHFileException( "Previous row is greater then" Typo above, should be 'greater than'.
        Hide
        Ted Yu added a comment -

        Looks like the patch should be rebased:

        4 out of 5 hunks FAILED -- saving rejects to file src/main/java/org/apache/hadoop/hbase/regionserver/Store.java.rej
        
        Show
        Ted Yu added a comment - Looks like the patch should be rebased: 4 out of 5 hunks FAILED -- saving rejects to file src/main/java/org/apache/hadoop/hbase/regionserver/Store.java.rej
        Hide
        Andrew Purtell added a comment -

        Rebased patch addressing Ted's comments.

        Show
        Andrew Purtell added a comment - Rebased patch addressing Ted's comments.
        Hide
        Andrew Purtell added a comment -

        Sorry, v3 patch restores the default to false (current behavior)

        Show
        Andrew Purtell added a comment - Sorry, v3 patch restores the default to false (current behavior)
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12506018/HBASE-4944-v3.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 javadoc. The javadoc tool appears to have generated -160 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 71 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.mapreduce.TestTimeRangeMapRed
        org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFilesSplitRecovery
        org.apache.hadoop.hbase.TestDrainingServer
        org.apache.hadoop.hbase.TestFullLogReconstruction
        org.apache.hadoop.hbase.mapreduce.TestImportTsv
        org.apache.hadoop.hbase.mapreduce.TestTableMapReduce

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/437//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/437//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/437//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12506018/HBASE-4944-v3.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -160 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 71 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestTimeRangeMapRed org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFilesSplitRecovery org.apache.hadoop.hbase.TestDrainingServer org.apache.hadoop.hbase.TestFullLogReconstruction org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapreduce.TestTableMapReduce Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/437//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/437//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/437//console This message is automatically generated.
        Hide
        Andrew Purtell added a comment -

        Test failures are unrelated to the patch. All tests pass locally for me.

        @Ted, what do you think of v3?

        Show
        Andrew Purtell added a comment - Test failures are unrelated to the patch. All tests pass locally for me. @Ted, what do you think of v3?
        Hide
        Ted Yu added a comment -

        Patch v3 looks good.

        Minor comment for the case of different families:

        +                  + " previous=" + Bytes.toStringBinary(prevKV.getKey())
        +                  + " current=" + Bytes.toStringBinary(kv.getKey()));
        

        I think it would be nice to include family names by calling getFamily() in the above message.
        This can be done at time of commit.

        Show
        Ted Yu added a comment - Patch v3 looks good. Minor comment for the case of different families: + + " previous=" + Bytes.toStringBinary(prevKV.getKey()) + + " current=" + Bytes.toStringBinary(kv.getKey())); I think it would be nice to include family names by calling getFamily() in the above message. This can be done at time of commit.
        Hide
        Andrew Purtell added a comment -

        @Ted Thanks for taking a look. Sure, I will make that change on commit.

        Show
        Andrew Purtell added a comment - @Ted Thanks for taking a look. Sure, I will make that change on commit.
        Hide
        Hudson added a comment -

        Integrated in HBase-0.92 #170 (See https://builds.apache.org/job/HBase-0.92/170/)
        HBASE-4944. Optionally verify bulk loaded HFiles

        apurtell :
        Files :

        • /hbase/branches/0.92/CHANGES.txt
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/io/hfile/InvalidHFileException.java
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
        Show
        Hudson added a comment - Integrated in HBase-0.92 #170 (See https://builds.apache.org/job/HBase-0.92/170/ ) HBASE-4944 . Optionally verify bulk loaded HFiles apurtell : Files : /hbase/branches/0.92/CHANGES.txt /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/io/hfile/InvalidHFileException.java /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
        Hide
        Hudson added a comment -

        Integrated in HBase-TRUNK #2516 (See https://builds.apache.org/job/HBase-TRUNK/2516/)
        HBASE-4944. Optionally verify bulk loaded HFiles

        apurtell :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/InvalidHFileException.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
        Show
        Hudson added a comment - Integrated in HBase-TRUNK #2516 (See https://builds.apache.org/job/HBase-TRUNK/2516/ ) HBASE-4944 . Optionally verify bulk loaded HFiles apurtell : Files : /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/InvalidHFileException.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
        Hide
        Hudson added a comment -

        Integrated in HBase-TRUNK-security #22 (See https://builds.apache.org/job/HBase-TRUNK-security/22/)
        HBASE-4944. Optionally verify bulk loaded HFiles

        apurtell :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/InvalidHFileException.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
        Show
        Hudson added a comment - Integrated in HBase-TRUNK-security #22 (See https://builds.apache.org/job/HBase-TRUNK-security/22/ ) HBASE-4944 . Optionally verify bulk loaded HFiles apurtell : Files : /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/InvalidHFileException.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
        Hide
        Hudson added a comment -

        Integrated in HBase-0.92-security #30 (See https://builds.apache.org/job/HBase-0.92-security/30/)
        HBASE-4944. Optionally verify bulk loaded HFiles

        apurtell :
        Files :

        • /hbase/branches/0.92/CHANGES.txt
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/io/hfile/InvalidHFileException.java
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
        Show
        Hudson added a comment - Integrated in HBase-0.92-security #30 (See https://builds.apache.org/job/HBase-0.92-security/30/ ) HBASE-4944 . Optionally verify bulk loaded HFiles apurtell : Files : /hbase/branches/0.92/CHANGES.txt /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/io/hfile/InvalidHFileException.java /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java

          People

          • Assignee:
            Andrew Purtell
            Reporter:
            Andrew Purtell
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development