HBase
  1. HBase
  2. HBASE-4944

Optionally verify bulk loaded HFiles

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 0.90.5, 0.92.0, 0.94.0
    • Fix Version/s: 0.90.5
    • Component/s: regionserver
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      We rely on users to produce properly formatted HFiles for bulk import. Attached patch adds an optional code path, toggled by a configuration property, that verifies the HFile under consideration for import is properly sorted. The default maintains the current behavior, which does not scan the file for correctness.

      Patch is against trunk but can apply against all active branches.

      1. HBASE-4944-v2.patch
        5 kB
        Andrew Purtell
      2. HBASE-4944-v3.patch
        5 kB
        Andrew Purtell

        Activity

        Andrew Purtell created issue -
        Hide
        Andrew Purtell added a comment -

        From JIRA: "Cannot attach file HBASE-4944.patch: Unknown server error (500)."

        The patch is pretty small, so here it is:

        Index: src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
        ===================================================================
        --- src/main/java/org/apache/hadoop/hbase/regionserver/Store.java	(revision 1210044)
        +++ src/main/java/org/apache/hadoop/hbase/regionserver/Store.java	(working copy)
        @@ -50,6 +50,7 @@
         import org.apache.hadoop.hbase.io.hfile.Compression;
         import org.apache.hadoop.hbase.io.hfile.HFile;
         import org.apache.hadoop.hbase.io.hfile.HFileScanner;
        +import org.apache.hadoop.hbase.io.hfile.InvalidHFileException;
         import org.apache.hadoop.hbase.monitoring.MonitoredTask;
         import org.apache.hadoop.hbase.regionserver.StoreScanner.ScanType;
         import org.apache.hadoop.hbase.regionserver.compactions.CompactionProgress;
        @@ -123,6 +124,7 @@
           private final String storeNameStr;
           private CompactionProgress progress;
           private final int compactionKVMax;
        +  private final boolean verifyBulkLoads;
         
           // not private for testing
           /* package */ScanInfo scanInfo;
        @@ -222,6 +224,9 @@
               = conf.getLong("hbase.hstore.compaction.max.size", Long.MAX_VALUE);
             this.compactionKVMax = conf.getInt("hbase.hstore.compaction.kv.max", 10);
         
        +    this.verifyBulkLoads = conf.getBoolean("hbase.hstore.bulkload.verify",
        +        true);
        +
             if (Store.closeCheckInterval == 0) {
               Store.closeCheckInterval = conf.getInt(
                   "hbase.hstore.close.check.interval", 10*1000*1000 /* 10 MB */);
        @@ -355,8 +360,8 @@
           }
         
           /**
        -   * This throws a WrongRegionException if the bulkHFile does not fit in this
        -   * region.
        +   * This throws a WrongRegionException if the HFile does not fit in this
        +   * region, or an InvalidHFileException if the HFile is not valid.
            *
            */
           void assertBulkLoadHFileOk(Path srcPath) throws IOException {
        @@ -386,6 +391,34 @@
                     "Bulk load file " + srcPath.toString() + " does not fit inside region "
                     + this.region);
               }
        +
        +      if (verifyBulkLoads) {
        +        KeyValue pkv = null;
        +        HFileScanner scanner = reader.getScanner(false, false, false);
        +        scanner.seekTo();
        +        do {
        +          KeyValue kv = scanner.getKeyValue();
        +          if (pkv != null) {
        +            if (Bytes.compareTo(pkv.getBuffer(), pkv.getRowOffset(),
        +                pkv.getRowLength(), kv.getBuffer(), kv.getRowOffset(),
        +                kv.getRowLength()) > 0) {
        +              throw new InvalidHFileException("Previous row is greater then"
        +                  + " current row: path=" + srcPath + " previous="
        +                  + Bytes.toStringBinary(pkv.getKey()) + " current="
        +                  + Bytes.toStringBinary(kv.getKey()));
        +            }
        +            if (Bytes.compareTo(pkv.getBuffer(), pkv.getFamilyOffset(),
        +                pkv.getFamilyLength(), kv.getBuffer(), kv.getFamilyOffset(),
        +                kv.getFamilyLength()) != 0) {
        +              throw new InvalidHFileException("Previous key had different"
        +                  + " family compared to current key: path=" + srcPath
        +                  + " previous=" + Bytes.toStringBinary(pkv.getKey())
        +                  + " current=" + Bytes.toStringBinary(kv.getKey()));
        +            }
        +          }
        +          pkv = kv;
        +        } while (scanner.next());
        +      }
             } finally {
               if (reader != null) reader.close();
             }
        Index: src/main/java/org/apache/hadoop/hbase/io/hfile/InvalidHFileException.java
        ===================================================================
        --- src/main/java/org/apache/hadoop/hbase/io/hfile/InvalidHFileException.java	(revision 0)
        +++ src/main/java/org/apache/hadoop/hbase/io/hfile/InvalidHFileException.java	(revision 0)
        @@ -0,0 +1,40 @@
        +/**
        + * Licensed to the Apache Software Foundation (ASF) under one
        + * or more contributor license agreements.  See the NOTICE file
        + * distributed with this work for additional information
        + * regarding copyright ownership.  The ASF licenses this file
        + * to you under the Apache License, Version 2.0 (the
        + * "License"); you may not use this file except in compliance
        + * with the License.  You may obtain a copy of the License at
        + *
        + *     http://www.apache.org/licenses/LICENSE-2.0
        + *
        + * Unless required by applicable law or agreed to in writing, software
        + * distributed under the License is distributed on an "AS IS" BASIS,
        + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
        + * See the License for the specific language governing permissions and
        + * limitations under the License.
        + */
        +package org.apache.hadoop.hbase.io.hfile;
        +
        +import java.io.IOException;
        +
        +/**
        + * Thrown when an invalid HFile format is detected
        + */
        +public class InvalidHFileException extends IOException {
        +  private static final long serialVersionUID = 4660352028739861249L;
        +
        +  /** constructor */
        +  public InvalidHFileException() {
        +    super();
        +  }
        +
        +  /**
        +   * Constructor
        +   * @param s message
        +   */
        +  public InvalidHFileException(String s) {
        +    super(s);
        +  }
        +}
        \ No newline at end of file
        
        Show
        Andrew Purtell added a comment - From JIRA: "Cannot attach file HBASE-4944 .patch: Unknown server error (500)." The patch is pretty small, so here it is: Index: src/main/java/org/apache/hadoop/hbase/regionserver/Store.java =================================================================== --- src/main/java/org/apache/hadoop/hbase/regionserver/Store.java (revision 1210044) +++ src/main/java/org/apache/hadoop/hbase/regionserver/Store.java (working copy) @@ -50,6 +50,7 @@ import org.apache.hadoop.hbase.io.hfile.Compression; import org.apache.hadoop.hbase.io.hfile.HFile; import org.apache.hadoop.hbase.io.hfile.HFileScanner; + import org.apache.hadoop.hbase.io.hfile.InvalidHFileException; import org.apache.hadoop.hbase.monitoring.MonitoredTask; import org.apache.hadoop.hbase.regionserver.StoreScanner.ScanType; import org.apache.hadoop.hbase.regionserver.compactions.CompactionProgress; @@ -123,6 +124,7 @@ private final String storeNameStr; private CompactionProgress progress; private final int compactionKVMax; + private final boolean verifyBulkLoads; // not private for testing /* package */ScanInfo scanInfo; @@ -222,6 +224,9 @@ = conf.getLong( "hbase.hstore.compaction.max.size" , Long .MAX_VALUE); this .compactionKVMax = conf.getInt( "hbase.hstore.compaction.kv.max" , 10); + this .verifyBulkLoads = conf.getBoolean( "hbase.hstore.bulkload.verify" , + true ); + if (Store.closeCheckInterval == 0) { Store.closeCheckInterval = conf.getInt( "hbase.hstore.close.check.interval" , 10*1000*1000 /* 10 MB */); @@ -355,8 +360,8 @@ } /** - * This throws a WrongRegionException if the bulkHFile does not fit in this - * region. + * This throws a WrongRegionException if the HFile does not fit in this + * region, or an InvalidHFileException if the HFile is not valid. * */ void assertBulkLoadHFileOk(Path srcPath) throws IOException { @@ -386,6 +391,34 @@ "Bulk load file " + srcPath.toString() + " does not fit inside region " + this .region); } + + if (verifyBulkLoads) { + KeyValue pkv = null ; + HFileScanner scanner = reader.getScanner( false , false , false ); + scanner.seekTo(); + do { + KeyValue kv = scanner.getKeyValue(); + if (pkv != null ) { + if (Bytes.compareTo(pkv.getBuffer(), pkv.getRowOffset(), + pkv.getRowLength(), kv.getBuffer(), kv.getRowOffset(), + kv.getRowLength()) > 0) { + throw new InvalidHFileException( "Previous row is greater then" + + " current row: path=" + srcPath + " previous=" + + Bytes.toStringBinary(pkv.getKey()) + " current=" + + Bytes.toStringBinary(kv.getKey())); + } + if (Bytes.compareTo(pkv.getBuffer(), pkv.getFamilyOffset(), + pkv.getFamilyLength(), kv.getBuffer(), kv.getFamilyOffset(), + kv.getFamilyLength()) != 0) { + throw new InvalidHFileException( "Previous key had different" + + " family compared to current key: path=" + srcPath + + " previous=" + Bytes.toStringBinary(pkv.getKey()) + + " current=" + Bytes.toStringBinary(kv.getKey())); + } + } + pkv = kv; + } while (scanner.next()); + } } finally { if (reader != null ) reader.close(); } Index: src/main/java/org/apache/hadoop/hbase/io/hfile/InvalidHFileException.java =================================================================== --- src/main/java/org/apache/hadoop/hbase/io/hfile/InvalidHFileException.java (revision 0) +++ src/main/java/org/apache/hadoop/hbase/io/hfile/InvalidHFileException.java (revision 0) @@ -0,0 +1,40 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License" ); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http: //www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + package org.apache.hadoop.hbase.io.hfile; + + import java.io.IOException; + +/** + * Thrown when an invalid HFile format is detected + */ + public class InvalidHFileException extends IOException { + private static final long serialVersionUID = 4660352028739861249L; + + /** constructor */ + public InvalidHFileException() { + super (); + } + + /** + * Constructor + * @param s message + */ + public InvalidHFileException( String s) { + super (s); + } +} \ No newline at end of file
        Andrew Purtell made changes -
        Field Original Value New Value
        Status Open [ 1 ] Patch Available [ 10002 ]
        Affects Version/s 0.92.0 [ 12314223 ]
        Affects Version/s 0.94.0 [ 12316419 ]
        Affects Version/s 0.90.5 [ 12317145 ]
        Fix Version/s 0.92.0 [ 12314223 ]
        Fix Version/s 0.94.0 [ 12316419 ]
        Fix Version/s 0.90.5 [ 12317145 ]
        Hide
        Ted Yu added a comment -

        Patch from Andy.

        Show
        Ted Yu added a comment - Patch from Andy.
        Ted Yu made changes -
        Attachment 4944.txt [ 12506016 ]
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12506016/4944.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/436//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12506016/4944.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/436//console This message is automatically generated.
        Hide
        Ted Yu added a comment -

        Minor comments:

        +        KeyValue pkv = null;
        

        The variable can be named prevKV which is clearer.

        +              throw new InvalidHFileException("Previous row is greater then"
        

        Typo above, should be 'greater than'.

        Show
        Ted Yu added a comment - Minor comments: + KeyValue pkv = null ; The variable can be named prevKV which is clearer. + throw new InvalidHFileException( "Previous row is greater then" Typo above, should be 'greater than'.
        Hide
        Ted Yu added a comment -

        Looks like the patch should be rebased:

        4 out of 5 hunks FAILED -- saving rejects to file src/main/java/org/apache/hadoop/hbase/regionserver/Store.java.rej
        
        Show
        Ted Yu added a comment - Looks like the patch should be rebased: 4 out of 5 hunks FAILED -- saving rejects to file src/main/java/org/apache/hadoop/hbase/regionserver/Store.java.rej
        Hide
        Andrew Purtell added a comment -

        Rebased patch addressing Ted's comments.

        Show
        Andrew Purtell added a comment - Rebased patch addressing Ted's comments.
        Andrew Purtell made changes -
        Attachment HBASE-4944-v2.patch [ 12506017 ]
        Andrew Purtell made changes -
        Attachment 4944.txt [ 12506016 ]
        Hide
        Andrew Purtell added a comment -

        Sorry, v3 patch restores the default to false (current behavior)

        Show
        Andrew Purtell added a comment - Sorry, v3 patch restores the default to false (current behavior)
        Andrew Purtell made changes -
        Attachment HBASE-4944-v3.patch [ 12506018 ]
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12506018/HBASE-4944-v3.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 javadoc. The javadoc tool appears to have generated -160 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 71 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.mapreduce.TestTimeRangeMapRed
        org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFilesSplitRecovery
        org.apache.hadoop.hbase.TestDrainingServer
        org.apache.hadoop.hbase.TestFullLogReconstruction
        org.apache.hadoop.hbase.mapreduce.TestImportTsv
        org.apache.hadoop.hbase.mapreduce.TestTableMapReduce

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/437//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/437//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/437//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12506018/HBASE-4944-v3.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -160 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 71 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestTimeRangeMapRed org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFilesSplitRecovery org.apache.hadoop.hbase.TestDrainingServer org.apache.hadoop.hbase.TestFullLogReconstruction org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapreduce.TestTableMapReduce Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/437//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/437//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/437//console This message is automatically generated.
        Hide
        Andrew Purtell added a comment -

        Test failures are unrelated to the patch. All tests pass locally for me.

        @Ted, what do you think of v3?

        Show
        Andrew Purtell added a comment - Test failures are unrelated to the patch. All tests pass locally for me. @Ted, what do you think of v3?
        Andrew Purtell made changes -
        Assignee Andrew Purtell [ apurtell ]
        Hide
        Ted Yu added a comment -

        Patch v3 looks good.

        Minor comment for the case of different families:

        +                  + " previous=" + Bytes.toStringBinary(prevKV.getKey())
        +                  + " current=" + Bytes.toStringBinary(kv.getKey()));
        

        I think it would be nice to include family names by calling getFamily() in the above message.
        This can be done at time of commit.

        Show
        Ted Yu added a comment - Patch v3 looks good. Minor comment for the case of different families: + + " previous=" + Bytes.toStringBinary(prevKV.getKey()) + + " current=" + Bytes.toStringBinary(kv.getKey())); I think it would be nice to include family names by calling getFamily() in the above message. This can be done at time of commit.
        Hide
        Andrew Purtell added a comment -

        @Ted Thanks for taking a look. Sure, I will make that change on commit.

        Show
        Andrew Purtell added a comment - @Ted Thanks for taking a look. Sure, I will make that change on commit.
        Andrew Purtell made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop Flags Reviewed [ 10343 ]
        Fix Version/s 0.92.0 [ 12314223 ]
        Fix Version/s 0.94.0 [ 12316419 ]
        Fix Version/s 0.90.5 [ 12317145 ]
        Resolution Fixed [ 1 ]
        Hide
        Hudson added a comment -

        Integrated in HBase-0.92 #170 (See https://builds.apache.org/job/HBase-0.92/170/)
        HBASE-4944. Optionally verify bulk loaded HFiles

        apurtell :
        Files :

        • /hbase/branches/0.92/CHANGES.txt
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/io/hfile/InvalidHFileException.java
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
        Show
        Hudson added a comment - Integrated in HBase-0.92 #170 (See https://builds.apache.org/job/HBase-0.92/170/ ) HBASE-4944 . Optionally verify bulk loaded HFiles apurtell : Files : /hbase/branches/0.92/CHANGES.txt /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/io/hfile/InvalidHFileException.java /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
        Hide
        Hudson added a comment -

        Integrated in HBase-TRUNK #2516 (See https://builds.apache.org/job/HBase-TRUNK/2516/)
        HBASE-4944. Optionally verify bulk loaded HFiles

        apurtell :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/InvalidHFileException.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
        Show
        Hudson added a comment - Integrated in HBase-TRUNK #2516 (See https://builds.apache.org/job/HBase-TRUNK/2516/ ) HBASE-4944 . Optionally verify bulk loaded HFiles apurtell : Files : /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/InvalidHFileException.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
        Hide
        Hudson added a comment -

        Integrated in HBase-TRUNK-security #22 (See https://builds.apache.org/job/HBase-TRUNK-security/22/)
        HBASE-4944. Optionally verify bulk loaded HFiles

        apurtell :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/InvalidHFileException.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
        Show
        Hudson added a comment - Integrated in HBase-TRUNK-security #22 (See https://builds.apache.org/job/HBase-TRUNK-security/22/ ) HBASE-4944 . Optionally verify bulk loaded HFiles apurtell : Files : /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/InvalidHFileException.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
        Hide
        Hudson added a comment -

        Integrated in HBase-0.92-security #30 (See https://builds.apache.org/job/HBase-0.92-security/30/)
        HBASE-4944. Optionally verify bulk loaded HFiles

        apurtell :
        Files :

        • /hbase/branches/0.92/CHANGES.txt
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/io/hfile/InvalidHFileException.java
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
        Show
        Hudson added a comment - Integrated in HBase-0.92-security #30 (See https://builds.apache.org/job/HBase-0.92-security/30/ ) HBASE-4944 . Optionally verify bulk loaded HFiles apurtell : Files : /hbase/branches/0.92/CHANGES.txt /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/io/hfile/InvalidHFileException.java /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
        stack made changes -
        Fix Version/s 0.90.6 [ 12319200 ]
        Fix Version/s 0.92.0 [ 12314223 ]
        Fix Version/s 0.94.0 [ 12316419 ]
        Fix Version/s 0.90.5 [ 12317145 ]
        stack made changes -
        Fix Version/s 0.90.5 [ 12317145 ]
        Fix Version/s 0.90.6 [ 12319200 ]
        Transition Time In Source Status Execution Times Last Executer Last Execution Date
        Open Open Patch Available Patch Available
        3m 31s 1 Andrew Purtell 04/Dec/11 01:52
        Patch Available Patch Available Resolved Resolved
        1d 41m 1 Andrew Purtell 05/Dec/11 02:33

          People

          • Assignee:
            Andrew Purtell
            Reporter:
            Andrew Purtell
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development