HBase
  1. HBase
  2. HBASE-8672

Create an Integration test for Bulk Loads

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.98.0, 0.95.1
    • Fix Version/s: 0.98.0, 0.95.2
    • Component/s: regionserver, test
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Bulk loads and MR are not well tested using our IT tests. We should add a test that bulk loads hfiles and then scans over the resulting table to make sure that all the data is there.

      1. HBASE-8672-0.patch
        20 kB
        Elliott Clark
      2. HBASE-8672-1.patch
        22 kB
        Elliott Clark
      3. HBASE-8672-2.patch
        23 kB
        Elliott Clark
      4. HBASE-8672-3.patch
        24 kB
        Elliott Clark
      5. HBASE-8672-4.patch
        24 kB
        Elliott Clark
      6. HBASE-8672-5.patch
        25 kB
        Elliott Clark

        Activity

        Hide
        Hudson added a comment -

        Integrated in hbase-0.95 #235 (See https://builds.apache.org/job/hbase-0.95/235/)
        HBASE-8672 Create an Integration test for Bulk Loads (Revision 1491657)

        Result = FAILURE
        eclark :
        Files :

        • /hbase/branches/0.95/hbase-it/src/test/java/org/apache/hadoop/hbase/mapreduce/IntegrationTestBulkLoad.java
        • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java
        • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionSplitter.java
        • /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
        Show
        Hudson added a comment - Integrated in hbase-0.95 #235 (See https://builds.apache.org/job/hbase-0.95/235/ ) HBASE-8672 Create an Integration test for Bulk Loads (Revision 1491657) Result = FAILURE eclark : Files : /hbase/branches/0.95/hbase-it/src/test/java/org/apache/hadoop/hbase/mapreduce/IntegrationTestBulkLoad.java /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionSplitter.java /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
        Hide
        Hudson added a comment -

        Integrated in HBase-TRUNK #4171 (See https://builds.apache.org/job/HBase-TRUNK/4171/)
        HBASE-8672 Create an Integration test for Bulk Loads (Revision 1491656)

        Result = SUCCESS
        eclark :
        Files :

        • /hbase/trunk/hbase-it/src/test/java/org/apache/hadoop/hbase/mapreduce/IntegrationTestBulkLoad.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionSplitter.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
        Show
        Hudson added a comment - Integrated in HBase-TRUNK #4171 (See https://builds.apache.org/job/HBase-TRUNK/4171/ ) HBASE-8672 Create an Integration test for Bulk Loads (Revision 1491656) Result = SUCCESS eclark : Files : /hbase/trunk/hbase-it/src/test/java/org/apache/hadoop/hbase/mapreduce/IntegrationTestBulkLoad.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionSplitter.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
        Hide
        Elliott Clark added a comment -

        Committed revision 1491657.

        Show
        Elliott Clark added a comment - Committed revision 1491657.
        Hide
        Elliott Clark added a comment -

        Here's what I'm planning to commit soon. It has an expanded Javadoc like Enis was asking for.

        Show
        Elliott Clark added a comment - Here's what I'm planning to commit soon. It has an expanded Javadoc like Enis was asking for.
        Hide
        Enis Soztutar added a comment -

        +1. Skimmed the patch looks great. Can you add your first comment to the class javadoc so that users do not have to read this jira or whole code.

        Show
        Enis Soztutar added a comment - +1. Skimmed the patch looks great. Can you add your first comment to the class javadoc so that users do not have to read this jira or whole code.
        Hide
        Elliott Clark added a comment -

        Does TableOutputFormat suffer the same pain?

        TableOutputFormat adds a serializer for mutations.

        Show
        Elliott Clark added a comment - Does TableOutputFormat suffer the same pain? TableOutputFormat adds a serializer for mutations.
        Hide
        Elliott Clark added a comment -

        Added a timestamp to the path name so left over files can be correlated to error's in tests.

        Show
        Elliott Clark added a comment - Added a timestamp to the path name so left over files can be correlated to error's in tests.
        Hide
        Nick Dimiduk added a comment -

        Not sure why the io serializations weren't there before. Outputing KeyValues null pointers for me without it. Maybe everyone was just adding that during testing ?

        Very strange indeed. Does TableOutputFormat suffer the same pain? Perhaps this can be set in TableMapReduceUtils instead of on a per-output formatter basis? I wonder if this is related to the refactoring stack was doing around these interfaces?

        +1

        Show
        Nick Dimiduk added a comment - Not sure why the io serializations weren't there before. Outputing KeyValues null pointers for me without it. Maybe everyone was just adding that during testing ? Very strange indeed. Does TableOutputFormat suffer the same pain? Perhaps this can be set in TableMapReduceUtils instead of on a per-output formatter basis? I wonder if this is related to the refactoring stack was doing around these interfaces? +1
        Hide
        Aleksandr Shulman added a comment -

        Minor issue – in

        runLinkedListMRJob(int iteration)

        , your delete comes at the end of the function. Should there be an issue and the test fails, it will not reach that point and there will be leftover cruft that could affect other tests. One way to get around this is to use a try-catch-finally (or just try-finally) pattern, with the deletion being in the 'finally' portion.

        Granted, you only have one test, but if you write more, or if you run this on an actual cluster, cleanup becomes important.

        Show
        Aleksandr Shulman added a comment - Minor issue – in runLinkedListMRJob( int iteration) , your delete comes at the end of the function. Should there be an issue and the test fails, it will not reach that point and there will be leftover cruft that could affect other tests. One way to get around this is to use a try-catch-finally (or just try-finally) pattern, with the deletion being in the 'finally' portion. Granted, you only have one test, but if you write more, or if you run this on an actual cluster, cleanup becomes important.
        Hide
        Elliott Clark added a comment -
        • Changed 0l -> 0L
        • Added header
        • Moved initCredentials to HFileOutputFormat.configureIncrementalLoad

        Not sure why the io serializations weren't there before. Outputing KeyValues null pointers for me without it. Maybe everyone was just adding that during testing ?

        Show
        Elliott Clark added a comment - Changed 0l -> 0L Added header Moved initCredentials to HFileOutputFormat.configureIncrementalLoad Not sure why the io serializations weren't there before. Outputing KeyValues null pointers for me without it. Maybe everyone was just adding that during testing ?
        Hide
        Nick Dimiduk added a comment -

        Looks like a nice addition, Elliott. Once you get this merged, would you mind adding launch instructions and any relevant parameters to the release notes?

        Show
        Nick Dimiduk added a comment - Looks like a nice addition, Elliott. Once you get this merged, would you mind adding launch instructions and any relevant parameters to the release notes?
        Hide
        Nick Dimiduk added a comment -

        IntegrationTestBulkLoad.java is missing the license header.

        +          if (lc.getRk() != 0l) throw new RuntimeException("Chains should all start at 0 rk");
        

        nit: use '0L' instead of '0l' which looks a lot like '01'.

        +    conf.setStrings("io.serializations", conf.get("io.serializations"),
        +        KeyValueSerialization.class.getName());
        

        Why does this become necessary (or, rather, why wasn't it necessary before)?

        +
        +    void setFirstRow(byte[] userInput);
        +
        +    void setLastRow(byte[] userInput);
        

        nit: please javadoc interface methods.

        Show
        Nick Dimiduk added a comment - IntegrationTestBulkLoad.java is missing the license header. + if (lc.getRk() != 0l) throw new RuntimeException("Chains should all start at 0 rk"); nit: use '0L' instead of '0l' which looks a lot like '01'. + conf.setStrings("io.serializations", conf.get("io.serializations"), + KeyValueSerialization.class.getName()); Why does this become necessary (or, rather, why wasn't it necessary before)? + + void setFirstRow(byte[] userInput); + + void setLastRow(byte[] userInput); nit: please javadoc interface methods.
        Hide
        Elliott Clark added a comment -

        Version that should work with security.

        Thanks for the reviews.

        Show
        Elliott Clark added a comment - Version that should work with security. Thanks for the reviews.
        Hide
        Matteo Bertozzi added a comment -

        +1, v1 looks good to me

        Show
        Matteo Bertozzi added a comment - +1, v1 looks good to me
        Hide
        Aleksandr Shulman added a comment -

        First read of the patch looks good. Will take a second look shortly.

        Show
        Aleksandr Shulman added a comment - First read of the patch looks good. Will take a second look shortly.
        Hide
        Elliott Clark added a comment -
        • Added lots of java doc comments.
        • Added a delete table to the initial setup to allow running this repeatedly on the same cluster.
        Show
        Elliott Clark added a comment - Added lots of java doc comments. Added a delete table to the initial setup to allow running this repeatedly on the same cluster.
        Hide
        Elliott Clark added a comment -

        Here's a path that should test bulk loads.

        It starts an MR job that creates linked chains

        The format of rows is like this:

        rk -> Long
        d:<< Chain Id >> -> Random Data.
        l:<< Chain Id >> -> Row Key of the next link in the chain
        s:<< Chain Id >> -> The step in the chain that his link is.

        All chains start on row 0.

        So we create these chains and then walk over them in an MR job.

        Show
        Elliott Clark added a comment - Here's a path that should test bulk loads. It starts an MR job that creates linked chains The format of rows is like this: rk -> Long d:<< Chain Id >> -> Random Data. l:<< Chain Id >> -> Row Key of the next link in the chain s:<< Chain Id >> -> The step in the chain that his link is. All chains start on row 0. So we create these chains and then walk over them in an MR job.

          People

          • Assignee:
            Elliott Clark
            Reporter:
            Elliott Clark
          • Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development