HBase / HBASE-5440

Allow Import to optionally use HFileOutputFormat

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.94.0
    • Component/s: mapreduce
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      importtsv supports importing into a live table or generating HFiles for bulk load.
      import should allow the same.

      Could even consider merging these tools into one (in principle the only difference is the parsing part - although that is maybe for a different jira).

      1. 5440-v2.txt (9 kB, Lars Hofhansl)
      2. 5440.txt (9 kB, Lars Hofhansl)

        Activity

        Lars Hofhansl created issue -
        Lars Hofhansl made changes -
        Description: typo fixed ("imporing" -> "importing"); text otherwise unchanged.
        Lars Hofhansl added a comment -

        First cut.

        • a new import mapper that writes KeyValues
        • uses KeyValueSortReducer

        Only used when -Dimport.bulk.output=<path/to/output> is set.

        I did experiment with a Reducer that accepts Mutation (common super class of Put and Delete), but that caused more problems than it solved, hence the KeyValueImporter.
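
The choice described in this comment can be sketched in plain Java. This is only an illustrative model, not the patch itself: the class `ImportModeSketch` and the method `chooseWiring` are hypothetical, the job is represented as a map of role names to class names so that the sketch runs without HBase on the classpath, and the `TableOutputFormat` entry for the default path is background knowledge rather than something stated in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of the mode switch; it does not touch HBase or Hadoop. */
final class ImportModeSketch {
    // Property name as given in the comment above.
    public static final String BULK_OUTPUT_CONF_KEY = "import.bulk.output";

    /** Returns the job wiring (role -> class name) for the given configuration. */
    public static Map<String, String> chooseWiring(Map<String, String> conf) {
        Map<String, String> wiring = new LinkedHashMap<>();
        String hfileOutPath = conf.get(BULK_OUTPUT_CONF_KEY);
        if (hfileOutPath != null) {
            // Bulk mode: the mapper emits KeyValues, the reducer sorts them,
            // and the output is written as HFiles under the given path.
            wiring.put("mapper", "KeyValueImporter");
            wiring.put("reducer", "KeyValueSortReducer");
            wiring.put("outputFormat", "HFileOutputFormat");
            wiring.put("outputPath", hfileOutPath);
        } else {
            // Default mode: the existing Importer mapper writes to the live table.
            wiring.put("mapper", "Importer");
            wiring.put("outputFormat", "TableOutputFormat"); // assumption, see lead-in
        }
        return wiring;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        System.out.println(chooseWiring(conf));
        conf.put(BULK_OUTPUT_CONF_KEY, "/path/to/output");
        System.out.println(chooseWiring(conf));
    }
}
```

The point of the sketch is that a single property decides the whole wiring, so the existing behaviour is untouched unless the property is set.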

        Lars Hofhansl made changes -
        Attachment 5440.txt [ 12515813 ]
        Lars Hofhansl made changes -
        Status: Open [ 1 ] -> Patch Available [ 10002 ]
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12515813/5440.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 javadoc. The javadoc tool appears to have generated -136 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 152 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.regionserver.TestAtomicOperation
        org.apache.hadoop.hbase.coprocessor.TestClassLoading
        org.apache.hadoop.hbase.mapreduce.TestImportTsv
        org.apache.hadoop.hbase.mapred.TestTableMapReduce
        org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1027//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1027//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1027//console

        This message is automatically generated.

        Lars Hofhansl added a comment -

        Ran the failed tests locally. They all pass.

        Lars Hofhansl added a comment -

        Would review board help?
        This is actually a pretty simple change:
        import can optionally import into HFiles. In that case a new mapper and an additional reducer are used (similar to what importtsv does).

        Most of the changes are just so that code can be shared between KeyValueImporter and the existing Importer mapper classes.

        stack added a comment -

        LGTM. What's missing is better documentation in the usage for Import. This new option will be under a rock unless it's better surfaced. +1 on commit after beefing up usage. Add some lines under here:

        -    System.err.println("Usage: Import <tablename> <inputdir>");
        +    System.err.println("Usage: Import [-D" + BULK_OUTPUT_CONF_KEY
        +        + "=/path/for/output] <tablename> <inputdir>");
        

        ... going on about what the -D thingy does.

        Good stuff.
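
One way the expanded usage text could look is sketched below. The first line follows the diff above; the two explanatory lines are hypothetical wording written for this sketch and are not the message from the committed patch. The class `ImportUsageSketch` and the method `usage` are likewise made up for illustration, and the constant's value is taken from the property named earlier in this thread.

```java
/** Hypothetical sketch of a fuller usage message for Import. */
final class ImportUsageSketch {
    // Value taken from the -D option named earlier in the thread.
    public static final String BULK_OUTPUT_CONF_KEY = "import.bulk.output";

    /** Builds the usage text; only the first line comes from the diff above. */
    public static String usage() {
        return "Usage: Import [-D" + BULK_OUTPUT_CONF_KEY
            + "=/path/for/output] <tablename> <inputdir>\n"
            // The wording below is an assumption, not the committed text.
            + "By default Import writes directly into the table.\n"
            + "To generate HFiles for a later bulk load instead, pass -D"
            + BULK_OUTPUT_CONF_KEY + "=/path/for/output";
    }

    public static void main(String[] args) {
        System.err.println(usage());
    }
}
```

Building the text from the constant keeps the usage line in step with the property name if it is ever renamed.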

        Lars Hofhansl added a comment -

        Yeah, you're right of course.
        Will do and post a new patch soon.

        Lars Hofhansl made changes -
        Status: Patch Available [ 10002 ] -> Open [ 1 ]
        Lars Hofhansl added a comment -

        How about this? Same patch, just a different usage message.

        Note that I tested this manually. I have not managed to create a test for this.
        I might think more about a good test in a separate jira.

        Lars Hofhansl made changes -
        Attachment 5440-v2.txt [ 12515890 ]
        stack added a comment -

        +1 on commit.

        Lars Hofhansl made changes -
        Summary: "Allow import to optionally use HFileOutputFormat" -> "Allow Import to optionally use HFileOutputFormat"
        Lars Hofhansl added a comment -

        Committed to trunk. Thanks for reviewing, stack!

        Lars Hofhansl made changes -
        Status: Open [ 1 ] -> Resolved [ 5 ]
        Hadoop Flags: Reviewed [ 10343 ]
        Resolution: Fixed [ 1 ]
        Hudson added a comment -

        Integrated in HBase-TRUNK-security #122 (See https://builds.apache.org/job/HBase-TRUNK-security/122/)
        HBASE-5440 Allow Import to optionally use HFileOutputFormat (Revision 1293101)

        Result = FAILURE
        larsh :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/Import.java
        Hudson added a comment -

        Integrated in HBase-TRUNK #2669 (See https://builds.apache.org/job/HBase-TRUNK/2669/)
        HBASE-5440 Allow Import to optionally use HFileOutputFormat (Revision 1293101)

        Result = SUCCESS
        larsh :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/Import.java
        paul mackles added a comment -

        Thanks Lars and Stack. I actually had a chance to play around with this a bit over the weekend and it certainly suited my purposes of being able to restore in a reasonable timeframe should disaster strike. We are actually still on 0.90.4 so I backported the relevant portions of the changes to that version of Import. Happy to create a patch if folks think that might be interesting.

        Lars Hofhansl added a comment -

        Hey Paul, I am glad this is useful for you. Reducing the timeframe for recovery is exactly what I had in mind with this.
        @Stack and @Ram: Are we doing more 0.90 releases? Should we add this?

        stack added a comment -

        @Lars Up to Ram. I've moved on.

        Lars Hofhansl made changes -
        Status: Resolved [ 5 ] -> Closed [ 6 ]

          People

          • Assignee: Lars Hofhansl
          • Reporter: Lars Hofhansl
          • Votes: 0
          • Watchers: 2