HBASE-5166

MultiThreaded Table Mapper analogous to MultiThreaded Mapper in Hadoop

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.94.0
    • Component/s: None
    • Hadoop Flags: Reviewed
    • Release Note: New MultiThreadedTableMapper facility

      Description

      HBase currently has no MultithreadedTableMapper, the counterpart of the MultithreadedMapper that Hadoop provides for IO-bound jobs.
      Use case, a web crawler: take input (URLs) from an HBase table and put the content (URLs, content) back into HBase.
      Running this kind of HBase MapReduce job with the normal TableMapper is quite slow because the job is network-IO bound and does not fully utilize the CPU.

      Moreover, I would like to know whether it is a good or bad idea to use HBase for this kind of use case.

      1. 0001-Added-MultithreadedTableMapper-HBASE-5166.patch
        9 kB
        Jai Kumar Singh
      2. 0003-Added-MultithreadedTableMapper-HBASE-5166.patch
        8 kB
        Jai Kumar Singh
      3. 0005-HBASE-5166-Added-MultithreadedTableMapper.patch
        31 kB
        Jai Kumar Singh
      4. 0006-HBASE-5166-Added-MultithreadedTableMapper.patch
        19 kB
        Jai Kumar Singh
      5. 0008-HBASE-5166-Added-MultithreadedTableMapper.patch
        18 kB
        Jai Kumar Singh
      6. 5166-v9.txt
        18 kB
        stack

        Activity

        Jai Kumar Singh added a comment -

        This is the implementation I am currently using for MultithreadedTableMapper; it is a modification of Hadoop's org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper.

        stack added a comment -

        > Moreover, I want to know whether It would be a good/bad idea to use HBase for these kind of usecases ?.

        Looks grand to me (as does the network/io-bound justification in your usecase). Would be a nice contrib. I'd like it so I can use it putting up load on hbase; currently have to run a ridiculous amount of concurrent mappers putting up a load using a tool like PerformanceEvaluation which runs a single client doing serial load per map task.

        A few comments on the patch.

        No need of these lines:

        + * Copyright 2007 The Apache Software Foundation
        

        In our code base we indent with two spaces, not tabs (your file has hard tabs).

        Fix the name of this config:

        +				getInt("mapred.map.multithreadedrunner.threads", 10);
        

        Ditto for the setter.

        Would you not rather use an executor, with something like Guava's utilities for creating the executor that runs the threads? (See the HBase code base for examples.)

        Jai Kumar Singh added a comment -

        Hi stack,
        Thanks for the comment. I've modified the patch accordingly.
        Added Executors.newFixedThreadPool(numberOfThreads) for executor part.

        – JK
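
        As a rough illustration of the executor approach mentioned above, here is a minimal sketch, not the code from the patch. The class name FixedPoolSketch and the method runOnPool are hypothetical; the only API taken from the thread is Executors.newFixedThreadPool(numberOfThreads). It submits one runner per thread and uses invokeAll to block until all of them have finished.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper for illustration; not part of HBase or of the patch.
final class FixedPoolSketch {

  /** Runs one copy of the task per pool thread and returns how many copies completed. */
  static int runOnPool(int numberOfThreads, Runnable task) throws InterruptedException {
    ExecutorService executor = Executors.newFixedThreadPool(numberOfThreads);
    try {
      AtomicInteger completed = new AtomicInteger();
      List<Callable<Void>> runners = new ArrayList<>();
      for (int i = 0; i < numberOfThreads; i++) {
        runners.add(() -> {
          task.run();
          completed.incrementAndGet();
          return null;
        });
      }
      // invokeAll blocks until every runner has finished (normally or with an exception).
      executor.invokeAll(runners);
      return completed.get();
    } finally {
      executor.shutdown();
    }
  }
}
```

        In a real mapper each runner would wrap one instance of the user's map class; the plain Runnable here only stands in for that.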

        Jai Kumar Singh added a comment -

        Modified patch

        Jai Kumar Singh added a comment -

        Any comments?

        Ted Yu added a comment -

        MultithreadedTableMapper is missing the Apache license header.

        +    while(!executor.isTerminated()){
        +      // wait till all the threads are done
        +    }
        

        We should put sleep() in the above loop and possibly limit the total duration of wait.

        A new unit test should be added for MultithreadedTableMapper.
        Please look at tests that use TableMapper.
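
        One way to get both the sleep and the bounded wait suggested above is ExecutorService.awaitTermination, which blocks without spinning and returns false on timeout. The following is only a sketch under that assumption, not what the patch does (the patch, per the later comments, uses Thread.sleep in the loop). The class name BoundedWaitSketch and the parameter maxWaitMillis are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration; not part of HBase or of the patch.
final class BoundedWaitSketch {

  /**
   * Stops accepting new tasks and waits up to maxWaitMillis for running ones.
   * Returns true if all tasks finished in time, false if the wait timed out.
   */
  static boolean shutdownAndWait(ExecutorService executor, long maxWaitMillis)
      throws InterruptedException {
    executor.shutdown();
    boolean finished = executor.awaitTermination(maxWaitMillis, TimeUnit.MILLISECONDS);
    if (!finished) {
      // Timed out: interrupt whatever is still running.
      executor.shutdownNow();
    }
    return finished;
  }
}
```

        Whether a hard upper bound is appropriate depends on the job, so a caller could pass a very large maxWaitMillis to approximate an unbounded wait.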

        Jai Kumar Singh added a comment -

        Added Thread.sleep(), the license header, and a test case.

        Jai Kumar Singh added a comment -

        @Zhihong Yu,
        1) The Apache license was there earlier, but I removed it because stack suggested so. Anyway, I've put it back.
        2) I've added Thread.sleep(1000). I am not sure whether we want to limit the wait duration; wouldn't that depend on the kind of job we are running?
        3) I've modified the TableMapper test case in src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableMapReduce.java.
        At first I was going to create a new test case file for MultithreadedTableMapper, but that would mean too much code repetition.
        So I added a numOfThreads argument to TestTableMapReduce's runTestOnTable function and called the function twice. Check the patch for more details.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12515340/0005-HBASE-5166-Added-MultithreadedTableMapper.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 8 new or modified tests.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/997//console

        This message is automatically generated.

        Ted Yu added a comment -

        Can you use --no-prefix to generate a new patch for Hadoop QA?

        Thanks

        Jai Kumar Singh added a comment -

        Patch created against current trunk.
        Also moved the test case into a separate file.

        Jai Kumar Singh added a comment -

        submitted a new patch against current trunk on svn.

        Thanks

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12515348/0006-HBASE-5166-Added-MultithreadedTableMapper.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        -1 javadoc. The javadoc tool appears to have generated -134 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 160 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.regionserver.TestAtomicOperation
        org.apache.hadoop.hbase.io.hfile.TestForceCacheImportantBlocks
        org.apache.hadoop.hbase.mapreduce.TestImportTsv
        org.apache.hadoop.hbase.mapred.TestTableMapReduce
        org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/998//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/998//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/998//console

        This message is automatically generated.

        Ted Yu added a comment -

        @Jai:

        + * Copyright 2007 The Apache Software Foundation
        

        Year is not needed in license header. Same here:

        + * Copyright 2009 The Apache Software Foundation

        +  public void testAddDependencyJars() throws Exception {
        

        The above doesn't carry @Test annotation. If it is not needed for this JIRA, please remove it.

        +  public static final String MAPPER_CLASS = "hbase.mapreduce.multithreadedrunner.class";
        

        I think the name of config parameter should be changed to 'multithreadedmapper.class'
        Same for NUMBER_OF_THREADS

        +  private class SubMapRecordReader extends RecordReader<ImmutableBytesWritable, Result> {
        

        Why do we need the Sub prefix above ?

        Putting the patch on https://reviews.apache.org would make the review process smoother.

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3995/
        -----------------------------------------------------------

        Review request for Michael Stack.

        Summary
        -------

        HBase currently has no MultithreadedTableMapper, the counterpart of the MultithreadedMapper that Hadoop provides for IO-bound jobs.
        Use case, a web crawler: take input (URLs) from an HBase table and put the content (URLs, content) back into HBase.
        Running this kind of HBase MapReduce job with the normal TableMapper is quite slow because the job is network-IO bound and does not fully utilize the CPU.

        Moreover, I would like to know whether it is a good or bad idea to use HBase for this kind of use case.

        This addresses bug HBASE-5166.
        https://issues.apache.org/jira/browse/HBASE-5166

        Diffs
        -----

        /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION
        /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION

        Diff: https://reviews.apache.org/r/3995/diff

        Testing
        -------

        Thanks,

        Jai

        Jai Kumar Singh added a comment -

        @Zhihong Yu: I have submitted the patch for review with the suggested changes.
        As for the Sub prefix, I took it from Hadoop and am following the same convention. We call them SubMapRecordReader/Writer because each is an intermediate RecordReader/Writer for the mapper threads; it eventually uses the RecordReader/Writer passed to the MapReduce job to do the actual read/write.

        Thanks,

        PS: I tried adding "Zhihong" in the reviewer list on the review page but somehow RB was failing, So I added stack as reviewer. Please do review.
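
        The intermediate-reader idea described above can be sketched as follows. This is a simplified, hypothetical model and not the patch's SubMapRecordReader: a plain Iterator stands in for the job's real RecordReader, and the names SubReaderSketch and SubReader are invented for the example. Each mapper thread gets its own SubReader, and all of them pull from the one shared source under a lock so that no record is handed out twice.

```java
import java.util.Iterator;

// Hypothetical model for illustration; not part of HBase or of the patch.
final class SubReaderSketch {

  /** Per-thread reader that delegates to a shared outer source. */
  static final class SubReader<T> {
    private final Iterator<T> outer; // stands in for the RecordReader passed to the job
    private T current;

    SubReader(Iterator<T> outer) {
      this.outer = outer;
    }

    /** Advances to the next record of the shared source; returns false when it is exhausted. */
    boolean nextKeyValue() {
      synchronized (outer) {
        if (!outer.hasNext()) {
          return false;
        }
        current = outer.next();
        return true;
      }
    }

    /** Returns the record fetched by the last successful nextKeyValue() call. */
    T getCurrentValue() {
      return current;
    }
  }
}
```

        A real implementation would also need to copy the key and value out of the outer reader if that reader reuses its objects between calls; the sketch skips this because the Iterator hands out distinct elements.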

        Ted Yu added a comment -

        My recommendation when using Review Board is to leave the Bugs field empty. Otherwise a large amount of post-back from Review Board appears in the JIRA.
        You can specify hbase in the Groups field.

        My user name is tedyu.

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3995/#review5266
        -----------------------------------------------------------

        /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java
        <https://reviews.apache.org/r/3995/#comment11506>

        "hbase.mapreduce." prefix should be kept.
        Would "hbase.mapreduce.multithreadedmapper.class" be a good name ?

        • Ted

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3995/
        -----------------------------------------------------------

        (Updated 2012-02-22 06:00:23.473596)

        Review request for hbase and Michael Stack.

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3995/
        -----------------------------------------------------------

        (Updated 2012-02-22 07:18:48.273758)

        Review request for hbase and Michael Stack.

        Changes
        -------

        Removing bugid HBASE-5166

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3995/
        -----------------------------------------------------------

        (Updated 2012-02-22 07:20:13.121177)

        Review request for hbase, Ted Yu and Michael Stack.

        jiraposter@reviews.apache.org added a comment -

        On 2012-02-22 05:26:10, Ted Yu wrote:
        > /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java, line 64
        > <https://reviews.apache.org/r/3995/diff/2/?file=78619#file78619line64>
        >
        > "hbase.mapreduce." prefix should be kept.
        > Would "hbase.mapreduce.multithreadedmapper.class" be a good name ?

        Okay!
        I guess then it should be "hbase.mapreduce.multithreadedtablemapper".

        public static final String NUMBER_OF_THREADS = "hbase.mapreduce.multithreadedtablemapper.threads";
        public static final String MAPPER_CLASS = "hbase.mapreduce.multithreadedtablemapper.mapclass";

        • Jai
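
        To show how keys like these are typically read and written, here is a small sketch. It is an assumption-laden stand-in: java.util.Properties replaces Hadoop's Configuration so the example runs without Hadoop on the classpath, the class name MapperConfigSketch and its two methods are hypothetical, and the default of 10 is taken from the getInt(..., 10) call quoted earlier in this thread.

```java
import java.util.Properties;

// Hypothetical stand-in for illustration; the real code would use Hadoop's Configuration.
final class MapperConfigSketch {
  static final String NUMBER_OF_THREADS = "hbase.mapreduce.multithreadedtablemapper.threads";
  static final String MAPPER_CLASS = "hbase.mapreduce.multithreadedtablemapper.mapclass";

  // Assumed default, mirroring the getInt(..., 10) call quoted earlier in the thread.
  static final int DEFAULT_THREADS = 10;

  /** Stores the thread count under the NUMBER_OF_THREADS key. */
  static void setNumberOfThreads(Properties conf, int threads) {
    conf.setProperty(NUMBER_OF_THREADS, Integer.toString(threads));
  }

  /** Reads the thread count, falling back to DEFAULT_THREADS when the key is unset. */
  static int getNumberOfThreads(Properties conf) {
    String value = conf.getProperty(NUMBER_OF_THREADS, Integer.toString(DEFAULT_THREADS));
    return Integer.parseInt(value);
  }
}
```

        Keeping the "hbase.mapreduce." prefix on both keys, as requested in the review, groups them with the other HBase MapReduce settings.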

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3995/#review5266
        -----------------------------------------------------------

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3995/#review5268
        -----------------------------------------------------------

        Quite a few white spaces need to be removed.

        /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java
        <https://reviews.apache.org/r/3995/#comment11536>

        Should read 'MultithreadedTableMapper instances'

        /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java
        <https://reviews.apache.org/r/3995/#comment11508>

        Leave a space between while and (
        Another space between ) and {

        /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java
        <https://reviews.apache.org/r/3995/#comment11537>

        Can we give better progress information here ?

        /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java
        <https://reviews.apache.org/r/3995/#comment11535>

        Long line, please wrap to 80 chars.

        /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java
        <https://reviews.apache.org/r/3995/#comment11534>

        This if block can be an else to the if block above.

        /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java
        <https://reviews.apache.org/r/3995/#comment11533>

        Please remove white space.

        • Ted

        jiraposter@reviews.apache.org added a comment -

        On 2012-02-22 17:53:12, Ted Yu wrote:
        > /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java, line 114
        > <https://reviews.apache.org/r/3995/diff/2/?file=78619#file78619line114>
        >
        > Should read 'MultithreadedTableMapper instances'

        done!

        On 2012-02-22 17:53:12, Ted Yu wrote:
        > /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java, line 155
        > <https://reviews.apache.org/r/3995/diff/2/?file=78619#file78619line155>
        >
        > Can we give better progress information here ?

        I am not sure how to do it. It would be possible if I could access the underlying RecordReader/Writer passed to the job context and simply call their getProgress. Could anybody help me here?

        On 2012-02-22 17:53:12, Ted Yu wrote:
        > /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java, line 223
        > <https://reviews.apache.org/r/3995/diff/2/?file=78620#file78620line223>
        >
        > This if block can be an else to the if block above.

        done

        • Jai

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3995/#review5268
        -----------------------------------------------------------
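
On the progress question in the reply above: one possible shape for it, shown only as a JDK-only sketch and not as what the patch does, is to let all worker threads pull from a single shared reader that counts what it has handed out. `SharedReaderSketch`, `next`, and its `getProgress` are hypothetical names chosen for illustration; they are not Hadoop or HBase classes, and the sketch assumes the total record count is known up front, which a real record reader may not guarantee.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical shared reader; stands in for the input the worker threads pull from. */
class SharedReaderSketch<T> {
    private final Iterator<T> source;
    private final int total;
    private int consumed = 0;

    SharedReaderSketch(List<T> records) {
        this.source = records.iterator();
        this.total = records.size();
    }

    /** Returns the next record, or null once the input is exhausted (records must be non-null). */
    synchronized T next() {
        if (!source.hasNext()) {
            return null;
        }
        consumed++;
        return source.next();
    }

    /** Fraction of the input handed out so far, in [0, 1]. */
    synchronized float getProgress() {
        return total == 0 ? 1.0f : (float) consumed / total;
    }

    public static void main(String[] args) throws InterruptedException {
        SharedReaderSketch<String> reader =
            new SharedReaderSketch<>(Arrays.asList("a", "b", "c", "d"));
        Thread[] workers = new Thread[2];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                while (reader.next() != null) {
                    // a real worker would run the user's map logic on the record here
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        System.out.println(reader.getProgress()); // prints 1.0
    }
}
```

Note that this measures records handed to workers, not records fully processed, so it can run slightly ahead of the real state of the task.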

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3995/
        -----------------------------------------------------------

        (Updated 2012-02-23 04:17:08.702062)

        Review request for hbase, Ted Yu and Michael Stack.

        Changes
        -------

        changes as suggested in review

        Summary
        -------

        There is no MultiThreadedTableMapper in hbase currently just like we have a MultiThreadedMapper in Hadoop for IO Bound Jobs.
        UseCase, webcrawler: take input (urls) from a hbase table and put the content (urls, content) back into hbase.
        Running these kind of hbase mapreduce job with normal table mapper is quite slow as we are not utilizing CPU fully (N/W IO Bound).

        Moreover, I want to know whether It would be a good/bad idea to use HBase for these kind of usecases ?.

        Diffs (updated)


        /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION
        /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION

        Diff: https://reviews.apache.org/r/3995/diff

        Testing
        -------

        Thanks,

        Jai

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3995/
        -----------------------------------------------------------

        (Updated 2012-02-23 04:22:51.078969)

        Review request for hbase, Ted Yu and Michael Stack.

        Changes
        -------

        White spaces removed

        Summary
        -------

        There is no MultiThreadedTableMapper in hbase currently just like we have a MultiThreadedMapper in Hadoop for IO Bound Jobs.
        UseCase, webcrawler: take input (urls) from a hbase table and put the content (urls, content) back into hbase.
        Running these kind of hbase mapreduce job with normal table mapper is quite slow as we are not utilizing CPU fully (N/W IO Bound).

        Moreover, I want to know whether It would be a good/bad idea to use HBase for these kind of usecases ?.

        Diffs (updated)


        /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION
        /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION

        Diff: https://reviews.apache.org/r/3995/diff

        Testing
        -------

        Thanks,

        Jai

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3995/#review5302
        -----------------------------------------------------------

        Ship it!

        This looks great. Does it work? Have you tried it? +1 on commit if it works. Would be nice in things like PE putting up more load.

        • Michael

        jiraposter@reviews.apache.org added a comment -

        On 2012-02-23 04:32:03, Michael Stack wrote:

        > This looks great. Does it work? Have you tried it? +1 on commit if it works. Would be nice in things like PE putting up more load.

        This works fine. I've tested it in the usecase I mentioned on jira HBASE-5166.

        • Jai

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3995/#review5302
        -----------------------------------------------------------

        jiraposter@reviews.apache.org added a comment -

        On 2012-02-23 04:32:03, Michael Stack wrote:

        > This looks great. Does it work? Have you tried it? +1 on commit if it works. Would be nice in things like PE putting up more load.

        Jai Singh wrote:

        This works fine. I've tested it in the usecase I mentioned on jira HBASE-5166.

        So works nicely for your crawling then? Mind writing a sweet release note for this? I'll go commit it.

        • Michael

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3995/#review5302
        -----------------------------------------------------------

        jiraposter@reviews.apache.org added a comment -

        On 2012-02-23 04:32:03, Michael Stack wrote:

        > This looks great. Does it work? Have you tried it? +1 on commit if it works. Would be nice in things like PE putting up more load.

        Jai Singh wrote:

        This works fine. I've tested it in the usecase I mentioned on jira HBASE-5166.

        Michael Stack wrote:

        So works nicely for your crawling then? Mind writing a sweet release note for this? I'll go commit it.

        Oh, mind uploading the final version of the patch to the issue itself? Then we can run hadoopqa on the patch and make sure it plays well w/ the rest of hbase (should be fine given it's standalone). Thanks Jai.

        • Michael

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3995/#review5302
        -----------------------------------------------------------

        Jai Kumar Singh added a comment -

        updating latest patch
        thanks

        jiraposter@reviews.apache.org added a comment -

        On 2012-02-23 04:32:03, Michael Stack wrote:

        > This looks great. Does it work? Have you tried it? +1 on commit if it works. Would be nice in things like PE putting up more load.

        Jai Singh wrote:

        This works fine. I've tested it in the usecase I mentioned on jira HBASE-5166.

        Michael Stack wrote:

        So works nicely for your crawling then? Mind writing a sweet release note for this? I'll go commit it.

        Michael Stack wrote:

        Oh, mind uploading the final version of the patch to the issue itself then we can run hadoopqa on the patch and make sure it plays well w/ rest of hbase (should be fine given its standalone). Thanks Jai.

        Yes, it works great with the web crawling scenario.

        "MultiThreadedTableMapper for [N/W] IO bound jobs"

        Updated the patch on jira.

        Thanks

        • Jai

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3995/#review5302
        -----------------------------------------------------------

        On 2012-02-23 04:22:51, Jai Singh wrote:

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3995/
        -----------------------------------------------------------

        (Updated 2012-02-23 04:22:51)

        Review request for hbase, Ted Yu and Michael Stack.

        Summary
        -------

        HBase currently has no MultiThreadedTableMapper comparable to the MultiThreadedMapper that Hadoop provides for IO-bound jobs.

        Use case, web crawler: take input (URLs) from an HBase table and put the content (URLs, content) back into HBase.

        Running this kind of HBase MapReduce job with the normal table mapper is quite slow because the job is network-IO bound and the CPU is not fully utilized.

        Moreover, I would like to know whether it is a good or bad idea to use HBase for this kind of use case.

        Diffs
        -----

        /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION
        /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION

        Diff: https://reviews.apache.org/r/3995/diff

        Testing
        -------

        Thanks,

        Jai

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12515712/0008-HBASE-5166-Added-MultithreadedTableMapper.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        -1 javadoc. The javadoc tool appears to have generated -134 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 153 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.replication.TestReplicationPeer
        org.apache.hadoop.hbase.replication.TestReplication
        org.apache.hadoop.hbase.TestDrainingServer
        org.apache.hadoop.hbase.mapreduce.TestImportTsv
        org.apache.hadoop.hbase.mapred.TestTableMapReduce
        org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1020//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1020//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1020//console

        This message is automatically generated.

        stack added a comment -

        Same as 0008. Uploading again to rerun hadoopqa. Shouldn't be failing that many tests with this patch.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12515764/5166-v9.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        -1 javadoc. The javadoc tool appears to have generated -134 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 153 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
        org.apache.hadoop.hbase.mapred.TestTableMapReduce
        org.apache.hadoop.hbase.mapreduce.TestImportTsv

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1024//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1024//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1024//console

        This message is automatically generated.

        Jai Kumar Singh added a comment -

        @stack, ted: any idea why it's failing these tests?

        stack added a comment -

        @Jai It's not you. Those are known failing tests. Let me commit.

        stack added a comment -

        Committed to trunk. Thanks for the patch Jai. Nice one.

        Jai Kumar Singh added a comment -

        thanks stack, ted

        Hudson added a comment -

        Integrated in HBase-TRUNK-security #122 (See https://builds.apache.org/job/HBase-TRUNK-security/122/)
        HBASE-5166 MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop (Revision 1293098)

        Result = FAILURE
        stack :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java
        Hudson added a comment -

        Integrated in HBase-TRUNK #2669 (See https://builds.apache.org/job/HBase-TRUNK/2669/)
        HBASE-5166 MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop (Revision 1293098)

        Result = SUCCESS
        stack :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java
        Jean-Daniel Cryans added a comment -

        Added Jai as a contributor and assigned the Jira.


          People

          • Assignee: Jai Kumar Singh
          • Reporter: Jai Kumar Singh
          • Votes: 0
          • Watchers: 2

          Time Tracking

          • Original Estimate: 0.5h
          • Remaining Estimate: 0.5h
          • Time Spent: Not Specified

                Development