HBase / HBASE-5401

PerformanceEvaluation generates 10x the number of expected mappers

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.0.0
    • Component/s: test
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed
    • Release Note:
      Changes how many tasks PE runs when clients are run via MapReduce. Now tasks == client count. Previously we hardcoded ten tasks per client instance.

      Description

      With a command line like 'hbase org.apache.hadoop.hbase.PerformanceEvaluation randomWrite 10' there are 100 mappers spawned, rather than the expected 10. The culprit appears to be the outer loop in writeInputFile which sets up 10 splits for every "asked-for client". I think the fix is just to remove that outer loop.
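      For illustration, the 10x blow-up is pure split arithmetic: with a hardcoded TASKS_PER_CLIENT of 10, the job gets 10 * nclients splits instead of nclients. A minimal, self-contained sketch (class and method names are hypothetical, not from PerformanceEvaluation):

      ```java
      // Hypothetical sketch of the split arithmetic in writeInputFile;
      // names are illustrative, not the actual PerformanceEvaluation API.
      public class SplitMath {
          static final int TASKS_PER_CLIENT = 10; // the hardcoded multiplier

          // Number of map tasks the buggy code generates for n asked-for clients.
          static int buggyTaskCount(int nClients) {
              int tasks = 0;
              for (int i = 0; i < TASKS_PER_CLIENT; i++) {   // the outer loop at issue
                  for (int j = 0; j < nClients; j++) {
                      tasks++;                               // one split per (i, j) pair
                  }
              }
              return tasks;
          }

          public static void main(String[] args) {
              // 'randomWrite 10' asks for 10 clients but spawns 100 mappers.
              System.out.println(buggyTaskCount(10)); // prints 100
          }
      }
      ```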

        Activity

        ivarley Ian Varley added a comment -

        Hey Oliver, I see in your blog post (http://gbif.blogspot.com/2012/02/performance-evaluation-of-hbase.html) that you patched this issue. Mind posting that patch back here?

        clehene Cosmin Lehene added a comment -

        Ian Varley, Oliver Meyn is this still an issue?

        clehene Cosmin Lehene added a comment -

        The outer loop is still there, but from the look of it, it doesn't seem like it should simply be removed.

        stack, Nick Dimiduk, can you comment?

          /**
           * Per client, how many tasks will we run?  We divide number of rows by this number and have the
           * client do the resulting count in a map task.
           */
          static int TASKS_PER_CLIENT = 10;

              for (int i = 0; i < TASKS_PER_CLIENT; i++) {
                for (int j = 0; j < opts.numClientThreads; j++) {
                  TestOptions next = new TestOptions(opts);
                  next.startRow = (j * perClientRows) + (i * (perClientRows/10));
                  next.perClientRunRows = perClientRows / 10;
                  String s = MAPPER.writeValueAsString(next);
                  LOG.info("Client=" + j + ", maptask=" + i + ", input=" + s);
                  int hash = h.hash(Bytes.toBytes(s));
                  m.put(hash, s);
                }
              }
        oliver_meyn Oliver Meyn added a comment -

        I've just run it against 0.98.6 (so, for the first time in 2 years) and it still generates 10x the number of splits, and therefore mappers, that I would expect. The final row count looks fine, though (i.e. hbase pe sequentialWrite 1 produces 10 mappers but 1M total rows). My original patch just removed that outer loop, which I think would still work.

        But I think the chances are good that I've just misunderstood something - with all those magical 10s in there, I'm sure it's no surprise to the original dev that there are 10 mappers for every 1 on the command line. Maybe only a documentation change is needed?
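        For context, dropping the outer loop leaves one task per asked-for client, each covering a contiguous perClientRows range, which matches the eventual release note (tasks == client count). A self-contained sketch of that arithmetic (names are hypothetical, not the committed patch):

        ```java
        // Hypothetical sketch: after removing the outer loop, one map task per
        // client, each assigned a contiguous, non-overlapping block of rows.
        public class FixedSplits {
            static int[] startRows(int nClients, int totalRows) {
                int perClientRows = totalRows / nClients;
                int[] starts = new int[nClients];    // one entry per map task
                for (int j = 0; j < nClients; j++) {
                    starts[j] = j * perClientRows;   // client j starts here
                }
                return starts;
            }

            public static void main(String[] args) {
                // 10 clients over 10M rows: 10 tasks of 1M rows each.
                int[] starts = startRows(10, 10_000_000);
                System.out.println(starts.length);   // prints 10
                System.out.println(starts[1]);       // prints 1000000
            }
        }
        ```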

        ndimiduk Nick Dimiduk added a comment -

        I don't know why the 10x multiplier is there. I usually run in --nomapred mode, so I haven't thought about this much. If you want to work out a patch we can get it committed.

        easyliangjob Yi Liang added a comment -

        I have used this command and also encountered this issue. For example, when I run hbase org.apache.hadoop.hbase.PerformanceEvaluation --rows=m randomWrite n:

        If we use --nomapred, this creates n threads (clients) and each thread writes m/n rows into HBase.
        If we use the default MapReduce mode, this creates 10*n mappers, and each mapper puts m/(n*10) rows into HBase.
        I think the static int

        static int TASKS_PER_CLIENT = 10

        is unnecessary:
        1. If users want more mappers they can just increase the client count; with the *10 in place, users can only get 10, 20, 30... mappers for different client counts, which is not flexible.
        2. TASKS_PER_CLIENT = 10 is hardcoded and invisible to the user; sometimes a user may want just 5 mappers for their job, yet the current code creates 50.
        3. When <nclients> = 5, it means 5 threads but 50 mappers, which is a little inconsistent. (I don't mean a mapper is the same as a thread, but it is better to keep the counts the same.)

        What do you guys think?
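        The arithmetic above can be checked with a tiny sketch (helper names are hypothetical; the constant 10 mirrors the hardcoded TASKS_PER_CLIENT):

        ```java
        // Sketch of the rows-per-worker arithmetic for --rows=m randomWrite n;
        // method names are illustrative, not part of PerformanceEvaluation.
        public class RowsPerWorker {
            static int rowsPerThreadNoMapred(int m, int n) {
                return m / n;            // --nomapred: n threads share m rows
            }
            static int rowsPerMapper(int m, int n) {
                return m / (n * 10);     // MapReduce mode: 10*n mappers share m rows
            }
            public static void main(String[] args) {
                // --rows=1000000 randomWrite 5
                System.out.println(rowsPerThreadNoMapred(1_000_000, 5)); // prints 200000
                System.out.println(rowsPerMapper(1_000_000, 5));         // prints 20000
            }
        }
        ```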

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 12m 43s Docker mode activated.
        +1 hbaseanti 0m 0s Patch does not have any anti-patterns.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
        +1 mvninstall 2m 53s master passed
        +1 compile 0m 32s master passed
        +1 checkstyle 0m 39s master passed
        +1 mvneclipse 0m 13s master passed
        +1 findbugs 1m 32s master passed
        +1 javadoc 0m 27s master passed
        +1 mvninstall 0m 37s the patch passed
        +1 compile 0m 32s the patch passed
        +1 javac 0m 32s the patch passed
        +1 checkstyle 0m 38s the patch passed
        +1 mvneclipse 0m 13s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 hadoopcheck 23m 24s Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1.
        +1 findbugs 1m 40s the patch passed
        +1 javadoc 0m 24s the patch passed
        -1 unit 91m 24s hbase-server in the patch failed.
        +1 asflicense 0m 19s The patch does not generate ASF License warnings.
        138m 28s



        Subsystem Report/Notes
        Docker Client=1.12.3 Server=1.12.3 Image:yetus/hbase:8d52d23
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12844170/HBASE-5401-V1.patch
        JIRA Issue HBASE-5401
        Optional Tests asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile
        uname Linux 9e7d8eac1e37 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
        git revision master / 06b67a6
        Default Java 1.8.0_111
        findbugs v3.0.0
        unit https://builds.apache.org/job/PreCommit-HBASE-Build/5006/artifact/patchprocess/patch-unit-hbase-server.txt
        Test Results https://builds.apache.org/job/PreCommit-HBASE-Build/5006/testReport/
        modules C: hbase-server U: hbase-server
        Console output https://builds.apache.org/job/PreCommit-HBASE-Build/5006/console
        Powered by Apache Yetus 0.3.0 http://yetus.apache.org

        This message was automatically generated.

        stack stack added a comment -

        Pushed. Makes sense. This baffled you and Oliver. That's enough. Thanks for the patch, Yi Liang.

        stack stack added a comment -

        Marked it an incompatible change.

        hudson Hudson added a comment -

        FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #2174 (See https://builds.apache.org/job/HBase-Trunk_matrix/2174/)
        HBASE-5401 PerformanceEvaluation generates 10x the number of expected (stack: rev d787155fd24c576b66663220372dbb7286d5e291)

        • (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/PerformanceEvaluation.java
        • (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/TestPerformanceEvaluation.java

          People

          • Assignee:
            easyliangjob Yi Liang
            Reporter:
            oliver@mineallmeyn.com Oliver Meyn
          • Votes:
            0
            Watchers:
            8
