Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-4440

add an option to presplit table to PerformanceEvaluation

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.94.0
    • Component/s: util
    • Labels:

      Description

      PerformanceEvaluation a quick way to 'benchmark' a HBase cluster. The current 'write*' operations do not pre-split the table. Pre splitting the table will really boost the insert performance.

      It would be nice to have an option to enable pre-splitting table before the inserts begin.

      it would look something like:

      (a) hbase ...PerformanceEvaluation --presplit=10 <other options>
      (b) hbase ...PerformanceEvaluation --presplit <other options>

      (b) will try to presplit the table on some default value (say number of region servers)

      1. PerformanceEvaluation.java
        43 kB
        Sujee Maniyam
      2. PerformanceEvaluation_HBASE_4440.patch
        4 kB
        Sujee Maniyam
      3. PerformanceEvaluation_HBASE_4440_2.patch
        4 kB
        Sujee Maniyam

        Issue Links

          Activity

          Hide
          sujee Sujee Maniyam added a comment -

          linking to similar improvement request for YCSB benchmark

          Show
          sujee Sujee Maniyam added a comment - linking to similar improvement request for YCSB benchmark
          Hide
          sujee Sujee Maniyam added a comment -

          added --presplit option to pre-split table

          Show
          sujee Sujee Maniyam added a comment - added --presplit option to pre-split table
          Hide
          sujee Sujee Maniyam added a comment -

          patch

          Show
          sujee Sujee Maniyam added a comment - patch
          Hide
          sujee Sujee Maniyam added a comment -

          patch attached

          Show
          sujee Sujee Maniyam added a comment - patch attached
          Hide
          nspiegelberg Nicolas Spiegelberg added a comment -

          looks pretty good. I wish you could use RegionSplitter.HexStringSplit to generate your keys (since that would be the recommended usage pattern), but I guess PerformanceEvaluation does DecimalStrings instead. Maybe a refactoring effort?

          The line

          Thread.sleep(3000); // wait for things to settle down
          

          deleteTable() should stall until all all the regions associated with that table are removed. Did you encounter some problems without that sleep? We shouldn't have any arbitrary stalls in the code and instead understand what else we need to wait on before creating the table again.

          Show
          nspiegelberg Nicolas Spiegelberg added a comment - looks pretty good. I wish you could use RegionSplitter.HexStringSplit to generate your keys (since that would be the recommended usage pattern), but I guess PerformanceEvaluation does DecimalStrings instead. Maybe a refactoring effort? The line Thread .sleep(3000); // wait for things to settle down deleteTable() should stall until all all the regions associated with that table are removed. Did you encounter some problems without that sleep? We shouldn't have any arbitrary stalls in the code and instead understand what else we need to wait on before creating the table again.
          Hide
          ram_krish ramkrishna.s.vasudevan added a comment -

          @Sujee

          Wait till there not regions in RIT instead of sleep like how we have done in TestMasterFailover

          log("Waiting for no more RIT");
              ZKAssign.blockUntilNoRIT(zkw);
          
          Show
          ram_krish ramkrishna.s.vasudevan added a comment - @Sujee Wait till there not regions in RIT instead of sleep like how we have done in TestMasterFailover log( "Waiting for no more RIT" ); ZKAssign.blockUntilNoRIT(zkw);
          Hide
          nspiegelberg Nicolas Spiegelberg added a comment -

          @ramkrishna: should this not be done in the deleteTable() code to make it synchronous instead of expecting all callers to know that it is necessary?

          Show
          nspiegelberg Nicolas Spiegelberg added a comment - @ramkrishna: should this not be done in the deleteTable() code to make it synchronous instead of expecting all callers to know that it is necessary?
          Hide
          sujee Sujee Maniyam added a comment -

          thanks for the comments.

          I have attached a revised patch.

          removed sleep-wait after deleting table. I had some issues with 0.90.1, works fine with 0.90.4

          Show
          sujee Sujee Maniyam added a comment - thanks for the comments. I have attached a revised patch. removed sleep-wait after deleting table. I had some issues with 0.90.1, works fine with 0.90.4
          Hide
          nspiegelberg Nicolas Spiegelberg added a comment -

          thanks Sujee!

          Show
          nspiegelberg Nicolas Spiegelberg added a comment - thanks Sujee!
          Hide
          hudson Hudson added a comment -

          Integrated in HBase-TRUNK-security #23 (See https://builds.apache.org/job/HBase-TRUNK-security/23/)
          HBASE-4440 add an option to presplit table to PerformanceEvaluation

          nspiegelberg :
          Files :

          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/PerformanceEvaluation.java
          Show
          hudson Hudson added a comment - Integrated in HBase-TRUNK-security #23 (See https://builds.apache.org/job/HBase-TRUNK-security/23/ ) HBASE-4440 add an option to presplit table to PerformanceEvaluation nspiegelberg : Files : /hbase/trunk/src/test/java/org/apache/hadoop/hbase/PerformanceEvaluation.java
          Hide
          hudson Hudson added a comment -

          Integrated in HBase-TRUNK #2520 (See https://builds.apache.org/job/HBase-TRUNK/2520/)
          HBASE-4440 add an option to presplit table to PerformanceEvaluation

          nspiegelberg :
          Files :

          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/PerformanceEvaluation.java
          Show
          hudson Hudson added a comment - Integrated in HBase-TRUNK #2520 (See https://builds.apache.org/job/HBase-TRUNK/2520/ ) HBASE-4440 add an option to presplit table to PerformanceEvaluation nspiegelberg : Files : /hbase/trunk/src/test/java/org/apache/hadoop/hbase/PerformanceEvaluation.java
          Hide
          jdcryans Jean-Daniel Cryans added a comment -

          ugh pressed the wrong button and assigned it to myself, I added Sujee as a contributor reassigned this jira.

          Show
          jdcryans Jean-Daniel Cryans added a comment - ugh pressed the wrong button and assigned it to myself, I added Sujee as a contributor reassigned this jira.
          Hide
          jdcryans Jean-Daniel Cryans added a comment -

          I tried the patch on 0.92, why is it that using presplit you get a different behavior than while not using it regarding the re-creation of the table? Seems like a bug to me.

          Show
          jdcryans Jean-Daniel Cryans added a comment - I tried the patch on 0.92, why is it that using presplit you get a different behavior than while not using it regarding the re-creation of the table? Seems like a bug to me.
          Hide
          sujee Sujee Maniyam added a comment -

          @JD
          --presplit drops and re-creates the table with splits.
          is this what you mean?

          Show
          sujee Sujee Maniyam added a comment - @JD --presplit drops and re-creates the table with splits. is this what you mean?
          Hide
          jdcryans Jean-Daniel Cryans added a comment -

          Yes, and if you don't use it it's the old behavior of not recreating the table. Presplitting shouldn't be different.

          Show
          jdcryans Jean-Daniel Cryans added a comment - Yes, and if you don't use it it's the old behavior of not recreating the table. Presplitting shouldn't be different.
          Hide
          sujee Sujee Maniyam added a comment -

          so you are proposing that

          1) whether we use presplit option or not, table has to be recreated for all write-mode tests.

          This changes the behavior for all write-tests. Currently table is only created if it doesn't exist.

          2) or pre-split should try to split the table without re-creating it.

          Show
          sujee Sujee Maniyam added a comment - so you are proposing that 1) whether we use presplit option or not, table has to be recreated for all write-mode tests. This changes the behavior for all write-tests. Currently table is only created if it doesn't exist. 2) or pre-split should try to split the table without re-creating it.
          Hide
          jdcryans Jean-Daniel Cryans added a comment -

          whether we use presplit option or not, table has to be recreated for all write-mode tests.

          No, it shouldn't be different from the default behavior of not recreating the table.

          or pre-split should try to split the table without re-creating it.

          It should not.

          Code speaks more than words, here's what I'm using for testing 0.92:

            private boolean checkTable(HBaseAdmin admin) throws IOException {
              HTableDescriptor tableDescriptor = getTableDescriptor();
              boolean tableExists = admin.tableExists(tableDescriptor.getName());
              if (!tableExists) {
                if (this.presplitRegions > 0) {
                  byte[][] splits = getSplits();
                  for (int i=0; i < splits.length; i++) {
                    LOG.debug(" split " + i + ": " + Bytes.toStringBinary(splits[i]));
                  }
                  admin.createTable(tableDescriptor, splits);
                  LOG.info ("Table created with " + this.presplitRegions + " splits");
                }
                else {
                  admin.createTable(tableDescriptor);
                  LOG.info("Table " + tableDescriptor + " created");
                }
              }
              return !tableExists;
            }
          
          Show
          jdcryans Jean-Daniel Cryans added a comment - whether we use presplit option or not, table has to be recreated for all write-mode tests. No, it shouldn't be different from the default behavior of not recreating the table. or pre-split should try to split the table without re-creating it. It should not. Code speaks more than words, here's what I'm using for testing 0.92: private boolean checkTable(HBaseAdmin admin) throws IOException { HTableDescriptor tableDescriptor = getTableDescriptor(); boolean tableExists = admin.tableExists(tableDescriptor.getName()); if (!tableExists) { if ( this .presplitRegions > 0) { byte [][] splits = getSplits(); for ( int i=0; i < splits.length; i++) { LOG.debug( " split " + i + ": " + Bytes.toStringBinary(splits[i])); } admin.createTable(tableDescriptor, splits); LOG.info ( "Table created with " + this .presplitRegions + " splits" ); } else { admin.createTable(tableDescriptor); LOG.info( "Table " + tableDescriptor + " created" ); } } return !tableExists; }
          Hide
          sujee Sujee Maniyam added a comment -

          I see. looks good.
          If the table exists, and presplit option is supplied, it will have no effect. It might mislead the user in believing the pre-split option took effect, while in fact it didn't.
          may be a WARN would suffice to notify the user?

          Show
          sujee Sujee Maniyam added a comment - I see. looks good. If the table exists, and presplit option is supplied, it will have no effect. It might mislead the user in believing the pre-split option took effect, while in fact it didn't. may be a WARN would suffice to notify the user?
          Hide
          jdcryans Jean-Daniel Cryans added a comment -

          We could show a WARN, but I don't think we would need more than that. In fact, we could always show a message when the table exists saying something like: "Using the existing $

          {tablename}

          which has $

          {X}

          regions".

          About the pre-splitting itself, it seems that it creates N+1 regions and the first one has the end key 0000000000 so it never gets data. Not a biggie, but could be fixed in another jira.

          Show
          jdcryans Jean-Daniel Cryans added a comment - We could show a WARN, but I don't think we would need more than that. In fact, we could always show a message when the table exists saying something like: "Using the existing $ {tablename} which has $ {X} regions". About the pre-splitting itself, it seems that it creates N+1 regions and the first one has the end key 0000000000 so it never gets data. Not a biggie, but could be fixed in another jira.
          Hide
          sujee Sujee Maniyam added a comment -

          sounds good. I will submit a patch.

          couple of newbie logistical questions:

          1) should I create a new patch against the trunk? the original patch is already committed in trunk / 0.94.

          2) do I leave old patch attachments in the JIRA or should I delete them (to reduce clutter)

          thanks JD

          Show
          sujee Sujee Maniyam added a comment - sounds good. I will submit a patch. couple of newbie logistical questions: 1) should I create a new patch against the trunk? the original patch is already committed in trunk / 0.94. 2) do I leave old patch attachments in the JIRA or should I delete them (to reduce clutter) thanks JD
          Hide
          jdcryans Jean-Daniel Cryans added a comment -

          Please create a new jira, carry over some of our conversations we had here to justify it, and leave this jira like it is please.

          Good stuff.

          Show
          jdcryans Jean-Daniel Cryans added a comment - Please create a new jira, carry over some of our conversations we had here to justify it, and leave this jira like it is please. Good stuff.

            People

            • Assignee:
              sujee Sujee Maniyam
              Reporter:
              sujee Sujee Maniyam
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development