Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-4440

add an option to presplit table to PerformanceEvaluation

Details

    • Improvement
    • Status: Closed
    • Minor
    • Resolution: Fixed
    • None
    • 0.94.0
    • util

    Description

      PerformanceEvaluation a quick way to 'benchmark' a HBase cluster. The current 'write*' operations do not pre-split the table. Pre splitting the table will really boost the insert performance.

      It would be nice to have an option to enable pre-splitting table before the inserts begin.

      it would look something like:

      (a) hbase ...PerformanceEvaluation --presplit=10 <other options>
      (b) hbase ...PerformanceEvaluation --presplit <other options>

      (b) will try to presplit the table on some default value (say number of region servers)

      Attachments

        1. PerformanceEvaluation_HBASE_4440.patch
          4 kB
          Sujee Maniyam
        2. PerformanceEvaluation.java
          43 kB
          Sujee Maniyam
        3. PerformanceEvaluation_HBASE_4440_2.patch
          4 kB
          Sujee Maniyam

        Issue Links

          Activity

            sujee Sujee Maniyam added a comment -

            linking to similar improvement request for YCSB benchmark

            sujee Sujee Maniyam added a comment - linking to similar improvement request for YCSB benchmark
            sujee Sujee Maniyam added a comment -

            added --presplit option to pre-split table

            sujee Sujee Maniyam added a comment - added --presplit option to pre-split table
            sujee Sujee Maniyam added a comment -

            patch

            sujee Sujee Maniyam added a comment - patch
            sujee Sujee Maniyam added a comment -

            patch attached

            sujee Sujee Maniyam added a comment - patch attached

            looks pretty good. I wish you could use RegionSplitter.HexStringSplit to generate your keys (since that would be the recommended usage pattern), but I guess PerformanceEvaluation does DecimalStrings instead. Maybe a refactoring effort?

            The line

            Thread.sleep(3000); // wait for things to settle down
            

            deleteTable() should stall until all all the regions associated with that table are removed. Did you encounter some problems without that sleep? We shouldn't have any arbitrary stalls in the code and instead understand what else we need to wait on before creating the table again.

            nspiegelberg Nicolas Spiegelberg added a comment - looks pretty good. I wish you could use RegionSplitter.HexStringSplit to generate your keys (since that would be the recommended usage pattern), but I guess PerformanceEvaluation does DecimalStrings instead. Maybe a refactoring effort? The line Thread .sleep(3000); // wait for things to settle down deleteTable() should stall until all all the regions associated with that table are removed. Did you encounter some problems without that sleep? We shouldn't have any arbitrary stalls in the code and instead understand what else we need to wait on before creating the table again.

            @Sujee

            Wait till there not regions in RIT instead of sleep like how we have done in TestMasterFailover

            log("Waiting for no more RIT");
                ZKAssign.blockUntilNoRIT(zkw);
            
            ram_krish ramkrishna.s.vasudevan added a comment - @Sujee Wait till there not regions in RIT instead of sleep like how we have done in TestMasterFailover log( "Waiting for no more RIT" ); ZKAssign.blockUntilNoRIT(zkw);

            @ramkrishna: should this not be done in the deleteTable() code to make it synchronous instead of expecting all callers to know that it is necessary?

            nspiegelberg Nicolas Spiegelberg added a comment - @ramkrishna: should this not be done in the deleteTable() code to make it synchronous instead of expecting all callers to know that it is necessary?
            sujee Sujee Maniyam added a comment -

            thanks for the comments.

            I have attached a revised patch.

            removed sleep-wait after deleting table. I had some issues with 0.90.1, works fine with 0.90.4

            sujee Sujee Maniyam added a comment - thanks for the comments. I have attached a revised patch. removed sleep-wait after deleting table. I had some issues with 0.90.1, works fine with 0.90.4

            thanks Sujee!

            nspiegelberg Nicolas Spiegelberg added a comment - thanks Sujee!
            hudson Hudson added a comment -

            Integrated in HBase-TRUNK-security #23 (See https://builds.apache.org/job/HBase-TRUNK-security/23/)
            HBASE-4440 add an option to presplit table to PerformanceEvaluation

            nspiegelberg :
            Files :

            • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/PerformanceEvaluation.java
            hudson Hudson added a comment - Integrated in HBase-TRUNK-security #23 (See https://builds.apache.org/job/HBase-TRUNK-security/23/ ) HBASE-4440 add an option to presplit table to PerformanceEvaluation nspiegelberg : Files : /hbase/trunk/src/test/java/org/apache/hadoop/hbase/PerformanceEvaluation.java
            hudson Hudson added a comment -

            Integrated in HBase-TRUNK #2520 (See https://builds.apache.org/job/HBase-TRUNK/2520/)
            HBASE-4440 add an option to presplit table to PerformanceEvaluation

            nspiegelberg :
            Files :

            • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/PerformanceEvaluation.java
            hudson Hudson added a comment - Integrated in HBase-TRUNK #2520 (See https://builds.apache.org/job/HBase-TRUNK/2520/ ) HBASE-4440 add an option to presplit table to PerformanceEvaluation nspiegelberg : Files : /hbase/trunk/src/test/java/org/apache/hadoop/hbase/PerformanceEvaluation.java

            ugh pressed the wrong button and assigned it to myself, I added Sujee as a contributor reassigned this jira.

            jdcryans Jean-Daniel Cryans added a comment - ugh pressed the wrong button and assigned it to myself, I added Sujee as a contributor reassigned this jira.

            I tried the patch on 0.92, why is it that using presplit you get a different behavior than while not using it regarding the re-creation of the table? Seems like a bug to me.

            jdcryans Jean-Daniel Cryans added a comment - I tried the patch on 0.92, why is it that using presplit you get a different behavior than while not using it regarding the re-creation of the table? Seems like a bug to me.
            sujee Sujee Maniyam added a comment -

            @JD
            --presplit drops and re-creates the table with splits.
            is this what you mean?

            sujee Sujee Maniyam added a comment - @JD --presplit drops and re-creates the table with splits. is this what you mean?

            Yes, and if you don't use it it's the old behavior of not recreating the table. Presplitting shouldn't be different.

            jdcryans Jean-Daniel Cryans added a comment - Yes, and if you don't use it it's the old behavior of not recreating the table. Presplitting shouldn't be different.
            sujee Sujee Maniyam added a comment -

            so you are proposing that

            1) whether we use presplit option or not, table has to be recreated for all write-mode tests.

            This changes the behavior for all write-tests. Currently table is only created if it doesn't exist.

            2) or pre-split should try to split the table without re-creating it.

            sujee Sujee Maniyam added a comment - so you are proposing that 1) whether we use presplit option or not, table has to be recreated for all write-mode tests. This changes the behavior for all write-tests. Currently table is only created if it doesn't exist. 2) or pre-split should try to split the table without re-creating it.

            whether we use presplit option or not, table has to be recreated for all write-mode tests.

            No, it shouldn't be different from the default behavior of not recreating the table.

            or pre-split should try to split the table without re-creating it.

            It should not.

            Code speaks more than words, here's what I'm using for testing 0.92:

              private boolean checkTable(HBaseAdmin admin) throws IOException {
                HTableDescriptor tableDescriptor = getTableDescriptor();
                boolean tableExists = admin.tableExists(tableDescriptor.getName());
                if (!tableExists) {
                  if (this.presplitRegions > 0) {
                    byte[][] splits = getSplits();
                    for (int i=0; i < splits.length; i++) {
                      LOG.debug(" split " + i + ": " + Bytes.toStringBinary(splits[i]));
                    }
                    admin.createTable(tableDescriptor, splits);
                    LOG.info ("Table created with " + this.presplitRegions + " splits");
                  }
                  else {
                    admin.createTable(tableDescriptor);
                    LOG.info("Table " + tableDescriptor + " created");
                  }
                }
                return !tableExists;
              }
            
            jdcryans Jean-Daniel Cryans added a comment - whether we use presplit option or not, table has to be recreated for all write-mode tests. No, it shouldn't be different from the default behavior of not recreating the table. or pre-split should try to split the table without re-creating it. It should not. Code speaks more than words, here's what I'm using for testing 0.92: private boolean checkTable(HBaseAdmin admin) throws IOException { HTableDescriptor tableDescriptor = getTableDescriptor(); boolean tableExists = admin.tableExists(tableDescriptor.getName()); if (!tableExists) { if ( this .presplitRegions > 0) { byte [][] splits = getSplits(); for ( int i=0; i < splits.length; i++) { LOG.debug( " split " + i + ": " + Bytes.toStringBinary(splits[i])); } admin.createTable(tableDescriptor, splits); LOG.info ( "Table created with " + this .presplitRegions + " splits" ); } else { admin.createTable(tableDescriptor); LOG.info( "Table " + tableDescriptor + " created" ); } } return !tableExists; }
            sujee Sujee Maniyam added a comment -

            I see. looks good.
            If the table exists, and presplit option is supplied, it will have no effect. It might mislead the user in believing the pre-split option took effect, while in fact it didn't.
            may be a WARN would suffice to notify the user?

            sujee Sujee Maniyam added a comment - I see. looks good. If the table exists, and presplit option is supplied, it will have no effect. It might mislead the user in believing the pre-split option took effect, while in fact it didn't. may be a WARN would suffice to notify the user?

            We could show a WARN, but I don't think we would need more than that. In fact, we could always show a message when the table exists saying something like: "Using the existing ${tablename} which has ${X} regions".

            About the pre-splitting itself, it seems that it creates N+1 regions and the first one has the end key 0000000000 so it never gets data. Not a biggie, but could be fixed in another jira.

            jdcryans Jean-Daniel Cryans added a comment - We could show a WARN, but I don't think we would need more than that. In fact, we could always show a message when the table exists saying something like: "Using the existing ${tablename} which has ${X} regions". About the pre-splitting itself, it seems that it creates N+1 regions and the first one has the end key 0000000000 so it never gets data. Not a biggie, but could be fixed in another jira.
            sujee Sujee Maniyam added a comment -

            sounds good. I will submit a patch.

            couple of newbie logistical questions:

            1) should I create a new patch against the trunk? the original patch is already committed in trunk / 0.94.

            2) do I leave old patch attachments in the JIRA or should I delete them (to reduce clutter)

            thanks JD

            sujee Sujee Maniyam added a comment - sounds good. I will submit a patch. couple of newbie logistical questions: 1) should I create a new patch against the trunk? the original patch is already committed in trunk / 0.94. 2) do I leave old patch attachments in the JIRA or should I delete them (to reduce clutter) thanks JD

            Please create a new jira, carry over some of our conversations we had here to justify it, and leave this jira like it is please.

            Good stuff.

            jdcryans Jean-Daniel Cryans added a comment - Please create a new jira, carry over some of our conversations we had here to justify it, and leave this jira like it is please. Good stuff.

            People

              sujee Sujee Maniyam
              sujee Sujee Maniyam
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: