Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Not A Problem
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Client
    • Labels:
      None

      Description

      Internally we have a little tool that will do a rough estimate of how many rows there are in a database. It keeps getting larger and larger partitions, running scanners, until it turns up > N occupied rows. Once it has a number > N, it multiplies by the partition size to get an approximate row count.

      This issue is about generalizing this feature so it could sit in the general hbase install. It would look something like:

      long getApproximateRowCount(final Text startRow, final Text endRow, final long minimumCountPerPartition, final long maximumPartitionSize)
      

      Larger minimumCountPerPartition and maximumPartitionSize values would make the count more accurate but would mean the method ran longer.
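
      A minimal sketch of how the doubling approach above could be shaped. The SliceFunction/CountFunction callbacks stand in for the real scanner plumbing, and the initialSlices parameter is an illustrative reading of maximumPartitionSize; none of these names should be taken as the patch's API.

      /** Illustrative sketch only, not the actual patch or HBase client API. */
      public final class RowCountEstimator {

        /** Returns the key that ends the first 1/slices slice of [start, end). */
        public interface SliceFunction {
          byte[] endOfFirstSlice(byte[] start, byte[] end, long slices);
        }

        /** Counts occupied rows between two keys, e.g. by running a scanner. */
        public interface CountFunction {
          long countRows(byte[] start, byte[] end);
        }

        public static long getApproximateRowCount(byte[] startRow, byte[] endRow,
            long minimumCountPerSlice, long initialSlices,
            SliceFunction slicer, CountFunction counter) {
          // Start with a small slice (1/initialSlices of the key range) and keep
          // doubling it until it holds more than the minimum number of occupied rows.
          for (long slices = initialSlices; slices >= 1; slices /= 2) {
            byte[] sliceEnd = slicer.endOfFirstSlice(startRow, endRow, slices);
            long counted = counter.countRows(startRow, sliceEnd);
            if (counted > minimumCountPerSlice || slices == 1) {
              // Enough occupied rows seen (or the whole range was scanned):
              // extrapolate by the number of slices making up the full key range.
              return counted * slices;
            }
          }
          return 0;
        }
      }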

      Attachments

      1. 2291_v01.patch
        9 kB
        Edward J. Yoon
      2. Keying.java
        5 kB
        stack

          Activity

          Andrew Purtell added a comment -

          Other issues cover this as mentioned above. Let's retire this golden oldie.

          Andrew Purtell added a comment -

          Probably this is more apropos for HBASE-2000 and subtasks.

          stack added a comment -

          @Duane Hey. Thanks for the offer of help. Appreciated. You might want to look at coprocessors – hbase-2000 – and minitables, HBASE-2571, in particular. I believe current thinking is that aggregations, sums, etc., would be done as code loaded as coprocessors.

          Duane Moore added a comment -

          Wondering about the state of this issue. Is this a feature that could be generalized into providing other aggregation functions like min(), max(), sum(), etc.? Presumably these aggregators would work during data ingest and be specifiable per column or column-family. We are working with a proprietary NOSQL system currently that provides this functionality and it would be highly desirable in HBase. Would be interested to see if there is any active development towards this end. If not, I can look into providing a possible implementation.

          stack added a comment -

          I like this. The MR job would optionally take a table name so it could run per table rather than over all of HBase. Split would be on a line in .META. Map would read all store files, emitting stats keyed with a table + column family prefix? Reduce would sum per table column family? Postprocess could sum on a table basis?

          Could add it to our hbase MR Driver so we had more than just RowCounter.
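
          A rough skeleton of the job shape described above, showing only the keying and summing. The mapper input (a store file path plus a per-file stat such as the HFile key count) and the path parsing are assumptions, not existing HBase classes.

          import java.io.IOException;

          import org.apache.hadoop.io.LongWritable;
          import org.apache.hadoop.io.Text;
          import org.apache.hadoop.mapreduce.Mapper;
          import org.apache.hadoop.mapreduce.Reducer;

          /** Sketch only; reading the per-file stat from the store file is not shown. */
          public class TableStatsSketch {

            // Assumes the usual .../<table>/<region>/<family>/<storefile> layout.
            static String tableAndFamily(String storeFilePath) {
              String[] parts = storeFilePath.split("/");
              return parts[parts.length - 4] + "," + parts[parts.length - 2];
            }

            /** Emits each per-file stat keyed with a "table,family" prefix. */
            public static class StatsMapper
                extends Mapper<Text, LongWritable, Text, LongWritable> {
              @Override
              protected void map(Text storeFilePath, LongWritable stat, Context context)
                  throws IOException, InterruptedException {
                context.write(new Text(tableAndFamily(storeFilePath.toString())), stat);
              }
            }

            /** Sums stats per table/column family; a postprocess could sum per table. */
            public static class SumReducer
                extends Reducer<Text, LongWritable, Text, LongWritable> {
              @Override
              protected void reduce(Text key, Iterable<LongWritable> stats, Context context)
                  throws IOException, InterruptedException {
                long sum = 0;
                for (LongWritable s : stats) {
                  sum += s.get();
                }
                context.write(key, new LongWritable(sum));
              }
            }
          }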

          Jonathan Gray added a comment -

          Could also be interesting to have some kind of MR job that scanned over all storefiles and generated aggregated statistics about all your tables and regions.

          A "quick" row count is certainly useful. A detailed report that could run nightly would be very cool.

          stack added a comment -

          HFile does some of this already:

          + average key length
          + average value length
          + key count
          + entries in block and meta index
          + last key in file

          These are easy to add if we need more.

          Yeah, we should leverage it if only to read all meta in one region and then extrapolate (as you suggest).

          Jonathan Gray added a comment -

          I think we should be storing statistics about StoreFiles in the new HFile meta blocks. To start, this could be things like total row count in the storefile. Eventually we could expand that into a row index.

          These kinds of statistics could be useful all over the place.

          stack added a comment -

          Could use hbase-1183 to slice a hundredth off a region, count rows in the 1/100th, and then make an estimate of the table row count.

          Andrew Purtell added a comment -

          From: "stack"
          To: hbase-user@hadoop.apache.org
          Andrew Purtell wrote:
          > ..
          > Maybe a map of MapFile to row count estimations can be stored in the
          > FS next to the MapFiles and can be updated appropriately during
          > compactions. Then a client can iterate over the regions of a table,
          > ask the regionservers involved for row count estimations, the
          > regionservers can consult the estimation-map and send the largest
          > count found there for the table plus the largest memcache count for
          > the table, and finally the client can total all of the results.
          >
          I like this idea. Suggest sticking it in the issue. Each store already
          has an accompanying 'meta' file under the sympathetic 'info' dir. Could
          stuff estimates in here. Estimate of rows would also help sizing bloom
          filters when the 'enable-bloomfilters' switch is thrown. We'd have to
          be clear this count is an estimate, particularly when rows have sparsely
          populated columns.

          St.Ack

          Andrew Purtell added a comment -

          One possible option is to count the entries in the MapFile indexes, multiply that count by whatever hbase.io.index.interval (or the INDEX_INTERVAL HTD attribute) is, consider all of the MapFiles for the columns in a table, and choose the largest value. Do this for all of the table's regions. The result would be a reasonable estimate, but the whole process sounds expensive. Originally I was thinking that the regionservers could do this since they have to read in the MapFile indexes anyway, and also they know the count of rows in memcache, but if regionservers limit the number of in-memory MapFile indexes to avoid OOME as has been discussed, they won't have all of the information on hand.

          Maybe a map of MapFile to row count estimations can be stored in the FS next to the MapFiles and can be updated appropriately during compactions. Then a client can iterate over the regions of a table, ask the regionservers involved for row count estimations, the regionservers can consult the estimation-map and send the largest count found there for the table plus the largest memcache count for the table, and finally the client can total all of the results.
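
          A small sketch of the arithmetic described above, assuming the caller has already read each MapFile's index entry count; that plumbing is not shown, and the class and method names are illustrative.

          import java.util.List;

          /** Sketch only: estimate = index entries * hbase.io.index.interval, per MapFile. */
          public final class MapFileIndexEstimate {

            /**
             * @param indexEntriesPerRegion for each region, the index entry counts of its MapFiles
             * @param indexInterval value of hbase.io.index.interval (or the INDEX_INTERVAL HTD attribute)
             */
            public static long estimateTableRows(List<List<Long>> indexEntriesPerRegion,
                long indexInterval) {
              long total = 0;
              for (List<Long> region : indexEntriesPerRegion) {
                long largest = 0;
                // A row appears at most once per column's MapFile, so the largest
                // per-MapFile estimate is the best guess for the region's row count.
                for (long indexEntries : region) {
                  largest = Math.max(largest, indexEntries * indexInterval);
                }
                total += largest;   // sum the per-region estimates for the table total
              }
              return total;
            }
          }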

          Jim Kellerman added a comment -

          Minor issue will be addressed post 0.2

          Jim Kellerman added a comment -

          Not committed. Reopening.

          stack added a comment -

          This was committed to branch and trunk

          Bryan Duxbury added a comment -

          I was thinking that, rather than sampling and resampling repeatedly, maybe what you could do is look at the region start keys, figure out what the edit distance between start and end keys is as a proxy for the size of the region, and then scan the presumed largest and presumed smallest regions. This would give you a lower and upper bound on your table size. If your selections of smallest and largest regions happened to be bad, i.e. the counts were inverted, you can always just flip them.
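
          A sketch of this bounding idea, using the numeric gap between the first bytes of a region's start and end keys as a crude stand-in for the suggested edit distance; the Region type and the scan callback are illustrative only.

          import java.util.List;
          import java.util.function.ToLongFunction;

          /** Sketch only: scan the presumed biggest and smallest regions, extrapolate bounds. */
          public final class RowCountBounds {

            public static final class Region {
              final byte[] startKey, endKey;
              public Region(byte[] s, byte[] e) { startKey = s; endKey = e; }
            }

            // Rough proxy for how much key space a region covers: the gap between
            // the first byte of its start and end keys (empty keys mean table edges).
            static int keySpan(Region r) {
              int start = r.startKey.length == 0 ? 0 : r.startKey[0] & 0xff;
              int end = r.endKey.length == 0 ? 256 : r.endKey[0] & 0xff;
              return end - start;
            }

            /** Returns {lower, upper} bounds on table row count; assumes regions is non-empty. */
            public static long[] bound(List<Region> regions, ToLongFunction<Region> scanAndCount) {
              Region widest = regions.get(0), narrowest = regions.get(0);
              for (Region r : regions) {
                if (keySpan(r) > keySpan(widest)) widest = r;
                if (keySpan(r) < keySpan(narrowest)) narrowest = r;
              }
              // Scan only the two extreme regions and extrapolate to all regions.
              long high = scanAndCount.applyAsLong(widest) * regions.size();
              long low = scanAndCount.applyAsLong(narrowest) * regions.size();
              if (low > high) {                       // guesses inverted: flip them
                long t = low; low = high; high = t;
              }
              return new long[] { low, high };
            }
          }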

          Edward J. Yoon added a comment -

          Sorry for the delay.

          It seems difficult to me.
          I need some help.

          Jim Kellerman added a comment -

          What is the status of this issue?

          Edward J. Yoon added a comment -

          Thanks for the comments, stack.
          Now I test it with HADOOP-2480 (real log data table).
          Please wait.

          Edward J. Yoon added a comment -

          I tested 1~10 billion rows.
          The error range is too large; let me see about it.

          Now I'm checking the analyze table and compute/estimate statistics.

          stack added a comment -

          What is the state of this issue, Edward? Will it not work on billions of rows?

          Other comments on the patch are:

          + We should add an HTable.getTableDescriptor and an HTable.getColumnFamilies?
          + Comments would be helpful. For example, it would be good to explain why you all of a sudden set a variable 'i' equal to 2, and a comment confirming what you are doing finding midkeys over and over again would be helpful (won't this take a long time on a big table?).
          + If your search for endKey turns up a null, won't you get an NPE when you convert back from base64?
          + Would suggest that the estimator or an estimator override take as inputs the smallest slice to start with and the largest slice that the estimator should try.

          Edward J. Yoon added a comment - edited

          real count    estimated count
                 676                784
               17576              21058
              456976             501647
             7580124            8120647
            65201750           82648211

          (row count of sample space * sample size) was used for the calculation.
          The test results are just fine, but let me see about it.

          If you have a good idea, please let me know.

          Edward J. Yoon added a comment -

          I tried the estimator (random alphabet combinations, 1 million ~ 10 billion rows).
          The row count error range is too large.

          Edward J. Yoon added a comment -

          long getApproximateRowCount() //HTable.java

          META TABLE   first key                          last key
           |---xxx--x---xx---xxx-------xxxxxxxxxxxxxxx---x---xxxxx---x----x----|
                |             |                                        2^34 tablet's rows
                |          1/1000 th
            counting       -->|
          
                  countNum * 1000
          
          Edward J. Yoon added a comment -

          Ok, I understand exactly.

          stack added a comment -

          Edward: Here is an extract of code used internally for doing key space estimations. It won't compile because it's been hacked on to remove references to internal, unrelated packages. It was originally written by Jim a while back and then subsequently mangled by me. It might help you with this problem though it has a base64 focus (if you want an opinion, these methods, with your cleanup, might belong best in hbase/util/Keying.java).

          Edward J. Yoon added a comment -

          Ok, I understand.
          I'll try to make it.

          Edward J. Yoon added a comment -

          This method will be used for aggregate functions in hbase shell.


            People

            • Assignee:
              Unassigned
            • Reporter:
              stack
            • Votes:
              1
            • Watchers:
              5
