HBASE-4435: Add Group By functionality using Coprocessors

      Description

      Adds Group By-like functionality to HBase using the Coprocessor framework.

      It can group a scan's result set on one or more columns (the group-by columns, each identified by a column family and qualifier). For each group, it computes statistics (max, min, sum, count, sum of squares, and number of missing values) over a second column, called the stats column.
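With coprocessors, aggregation of this kind typically runs on each region, and the client combines the partial results. Every statistic listed above is a running sum, a count, or a min/max, so partials merge without revisiting rows. The sketch below shows such an accumulator. `StatsAccumulator` and its methods are hypothetical names, not the patch's GroupByStatsValues, and a `null` value stands in for a row that lacks the stats column.

```java
// Hypothetical sketch, not the patch's GroupByStatsValues: an accumulator
// for the six statistics the description lists.
final class StatsAccumulator {
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;
    private double sum = 0.0;
    private double sumOfSquares = 0.0;
    private long count = 0;   // rows in the group that had a stats-column value
    private long missing = 0; // rows in the group with no stats-column value

    /** Records one row's stats-column value; null means the cell was absent. */
    void accumulate(Double value) {
        if (value == null) {
            missing++;
            return;
        }
        double v = value;
        min = Math.min(min, v);
        max = Math.max(max, v);
        sum += v;
        sumOfSquares += v * v;
        count++;
    }

    /** Folds in another partial result, e.g. one computed on a different region. */
    void merge(StatsAccumulator other) {
        min = Math.min(min, other.min);
        max = Math.max(max, other.max);
        sum += other.sum;
        sumOfSquares += other.sumOfSquares;
        count += other.count;
        missing += other.missing;
    }

    double min() { return min; }
    double max() { return max; }
    double sum() { return sum; }
    double sumOfSquares() { return sumOfSquares; }
    long count() { return count; }
    long missing() { return missing; }
}
```

Because `merge` only adds counts and sums and takes min/max, the order in which region results arrive does not change the final statistics.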

      I've provided two getStats variants.

      1. In the first, you specify a single group-by column and a stats column:

      statsMap = gbc.getStats(tableName, scan, groupByFamily, groupByQualifier, statsFamily, statsQualifier, statsFieldColumnInterpreter);

      The result is a map from each group-by column value (as a String) to a GroupByStatsValues object, which holds the max, min, sum, and other statistics of the stats column for that group.

      2. The second variant lets you specify a list of group-by columns and a stats column. The list of group-by columns is expected to contain {column family, qualifier} pairs, each given as a list.

      statsMap = gbc.getStats(tableName, scan, listOfGroupByColumns, statsFamily, statsQualifier, statsFieldColumnInterpreter);
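The two calls above differ only in how many columns form the group key. The sketch below illustrates the general flow under loudly stated assumptions: rows are modeled as hypothetical `Map<String, String>` objects keyed by `"family:qualifier"` instead of HBase `Result`s, each region groups its own rows, and the client merges the per-region maps. The class `GroupBySketch`, its methods, and the use of a `List<String>` of values as the group key are all illustrative choices, not the patch's code, and `Stats` is a trimmed subset of the statistics listed earlier.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the patch's code: rows are Map<String, String>
// keyed by "family:qualifier", and a group key is the list of the row's
// group-by column values.
final class GroupBySketch {

    /** Trimmed per-group statistics: count, missing, sum, min, max. */
    static final class Stats {
        long count, missing;
        double sum, min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;

        void accumulate(String raw) {
            if (raw == null) { missing++; return; } // row has no stats-column value
            double v = Double.parseDouble(raw);
            count++;
            sum += v;
            min = Math.min(min, v);
            max = Math.max(max, v);
        }

        void merge(Stats other) {
            count += other.count;
            missing += other.missing;
            sum += other.sum;
            min = Math.min(min, other.min);
            max = Math.max(max, other.max);
        }
    }

    /** Group key from {family, qualifier} pairs; null if any group-by value is absent. */
    static List<String> groupKey(Map<String, String> row, List<List<String>> groupByColumns) {
        List<String> key = new ArrayList<>();
        for (List<String> pair : groupByColumns) {
            String value = row.get(pair.get(0) + ":" + pair.get(1));
            if (value == null) return null; // assumption: such rows are skipped
            key.add(value);
        }
        return key; // List equals/hashCode are element-wise, so it works as a map key
    }

    /** Per-region step: group this region's rows and accumulate the stats column. */
    static Map<List<String>, Stats> groupRegion(List<Map<String, String>> rows,
            List<List<String>> groupByColumns, String statsColumn) {
        Map<List<String>, Stats> partial = new HashMap<>();
        for (Map<String, String> row : rows) {
            List<String> key = groupKey(row, groupByColumns);
            if (key != null) {
                partial.computeIfAbsent(key, k -> new Stats()).accumulate(row.get(statsColumn));
            }
        }
        return partial;
    }

    /** Client step: merge the partial maps returned by each region. */
    static Map<List<String>, Stats> mergeRegions(List<Map<List<String>, Stats>> partials) {
        Map<List<String>, Stats> result = new HashMap<>();
        for (Map<List<String>, Stats> partial : partials) {
            partial.forEach((key, stats) ->
                    result.computeIfAbsent(key, k -> new Stats()).merge(stats));
        }
        return result;
    }
}
```

For the single-column case, pass a one-element list of columns. The key is then a one-element list rather than the plain String the first variant returns. Using a list of values instead of a joined string avoids collisions when a value happens to contain the delimiter.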

      The GroupByStatsValues code is adapted from the Solr Stats component.
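Solr's stats component also reports a mean and a standard deviation. Both can be derived from the count n, sum S, and sum of squares Q collected above, without revisiting any rows. Whether GroupByStatsValues exposes them is not stated here. The standard formulas, using the sample (n - 1) standard deviation, are:

```latex
\bar{x} = \frac{S}{n}, \qquad
s = \sqrt{\frac{nQ - S^2}{n(n-1)}} \quad (n > 1)
% Example: values 2, 5, -1 give n = 3, S = 6, Q = 30,
% so \bar{x} = 2 and s = \sqrt{(90 - 36)/6} = 3.
```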

      Attachments

      1. HBASE-4435-v2.patch (52 kB, Aaron Tokhy)
      2. HBase-4435.patch (29 kB, Nichole Treadway)

          Activity

          Nichole Treadway created the issue.
          Nichole Treadway made changes: Attachment HBase-4435.patch [12495076]
          Nichole Treadway made changes: Attachment HBase-4435.patch [12495084]
          Nichole Treadway made changes: Attachment HBase-4435.patch [12495076]
          Nichole Treadway made changes: Description (expanded from a one-line summary to the full description)
          Nichole Treadway made changes: Description (added the note that the GroupByStatsValues code is adapted from the Solr Stats component)

          Jeff Hammerbacher made changes: Link: this issue relates to HBASE-1512
          Aaron Tokhy made changes: Attachment HBASE-4435-v2.patch [12549553]
          Aaron Tokhy made changes: Labels: by coprocessors group hbase
          Andrew Purtell made changes: Link: this issue relates to HBASE-7474

            People

            • Assignee: Unassigned
            • Reporter: Nichole Treadway
            • Votes: 1
            • Watchers: 14
