Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.92.0
    • Component/s: Coprocessors
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      From Google's Jeff Dean, in a keynote to LADIS 2009 (http://www.scribd.com/doc/21631448/Dean-Keynote-Ladis2009, slides 66 - 67):

      BigTable Coprocessors (New Since OSDI'06)

      • Arbitrary code that runs next to each tablet in table
        • As tablets split and move, coprocessor code automatically splits/moves too
      • High-level call interface for clients
        • Unlike RPC, calls addressed to rows or ranges of rows
      • Coprocessor client library resolves to actual locations
        • Calls across multiple rows automatically split into multiple parallelized RPCs
      • Very flexible model for building distributed services
        • Automatic scaling, load balancing, request routing for apps

      Example Coprocessor Uses

      • Scalable metadata management for Colossus (next gen GFS-like file system)
      • Distributed language model serving for machine translation system
      • Distributed query processing for full-text indexing support
      • Regular expression search support for code repository

      For HBase, a coprocessor framework will allow functionality to be added incrementally as plug-ins. There will be no more need to subclass the regionserver interface and implementation classes and set hbase.regionserver.class and hbase.regionserver.impl in hbase-site.xml. That mechanism allows for extension, but only one extension at a time, to the exclusion of all others.
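
      The difference between one exclusive subclass and pluggable, incremental extensions can be sketched in plain Java. This is only an illustration under stated assumptions: the names Extension, Host, load, and onGet are hypothetical, and nothing here is an HBase API or the design this issue eventually adopted.

```java
import java.util.ArrayList;
import java.util.List;

public class PluggableHostSketch {

    /** Hypothetical extension point; the name and method are illustrative only. */
    public interface Extension {
        String onGet(String row, String value);
    }

    /** Holds any number of extensions and applies them in the order they were loaded. */
    public static final class Host {
        private final List<Extension> extensions = new ArrayList<>();

        public void load(Extension extension) {
            extensions.add(extension);
        }

        public String get(String row, String storedValue) {
            String value = storedValue;
            for (Extension extension : extensions) {
                value = extension.onGet(row, value);
            }
            return value;
        }
    }

    public static void main(String[] args) {
        Host host = new Host();
        // Two independent extensions coexist; neither replaces the host class.
        host.load((row, value) -> value.toUpperCase());
        host.load((row, value) -> row + "=" + value);
        System.out.println(host.get("r1", "x")); // prints r1=X
    }
}
```

      With a single configured implementation class, the second extension would have to subclass or replace the first. Here each one is loaded separately and the host composes them.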

      HBASE-2001 currently also contains an in-process MapReduce framework for the regionservers. Coprocessors can optionally implement a 'MapReduce' interface that clients will be able to invoke concurrently on all regions of the table. Note that this is not MapReduce over the table as a whole; it is MapReduce on each region, run concurrently. One can implement MapReduce in a manner very similar to Hadoop's MR framework, or use shared variables to avoid the overhead of generating (and processing) many intermediate results. An initial application could be rapid calculation of aggregates over data stored in HBase.
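
      As a rough sketch of the per-region idea, the following plain Java program computes a sum over each "region" concurrently and then merges the partial results on the client side. Everything in it is an assumption made for illustration: RegionMapReduce, SumAggregate, and invokeOnAllRegions are hypothetical names, regions are modeled as in-memory maps, and no HBase or HBASE-2001 API is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class RegionAggregateSketch {

    /** Hypothetical per-region hook; NOT an actual HBase interface. */
    public interface RegionMapReduce<T> {
        T mapRegion(Map<String, Long> regionRows); // runs against one region's rows
        T reduce(T left, T right);                 // merges two partial results
    }

    /** Sums the values of a region; partial sums are merged by addition. */
    public static final class SumAggregate implements RegionMapReduce<Long> {
        @Override
        public Long mapRegion(Map<String, Long> regionRows) {
            long sum = 0; // a running total instead of emitted intermediate pairs
            for (long value : regionRows.values()) {
                sum += value;
            }
            return sum;
        }

        @Override
        public Long reduce(Long left, Long right) {
            return left + right;
        }
    }

    /** Runs the job on every region concurrently, then folds the partial results. */
    public static <T> T invokeOnAllRegions(List<Map<String, Long>> regions,
                                           RegionMapReduce<T> job, T identity) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, regions.size()));
        try {
            List<Future<T>> partials = new ArrayList<>();
            for (Map<String, Long> region : regions) {
                partials.add(pool.submit(() -> job.mapRegion(region)));
            }
            T result = identity;
            for (Future<T> partial : partials) {
                result = job.reduce(result, partial.get());
            }
            return result;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("region invocation failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<String, Long> regionA = new TreeMap<>();
        regionA.put("a", 1L);
        regionA.put("b", 2L);
        Map<String, Long> regionB = new TreeMap<>();
        regionB.put("m", 3L);
        regionB.put("n", 4L);

        List<Map<String, Long>> regions = new ArrayList<>();
        regions.add(regionA);
        regions.add(regionB);

        System.out.println(invokeOnAllRegions(regions, new SumAggregate(), 0L)); // prints 10
    }
}
```

      Each region returns a single partial value, so the only data crossing the region boundary is one number per region rather than a stream of intermediate key/value pairs.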

        Issue Links

          Activity

          Andrew Purtell created issue -
          Andrew Purtell made changes -
          Field Original Value New Value
          Link This issue incorporates HBASE-1002 [ HBASE-1002 ]
          Andrew Purtell made changes -
          Link This issue incorporates HBASE-1002 [ HBASE-1002 ]
          Andrew Purtell made changes -
          Link This issue depends upon HBASE-1936 [ HBASE-1936 ]
          Andrew Purtell made changes -
          Link This issue relates to HBASE-1935 [ HBASE-1935 ]
          Andrew Purtell made changes -
          Link This issue relates to HBASE-74 [ HBASE-74 ]
          Andrew Purtell made changes -
          Link This issue relates to HBASE-1845 [ HBASE-1845 ]
          Andrew Purtell made changes -
          Description Changed the nested bullet markup from "-" to "**"; the text otherwise matches the slide excerpt in the Description above.
          Andrew Purtell made changes -
          Assignee Andrew Purtell [ apurtell ]
          stack added a comment -

          That's a sweet citation, Andrew. It has an update on BT and a few things on how they do replication.
          Andrew Purtell made changes -
          Assignee Andrew Purtell [ apurtell ]
          Andrew Purtell made changes -
          Link This issue is blocked by HBASE-2321 [ HBASE-2321 ]
          Andrew Purtell made changes -
          Assignee Andrew Purtell [ apurtell ]
          Fix Version/s 0.21.0 [ 12313607 ]
          Description Appended the two HBase paragraphs (the case for a coprocessor framework, and the per-region MapReduce interface from HBASE-2001) to the slide excerpt; the resulting text matches the Description above.
          Karthik K added a comment -

          Would it be OK to have a subtask under this to discuss distributed query processing using coprocessors, which hbasene (http://github.com/akkumar/hbasene) intends to use to optimize query processing?
          Andrew Purtell added a comment -

          @Kay Kay: HBASE-2469, have at it.

          stack added a comment -

          Moved from 0.21 to 0.22 just after merge of old 0.20 branch into TRUNK.

          stack made changes -
          Fix Version/s 0.22.0 [ 12314223 ]
          Fix Version/s 0.21.0 [ 12313607 ]
          Duane Moore made changes -
          Link This issue relates to HBASE-32 [ HBASE-32 ]
          Andrew Purtell made changes -
          Link This issue depends upon HBASE-1936 [ HBASE-1936 ]
          Andrew Purtell made changes -
          Link This issue relates to HBASE-2893 [ HBASE-2893 ]
          Andrew Purtell made changes -
          Link This issue relates to HBASE-32 [ HBASE-32 ]
          Andrew Purtell made changes -
          Link This issue relates to HBASE-3340 [ HBASE-3340 ]
          Andrew Purtell made changes -
          Link This issue relates to HBASE-3341 [ HBASE-3341 ]
          Andrew Purtell made changes -
          Link This issue relates to HBASE-3342 [ HBASE-3342 ]
          Todd Lipcon made changes -
          Component/s coprocessors [ 12314191 ]
          stack added a comment -

          Moving out of 0.92.0. Pull it back in if you think different.

          stack made changes -
          Fix Version/s 0.92.0 [ 12314223 ]
          stack added a comment -

          Andrew, can we close this issue now?

          stack added a comment -

          Or maybe we can't because some of the issues remain undone? But maybe enough has been done?

          Andrew Purtell made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Hadoop Flags [Reviewed]
          Assignee Andrew Purtell [ apurtell ]
          Fix Version/s 0.92.0 [ 12314223 ]
          Resolution Fixed [ 1 ]
          Transition: Open to Resolved
          Time In Source Status: 566d 21h 22m
          Execution Times: 1
          Last Executer: Andrew Purtell
          Last Execution Date: 11/Jun/11 15:12

            People

            • Assignee: Unassigned
            • Reporter: Andrew Purtell
            • Votes: 6
            • Watchers: 39
