Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.92.0
    • Component/s: Coprocessors
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      From Google's Jeff Dean, in a keynote to LADIS 2009 (http://www.scribd.com/doc/21631448/Dean-Keynote-Ladis2009, slides 66 - 67):

      BigTable Coprocessors (New Since OSDI'06)

      • Arbitrary code that runs next to each tablet in table
        • As tablets split and move, coprocessor code automatically splits/moves too
      • High-level call interface for clients
        • Unlike RPC, calls addressed to rows or ranges of rows; coprocessor client library resolves to actual locations
        • Calls across multiple rows automatically split into multiple parallelized RPCs
      • Very flexible model for building distributed services
        • Automatic scaling, load balancing, request routing for apps
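
To make the row-addressed call model concrete, here is a minimal sketch in Java. It assumes an in-memory map from tablet start rows to location labels. The class `RangeRouter` and its methods are hypothetical names chosen for illustration, not part of Bigtable or HBase. The sketch resolves a row range to the tablets that overlap it and issues one call per tablet in parallel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical illustration only; not the Bigtable or HBase client API.
final class RangeRouter {
    // Maps each tablet's start row (inclusive) to a location label.
    private final TreeMap<String, String> tabletsByStartRow = new TreeMap<>();

    void addTablet(String startRow, String location) {
        tabletsByStartRow.put(startRow, location);
    }

    // Returns the locations of all tablets overlapping [startRow, endRow).
    List<String> resolve(String startRow, String endRow) {
        List<String> locations = new ArrayList<>();
        if (startRow.compareTo(endRow) >= 0) {
            return locations; // empty or inverted range
        }
        // The tablet holding startRow has the greatest start key <= startRow.
        String first = tabletsByStartRow.floorKey(startRow);
        if (first == null) {
            return locations; // no tablet covers startRow
        }
        // Every tablet whose start key lies in [first, endRow) overlaps the range.
        locations.addAll(tabletsByStartRow.subMap(first, true, endRow, false).values());
        return locations;
    }

    // Splits one logical call into one asynchronous call per tablet,
    // then waits for all of them and returns the results in tablet order.
    <R> List<R> call(String startRow, String endRow, Function<String, R> perTabletCall) {
        List<CompletableFuture<R>> futures = new ArrayList<>();
        for (String location : resolve(startRow, endRow)) {
            futures.add(CompletableFuture.supplyAsync(() -> perTabletCall.apply(location)));
        }
        List<R> results = new ArrayList<>();
        for (CompletableFuture<R> future : futures) {
            results.add(future.join());
        }
        return results;
    }
}
```

The caller names only rows. Which servers are contacted, and how many requests are sent, is decided by the routing layer, so a tablet split or move changes the map but not the calling code.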

      Example Coprocessor Uses

      • Scalable metadata management for Colossus (next gen GFS-like file system)
      • Distributed language model serving for machine translation system
      • Distributed query processing for full-text indexing support
      • Regular expression search support for code repository

      For HBase, adding a coprocessor framework will allow functionality to be added incrementally as plug-ins. There will be no more need to subclass the regionserver interface and implementation classes and set hbase.regionserver.class and hbase.regionserver.impl in hbase-site.xml. That mechanism allows for extension, but only one extension at a time, to the exclusion of all others.
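
The difference between replacing a class and plugging in extensions can be sketched as follows. This is an illustration under assumptions, not the proposed HBase interface: `PutHook` and `HookHost` are hypothetical names, and a real framework would expose many more hook points. The sketch shows that several independent extensions can be registered side by side and each sees the operation in turn.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hook interface: an extension may inspect or rewrite a value before it is stored.
interface PutHook {
    String prePut(String row, String value);
}

// Hypothetical host that stands in for the server side of the framework.
final class HookHost {
    private final List<PutHook> hooks = new ArrayList<>();

    // Any number of extensions can be registered; none excludes the others.
    void register(PutHook hook) {
        hooks.add(hook);
    }

    // Runs every registered hook in registration order and returns the final value.
    String put(String row, String value) {
        String current = value;
        for (PutHook hook : hooks) {
            current = hook.prePut(row, current);
        }
        return current;
    }
}
```

With a single replaceable implementation class, two such extensions would have to be merged into one subclass by hand. With a chain, `host.register(...)` can be called once for each of them.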

      Also, in HBASE-2001 there is currently an in-process MapReduce framework for the regionservers. Coprocessors can optionally implement a 'MapReduce' interface which clients will be able to invoke concurrently on all regions of the table. Note this is not MapReduce on the table; this is MapReduce on each region, concurrently. One can implement MapReduce in a manner very similar to Hadoop's MR framework, or use shared variables to avoid the overhead of generating (and processing) a lot of intermediates. An initial application of this could be support for rapid calculation of aggregates over data stored in HBase.
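
A per-region aggregate of the kind described above might look roughly like this. The sketch assumes each region is modeled as a plain list of numeric values. `RegionAggregate`, `Partial`, and both methods are hypothetical names, not the 'MapReduce' interface from HBASE-2001. Each region produces one small partial result and the client merges the partials, so no per-row intermediates leave the region.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical illustration of region-local aggregation; not the HBase API.
final class RegionAggregate {
    // Partial result produced by one region: a running sum and a row count.
    static final class Partial {
        final long sum;
        final long count;

        Partial(long sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    // Runs once per region and touches only that region's values.
    static Partial aggregateRegion(List<Long> regionValues) {
        long sum = 0;
        for (long value : regionValues) {
            sum += value;
        }
        return new Partial(sum, regionValues.size());
    }

    // Client side: invoke every region concurrently, then combine the partials.
    static double average(List<List<Long>> regions) {
        List<CompletableFuture<Partial>> futures = new ArrayList<>();
        for (List<Long> region : regions) {
            futures.add(CompletableFuture.supplyAsync(() -> aggregateRegion(region)));
        }
        long sum = 0;
        long count = 0;
        for (CompletableFuture<Partial> future : futures) {
            Partial partial = future.join();
            sum += partial.sum;
            count += partial.count;
        }
        // An empty table has no average; 0.0 is an arbitrary choice for this sketch.
        return count == 0 ? 0.0 : (double) sum / count;
    }
}
```

Carrying the sum and count separately, rather than a per-region average, keeps the merge correct when regions hold different numbers of rows.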

            People

            • Assignee:
              Unassigned
              Reporter:
              Andrew Purtell
            • Votes:
              6
              Watchers:
              39