  1. Phoenix
  2. PHOENIX-5315

Cross cluster replication of the base table only should be sufficient


Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved

    Description

      When Phoenix tables are replicated using the HBase cross-cluster replication facility, replicating only the base table should be sufficient, and for correctness and to avoid race conditions and inconsistencies, it must be the only table replicated. On the sink cluster, the replication client's application of mutations from the replication stream to the local base table should trigger all necessary index update operations. To the extent that this does not happen today because of implementation details, those details should be reworked.
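
      To make the intended sink-side behavior concrete, here is a minimal sketch in plain Java. It is not Phoenix or HBase code: `Upsert`, `IndexDef`, and `applyOnSink` are hypothetical names, the index row key layout (indexed value, separator, base row key) is an assumption made for illustration, and cleanup of stale index rows is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model: none of these types are Phoenix or HBase APIs.
final class SinkIndexSketch {

    /** A single-row upsert: target table, row key, and column values. */
    static final class Upsert {
        final String table;
        final String rowKey;
        final Map<String, String> columns;

        Upsert(String table, String rowKey, Map<String, String> columns) {
            this.table = table;
            this.rowKey = rowKey;
            this.columns = columns;
        }
    }

    /** A secondary index over one column of the base table (assumed shape). */
    static final class IndexDef {
        final String indexTable;
        final String indexedColumn;

        IndexDef(String indexTable, String indexedColumn) {
            this.indexTable = indexTable;
            this.indexedColumn = indexedColumn;
        }
    }

    /**
     * Applies one replicated base-table upsert on the sink. Returns the base
     * upsert followed by one derived upsert for each index whose indexed
     * column is present in the edit. Only the base upsert crossed the wire.
     */
    static List<Upsert> applyOnSink(Upsert base, List<IndexDef> indexes) {
        List<Upsert> out = new ArrayList<>();
        out.add(base);
        for (IndexDef idx : indexes) {
            String value = base.columns.get(idx.indexedColumn);
            if (value == null) {
                continue; // this edit does not carry the indexed column
            }
            // Assumed layout: indexed value first, then the base row key.
            String indexRowKey = value + "|" + base.rowKey;
            out.add(new Upsert(idx.indexTable, indexRowKey, Map.of()));
        }
        return out;
    }

    public static void main(String[] args) {
        Upsert base = new Upsert("ORDERS", "o1", Map.of("CUSTOMER", "c9"));
        List<IndexDef> indexes = List.of(new IndexDef("IDX_CUSTOMER", "CUSTOMER"));
        for (Upsert u : applyOnSink(base, indexes)) {
            System.out.println(u.table + " " + u.rowKey);
        }
    }
}
```

      In this model the source ships one upsert per base-table edit regardless of how many `IndexDef` entries exist; every index write is produced locally on the sink.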

      This also has important efficiency benefits: no matter how many indexes are defined for a base table, only the base table updates need to be replicated (presuming the Phoenix schema is synchronized across all sites by some external means).

      This would likely involve multiple components, so we should use this issue as an umbrella. We'd need:

      1. A Phoenix implementation of HBase's ReplicationEndpoint that tails the WAL like a normal replication endpoint. However, rather than writing to HBase's replication sink APIs (which issue HBase RPCs to a remote cluster), it should write to a new Phoenix Endpoint coprocessor.
      2. An HBase coprocessor Endpoint hook that accepts a request from a remote cluster (containing both the WALEdit's data and the WALKey's annotated metadata telling the remote cluster what tenant_id, logical table name, and timestamp the data is associated with). Ideally the API's message format should be configurable, and could be either a protobuf or an Avro schema similar to the one described by PHOENIX-5443. The endpoint hook would take the metadata and data and regenerate a complete set of Phoenix mutations, both data and indexes, just as the Phoenix client did for the original SQL statement that generated the source-side edits. These mutations would be written to the remote cluster by the normal Phoenix write path.

      (Unfortunately, HBase uses the term "endpoint" to mean both a replication plugin and a stored-procedure-like coprocessor hook. To be clear, item 1 is a replication plugin and item 2 is a coprocessor hook.)
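
      The handoff between the two components could be sketched roughly as follows, in plain Java with no HBase or Phoenix dependencies. Every name here (`Envelope`, `Decoder`, `WritePath`, `SinkEndpoint`) is hypothetical. The `Decoder` interface stands in for whichever wire format is configured (protobuf or Avro), and `WritePath` stands in for the normal Phoenix write path on the sink; neither is an existing API.

```java
import java.util.Map;

// Hypothetical sketch of the request envelope and sink-side hook; not real HBase/Phoenix APIs.
final class ReplicationEnvelopeSketch {

    /** Metadata from the annotated WALKey plus the row data from the WALEdit. */
    static final class Envelope {
        final String tenantId;          // tenant the edit belongs to
        final String logicalTableName;  // logical Phoenix table name
        final long timestamp;           // timestamp of the source-side edit
        final Map<String, String> row;  // simplified stand-in for the edit's cells

        Envelope(String tenantId, String logicalTableName, long timestamp,
                 Map<String, String> row) {
            this.tenantId = tenantId;
            this.logicalTableName = logicalTableName;
            this.timestamp = timestamp;
            this.row = row;
        }
    }

    /** Pluggable wire format: a stand-in for a protobuf- or Avro-based decoder. */
    interface Decoder {
        Envelope decode(byte[] wire);
    }

    /** Stand-in for the sink's normal write path, which also maintains indexes. */
    interface WritePath {
        void upsert(String tenantId, String table, long timestamp, Map<String, String> row);
    }

    /** Plays the role of item 2: receives a request and replays it locally. */
    static final class SinkEndpoint {
        private final Decoder decoder;
        private final WritePath writePath;

        SinkEndpoint(Decoder decoder, WritePath writePath) {
            this.decoder = decoder;
            this.writePath = writePath;
        }

        void handle(byte[] wire) {
            Envelope e = decoder.decode(wire);
            if (e.logicalTableName == null || e.logicalTableName.isEmpty()) {
                throw new IllegalArgumentException("request is missing the logical table name");
            }
            // Replay through the write path so data and index mutations are both generated.
            writePath.upsert(e.tenantId, e.logicalTableName, e.timestamp, e.row);
        }
    }

    public static void main(String[] args) {
        Envelope sample = new Envelope("t1", "ORDERS", 42L, Map.of("o1", "NEW"));
        SinkEndpoint endpoint = new SinkEndpoint(
            wire -> sample, // trivial decoder for the demo
            (tenant, table, ts, row) ->
                System.out.println(tenant + " " + table + " " + ts + " " + row));
        endpoint.handle(new byte[0]);
    }
}
```

      Keeping the decoder behind an interface is one way to leave the message format configurable; the source-side replication plugin (item 1) would need a matching encoder, which is omitted here.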

       

            People

              bharathv Bharath Vissapragada
              apurtell Andrew Kyle Purtell
