HBASE-2804

[replication] Support ICVs in a master-master setup

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Replication
    • Labels: None

Description

Currently an ICV ends up as a Put in the HLogs, which ReplicationSource ships to ReplicationSink, which in turn recreates only the Put, not the ICV itself. This means that in a master-master replication setup where the same counters are incremented on both sides, the Puts will overwrite each other.

We need to find a way to support this use case.
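
To make the failure mode concrete, here is a minimal sketch of the clobbering scenario, assuming two mutually replicating clusters and a counter that currently reads 10 on both sides. The row, family, and qualifier names are made up for illustration, and connection/table setup is elided.

    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.util.Bytes;

    public class IcvClobberDemo {
      static final byte[] ROW = Bytes.toBytes("r");
      static final byte[] CF  = Bytes.toBytes("cf");
      static final byte[] Q   = Bytes.toBytes("hits");

      // Assume both clusters replicate to each other and the counter
      // currently reads 10 on both sides.
      static void demo(HTable clusterA, HTable clusterB) throws Exception {
        clusterA.incrementColumnValue(ROW, CF, Q, 1); // A's HLog records Put(11)
        clusterB.incrementColumnValue(ROW, CF, Q, 1); // B's HLog records Put(11)
        // ReplicationSource ships each HLog entry as a plain Put of the
        // absolute value 11, so after each side replays the other's edit
        // the counter reads 11 everywhere instead of the expected 12.
      }
    }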

Activity

stack added a comment -

Moving out of 0.92.0. Pull it back in if you think differently.
Jean-Daniel Cryans added a comment -

Punting to 0.92.0
Jonathan Gray added a comment -

+1 on punting to 0.92
stack added a comment -

Is this going to be done in the 0.90.0 timeframe? Otherwise, let's move it out.
Jonathan Gray added a comment -

You can pay the cost at write time by executing double writes to each DC, or you can pay the cost at read time by executing double reads from each DC (see the sketch after this comment).

I think it's worth at least trying to get some "eventually consistent" master-master counters via async replication. Is the 'starting point' any more complex than normal replication? I guess if you are treating duplicate writes as idempotent (they kind of are, but not completely) it's a different problem; if not, it doesn't seem any more difficult. The ordering of the operations does not matter in this case; what matters is that the operations are strictly not idempotent, so each increment must be applied exactly once.
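
A minimal sketch of the read-time option described above, assuming each DC increments only its own local, non-replicated counter and readers fan out one Get per cluster and sum the results. The table handles and names are illustrative.

    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CrossDcReadDemo {
      static final byte[] ROW = Bytes.toBytes("r");
      static final byte[] CF  = Bytes.toBytes("cf");
      static final byte[] Q   = Bytes.toBytes("hits");

      // Write path: each DC increments only its own local table, so there
      // is nothing to replicate and nothing to clobber.
      static void hit(HTable localDc) throws Exception {
        localDc.incrementColumnValue(ROW, CF, Q, 1);
      }

      // Read path: pay the cost here instead, with one Get per DC.
      static long total(HTable dcEast, HTable dcWest) throws Exception {
        long sum = 0;
        for (HTable dc : new HTable[] { dcEast, dcWest }) {
          Result r = dc.get(new Get(ROW).addColumn(CF, Q));
          byte[] v = r.getValue(CF, Q);
          if (v != null) {
            sum += Bytes.toLong(v);
          }
        }
        return sum;
      }
    }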

ryan rawson added a comment -

It would make sense to 'shard' the ICV by datacenter, where each datacenter gets its own ICV column; anyone wishing to know the total would just get all the columns and sum them (see the sketch after this comment). Different datacenters would not overwrite each other. The only problem is that this is more of an application-level thing that isn't baked into the API anywhere, setting people up for failure down the road.

The problem with doing something like shipping deltas is that it becomes difficult to bring up a new cluster, since the cluster will need a 'starting point' combined with a sequence of deltas that mesh perfectly, or else the replica cluster will be out of sync.
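
A sketch of the per-datacenter sharding idea, assuming each cluster increments only its own qualifier (the "hits-<dc>" naming is made up for illustration) while the column family replicates in both directions, and readers fetch the family and sum the shards.

    import java.util.NavigableMap;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ShardedCounterDemo {
      static final byte[] ROW = Bytes.toBytes("r");
      static final byte[] CF  = Bytes.toBytes("counters");

      // Each DC only ever increments its own qualifier, so the Puts that
      // replication ships can never collide with the other side's shard.
      static void hit(HTable local, String dcName) throws Exception {
        local.incrementColumnValue(ROW, CF, Bytes.toBytes("hits-" + dcName), 1);
      }

      // Readers fetch the whole family and sum the per-DC shards; either
      // cluster can serve this once replication has caught up.
      static long total(HTable table) throws Exception {
        Result r = table.get(new Get(ROW).addFamily(CF));
        long sum = 0;
        NavigableMap<byte[], byte[]> shards = r.getFamilyMap(CF);
        if (shards != null) {
          for (byte[] v : shards.values()) {
            sum += Bytes.toLong(v);
          }
        }
        return sum;
      }
    }

As the comment notes, this lives entirely at the application level: nothing in the client API enforces that a DC touches only its own shard.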


People

    • Assignee: Unassigned
    • Reporter: Jean-Daniel Cryans
    • Votes: 1
    • Watchers: 8
