HBase / HBASE-10295

Refactor the replication implementation to eliminate permanent zk node

    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Replication
    • Labels:

      Description

      Though this is a broader and bigger change, its original motivation derives from HBASE-8751: the newly introduced per-peer tableCFs attribute should be treated the same way as the peer-state, which is a permanent sub-node under the peer node. However, using a permanent ZK node is considered incorrect practice, so let's refactor to eliminate the permanent ZK node. HBASE-8751 can then align its newly introduced per-peer tableCFs attribute with this correct implementation approach.

        Issue Links

          Activity

          stack made changes -
          Fix Version/s 0.99.2 [ 12328822 ]
          stack added a comment -

          Moving out of 1.0. No progress. Move back if I have it wrong.

          Enis Soztutar made changes -
          Fix Version/s 0.99.2 [ 12328822 ]
          Fix Version/s 0.99.1 [ 12328551 ]
          stack added a comment -

          Mikhail Antonov This one sounds up you fellas' alley? Let me backport the one that undoes table enable/disable.

          Mikhail Antonov made changes -
          Description
            Original value: Though this is a broader and bigger change, it original motivation derives from [HBASE-8751|https://issues.apache.org/jira/browse/HBASE-8751]: the newly introduced per-peer tableCFs attribute should be treated the same way as the peer-state, which is a permanent sub-node under peer node but using permanent zk node is deemed as an incorrect practice. So let's refactor to eliminate the permanent zk node. And the HBASE-8751 can then align its newly introduced per-peer tableCFs attribute with this *correct* implementation theme.
            New value: Though this is a broader and bigger change, it original motivation derives from HBASE-8751: the newly introduced per-peer tableCFs attribute should be treated the same way as the peer-state, which is a permanent sub-node under peer node but using permanent zk node is deemed as an incorrect practice. So let's refactor to eliminate the permanent zk node. And the HBASE-8751 can then align its newly introduced per-peer tableCFs attribute with this *correct* implementation theme.
          Enis Soztutar made changes -
          Fix Version/s 0.99.1 [ 12328551 ]
          Fix Version/s 0.99.0 [ 12325675 ]
          Andrew Purtell made changes -
          Labels noob beginner
          stack made changes -
          Labels noob
          stack made changes -
          Priority Major [ 3 ] Critical [ 2 ]
          Andrew Purtell made changes -
          Link This issue relates to HBASE-11629 [ HBASE-11629 ]
          Enis Soztutar added a comment -

          Quoting stack: "Over on HBASE-9864, there is something a little simpler that came of a chat w/ my man Matteo."

          Agreed, we should continue this over at HBASE-9864. Let me carry my writing there as well.

          stack added a comment -

          Enis Soztutar Over on HBASE-9864, there is something a little simpler that came of a chat w/ my man Matteo. I like the idea of doing the zk model but your short paragraph needs a bit of expansion so I understand better how you think it would work in our context. Thanks.

          Honghua Feng added a comment -

          Quoting Mikhail Antonov: "Honghua Feng stack Lars Hofhansl is it OK if I move it there as a subtask?"

          Sounds good to me.

          Enis Soztutar added a comment -

          Quoting stack: "Make Master arbiter for these new system tables – only the master can mod them – and then add a response on the heartbeat to update regionservers on last edit? Could be as simple as master just replying w/ timestamp of last edit."

          We should do this as part of (or using) HBASE-9864. I was thinking of something similar, where the data is kept in an HBase table as a snapshot + WAL. All transactions will have a trxId (NO timestamps please). All region servers open a session with a lease and send heartbeats to renew it. They send the last seen trxId, and the coordinator replies with the list of edits they should apply to their in-memory cache. If a reader loses its lease, the coordinator (master) invalidates its session, so that there is an upper bound on the time it takes for edits to propagate. The coordinator keeps the last seen trxId per session, so that it can recreate the snapshot and get rid of write-ahead log entries.

          However, astute readers might have noticed that this is indeed similar to zk's own protocol except that the data is not replicated via ZAB, but via datanode pipelines and hbase.
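          A minimal in-memory sketch of the lease-and-trxId protocol described above may help pin down the moving parts. It is hypothetical, not HBase code: the classes `Edit` and `EditLogCoordinator` and their methods are invented for illustration, the "HBase table as a snapshot + WAL" is modelled as a `HashMap` plus an in-memory list, and the caller passes the current time explicitly so that lease expiry is deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** One committed change, identified by a monotonically increasing trxId (no timestamps). */
final class Edit {
    final long trxId;
    final String key;
    final String value;

    Edit(long trxId, String key, String value) {
        this.trxId = trxId;
        this.key = key;
        this.value = value;
    }
}

/** Hypothetical coordinator (the master's role): owns the edit log and the reader sessions. */
final class EditLogCoordinator {
    private final long leaseMillis;
    private final Map<String, String> snapshot = new HashMap<>(); // state of all compacted edits
    private long snapshotTrxId = 0;                                // last trxId folded into snapshot
    private final List<Edit> log = new ArrayList<>();              // edits after the snapshot, in order
    private long nextTrxId = 1;
    private final Map<String, Long> lastSeen = new HashMap<>();    // session -> last acknowledged trxId
    private final Map<String, Long> leaseExpiry = new HashMap<>(); // session -> lease deadline

    EditLogCoordinator(long leaseMillis) { this.leaseMillis = leaseMillis; }

    /** Only the coordinator mutates state; each write gets the next trxId. */
    long write(String key, String value) {
        long id = nextTrxId++;
        log.add(new Edit(id, key, value));
        return id;
    }

    /** Open a session: load the snapshot into the reader's cache and return its trxId. */
    long openSession(String sessionId, long now, Map<String, String> readerCache) {
        readerCache.clear();
        readerCache.putAll(snapshot);
        lastSeen.put(sessionId, snapshotTrxId);
        leaseExpiry.put(sessionId, now + leaseMillis);
        return snapshotTrxId;
    }

    /** Heartbeat: renew the lease, record the reader's last seen trxId, return the edits it lacks. */
    List<Edit> heartbeat(String sessionId, long lastSeenTrxId, long now) {
        Long deadline = leaseExpiry.get(sessionId);
        if (deadline == null || deadline < now) {
            // Lease lost: the reader must open a new session and reload from the snapshot.
            throw new IllegalStateException("session expired: " + sessionId);
        }
        if (lastSeenTrxId < snapshotTrxId) {
            // Edits below the snapshot are gone from the log; the reader must reload.
            throw new IllegalStateException("reader is behind the snapshot: " + sessionId);
        }
        leaseExpiry.put(sessionId, now + leaseMillis);
        lastSeen.put(sessionId, lastSeenTrxId);
        List<Edit> pending = new ArrayList<>();
        for (Edit e : log) {
            if (e.trxId > lastSeenTrxId) {
                pending.add(e);
            }
        }
        return pending;
    }

    /** Invalidate sessions whose lease has run out; this bounds how long a reader can stay stale. */
    void expireSessions(long now) {
        leaseExpiry.entrySet().removeIf(e -> e.getValue() < now);
        lastSeen.keySet().retainAll(leaseExpiry.keySet());
    }

    /** Fold edits every live session has seen into the snapshot and drop them from the log. */
    void compact() {
        long safe = nextTrxId - 1;
        for (long seen : lastSeen.values()) {
            safe = Math.min(safe, seen);
        }
        while (!log.isEmpty() && log.get(0).trxId <= safe) {
            Edit e = log.remove(0);
            snapshot.put(e.key, e.value);
            snapshotTrxId = e.trxId;
        }
    }

    int logSize() { return log.size(); }
}
```

          One design point the sketch makes concrete: compaction only folds edits up to the smallest trxId acknowledged by a live session, so a reader that keeps heartbeating never needs an edit that has already been discarded, while a reader whose lease lapsed has to open a new session and start again from the snapshot.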

          Enis Soztutar made changes -
          Link This issue relates to HBASE-9864 [ HBASE-9864 ]
          Mikhail Antonov added a comment -

          Honghua Feng stack Lars Hofhansl is it OK if I move it there as a subtask?

          Andrew Purtell added a comment -

          Makes sense.

          Mikhail Antonov added a comment -

          Just thinking it may be good to have all JIRAs that are about "eliminating ... something permanent in ZK" under the same umbrella.

          Mikhail Antonov added a comment -

          I'm wondering: does this JIRA fit under the umbrella of HBASE-10909?

          Cosmin Lehene made changes -
          Link This issue relates to HBASE-10296 [ HBASE-10296 ]
          Honghua Feng added a comment -

          This is a long-term item that needs more discussion and thought; it is somewhat related to HBASE-10296.

          Honghua Feng made changes -
          Assignee Feng Honghua [ fenghh ]
          Andrew Purtell added a comment -

          Nice idea, +1

          Honghua Feng added a comment -

          stack Yes, sounds feasible, thanks.

          stack added a comment -

          Make Master arbiter for these new system tables – only the master can mod them – and then add a response on the heartbeat to update regionservers on last edit? Currently we return a void. See RegionServerReportResponse in http://svn.apache.org/viewvc/hbase/trunk/hbase-protocol/src/main/protobuf/RegionServerStatus.proto?view=markup Could be as simple as master just replying w/ timestamp of last edit. If RS has not seen the new edit, it goes and reads the table....
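          The simpler variant in this comment (the master as sole writer, plus a last-edit marker on the heartbeat response) could look roughly like the following. It is a sketch under assumptions: `ReplicationTable`, `ReportResponse`, `Master` and `RegionServer` are hypothetical stand-ins, not HBase's classes or the real `RegionServerReportResponse` message, and the marker is a timestamp as suggested here. Enis argues above for a transaction id instead; any monotonically increasing value works the same way in this sketch.

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-in for the proposed replication system table; only the master writes to it. */
final class ReplicationTable {
    private final Map<String, String> rows = new HashMap<>();
    private long lastEditTs = 0;

    /** Master-only mutation: update a row and remember when the table last changed. */
    void put(String row, String value, long ts) {
        rows.put(row, value);
        lastEditTs = ts;
    }

    long lastEditTs() { return lastEditTs; }

    /** Full read of the table, as a region server would do when it is stale. */
    Map<String, String> scan() { return new HashMap<>(rows); }
}

/** Hypothetical heartbeat response carrying the one extra field proposed above. */
final class ReportResponse {
    final long lastEditTs;

    ReportResponse(long lastEditTs) { this.lastEditTs = lastEditTs; }
}

/** Hypothetical master: sole arbiter of the table, answers each report with its last-edit time. */
final class Master {
    private final ReplicationTable table;

    Master(ReplicationTable table) { this.table = table; }

    ReportResponse regionServerReport() { return new ReportResponse(table.lastEditTs()); }
}

/** Hypothetical region server: caches the table and re-reads it only after an unseen edit. */
final class RegionServer {
    private final ReplicationTable table;
    private Map<String, String> cache = new HashMap<>();
    private long seenTs = -1; // -1 means nothing has been loaded yet
    int reloads = 0;

    RegionServer(ReplicationTable table) { this.table = table; }

    void onReportResponse(ReportResponse resp) {
        if (resp.lastEditTs > seenTs) {
            cache = table.scan();
            seenTs = resp.lastEditTs;
            reloads++;
        }
    }

    String get(String row) { return cache.get(row); }
}
```

          If another edit lands between the response and the `scan()`, the region server may read newer data than the marker it records; that only costs one extra reload on the next heartbeat, never a missed edit.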

          Honghua Feng added a comment -

          Another internal HBase system table (similar to the meta table) is a good choice for storing the replication information now kept in ZK nodes, but it lacks ZK's inherent watch/notification mechanism. That mechanism is essential to the communication pattern in which a client changes replication status (e.g. peer-state, add peer), and the region servers listen for notifications of such changes and act accordingly (e.g. disable a peer, start a thread for a new peer).

          No clear plan for now, but I'll spend some time in the coming days working out a draft design for discussion and brainstorming. Any opinions or suggestions are welcome.

          Also changed the title of this JIRA to make it more general and better match its intention. santosh banerjee
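          The watch/notification pattern described in this comment, which a plain system table lacks, can be sketched in-process as follows. The names `PeerStateStore` and `PeerTracker` are hypothetical, notifications are delivered synchronously inside one JVM rather than over the network, and subscribers are persistent callbacks for brevity (ZooKeeper's standard watches are one-time triggers that must be re-registered after they fire).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiConsumer;

/** Hypothetical peer-state store that pushes changes to subscribers, as a ZK watch does. */
final class PeerStateStore {
    private final Map<String, String> peerStates = new HashMap<>();
    private final List<BiConsumer<String, String>> subscribers = new ArrayList<>();

    /** A region server registers interest in peer changes. */
    void subscribe(BiConsumer<String, String> subscriber) { subscribers.add(subscriber); }

    /** A client changes replication status; every subscriber is notified synchronously. */
    void setPeerState(String peerId, String state) {
        peerStates.put(peerId, state);
        for (BiConsumer<String, String> s : subscribers) {
            s.accept(peerId, state);
        }
    }

    String getPeerState(String peerId) { return peerStates.get(peerId); }
}

/** Hypothetical region-server side that reacts to notifications instead of polling. */
final class PeerTracker {
    final Set<String> startedSources = new HashSet<>();
    final Map<String, Boolean> enabled = new HashMap<>();

    void onPeerChange(String peerId, String state) {
        // A peer heard of for the first time gets a source (a real server would start a thread here).
        startedSources.add(peerId);
        // Enable or disable shipping edits to the peer according to its new state.
        enabled.put(peerId, "ENABLED".equals(state));
    }
}
```

          Any table-based replacement would need to supply this push path itself, for example through the heartbeat-based schemes proposed elsewhere in this thread.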

          Honghua Feng made changes -
          Description
            Original value: Though this is a more broader and bigger change, it original motivation derives from [HBASE-8751|https://issues.apache.org/jira/browse/HBASE-8751]: the newly introduced per-peer tableCFs attribute should be treated the same way as the peer-state, which is a permanent sub-node under peer node but using permanent zk node is deemed as an incorrect practice. So let's refactor to eliminate the permanent zk node. And the HBASE-8751 can then align its newly introduced per-peer tableCFs attribute with this *correct* implementation theme.
            New value: Though this is a broader and bigger change, it original motivation derives from [HBASE-8751|https://issues.apache.org/jira/browse/HBASE-8751]: the newly introduced per-peer tableCFs attribute should be treated the same way as the peer-state, which is a permanent sub-node under peer node but using permanent zk node is deemed as an incorrect practice. So let's refactor to eliminate the permanent zk node. And the HBASE-8751 can then align its newly introduced per-peer tableCFs attribute with this *correct* implementation theme.
          Honghua Feng made changes -
          Summary
            Original value: Refactor the implementation of replication to eliminate permanent zk node
            New value: Refactor the replication implementation to eliminate permanent zk node
          Honghua Feng made changes -
          Summary
            Original value: Refactor the implementation of replication peer to eliminate permanent peer-state ZKNode
            New value: Refactor the implementation of replication to eliminate permanent zk node
          Description
            Original value: Now the peer-state sub-node under peer is a permanent one, let's refactor to eliminate the permanent ZKNode. And the HBASE-8751 can then align its newly introduced per-peer tableCFs attribute with this *correct* implementation theme.
            New value: Though this is a more broader and bigger change, it original motivation derives from [HBASE-8751|https://issues.apache.org/jira/browse/HBASE-8751]: the newly introduced per-peer tableCFs attribute should be treated the same way as the peer-state, which is a permanent sub-node under peer node but using permanent zk node is deemed as an incorrect practice. So let's refactor to eliminate the permanent zk node. And the HBASE-8751 can then align its newly introduced per-peer tableCFs attribute with this *correct* implementation theme.
          Lars Hofhansl added a comment -

          Awesome. Do you have a plan? I.e. where should we put this state instead. Maybe an HBase table?
          Let's target this to trunk only for now.

          Lars Hofhansl made changes -
          Field Original Value New Value
          Fix Version/s 0.99.0 [ 12325675 ]
          Honghua Feng created issue -

            People

            • Assignee:
              Unassigned
              Reporter:
              Honghua Feng
            • Votes:
              0
              Watchers:
              16

              Dates

              • Created:
                Updated:
