HBase: HBASE-2808

Document the implementation of replication

    Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.90.0
    • Component/s: documentation
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      From HBASE-2223, we need to provide an overview of how replication was implemented. For example:

      • How ZK is used
      • What are the general flows
      • How failover works

      Attachments

      1. HBASE-2808.patch (20 kB, Jean-Daniel Cryans)
      2. replication_overview.png (203 kB, Jean-Daniel Cryans)
      3. HBASE-2808-v2.patch (20 kB, Jean-Daniel Cryans)
      4. HBASE-2808-v3.patch (28 kB, Jean-Daniel Cryans)

        Issue Links

          Activity

          Jean-Daniel Cryans made changes -
          Status: Open [ 1 ] -> Resolved [ 5 ]
          Hadoop Flags: -> [Reviewed]
          Resolution: -> Fixed [ 1 ]
          Jean-Daniel Cryans added a comment -

          Committed to trunk. I added a small FAQ covering Stack's questions.

          stack added a comment -

          Ok. Go ahead and commit then I'd say. (You added note on why clusters need to be in-sync timewise?) Let me know if I can help w/ the image inclusion.

          Jean-Daniel Cryans made changes -
          Attachment HBASE-2808-v3.patch [ 12449419 ]
          Jean-Daniel Cryans added a comment -

          Patch v3 addresses most comments and refreshes the package.html doc. A few other issues are explained here:

          "GLOBAL means replicate? Any provision to replicate only to cluster X and not to cluster Y? or is that for later?"

          Much much later, there's a jira for that IIRC.

          "You need a bulk edit shipper? Something that allows you transfer 64MB of edits in one go"

          CopyTable has that intention, else by default replication ships max 64MB of data.

          "Is it a mistake that WALEdit doesn't carry Put and Delete objects, that we have to reinstantiate not only replicating but when replaying edits?"

          AFAIK there's an issue or two for that.

          "Why? ain't the ts in the edit?"

          Suppose an edit to cell X happens in an EST cluster, then 2 minutes later a new edit happens to the same cell in a PST cluster, and both clusters are in master-master replication. The second edit carries the earlier timestamp, so it is considered older and the first will always hide it, while in fact the second is the more recent one.
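
          The clock-skew problem can be illustrated with a small standalone sketch. It is not HBase code: the Edit class, the stamp and visible_value helpers, and the 3-hour offset are all hypothetical, and the only behavior assumed is that the version with the highest timestamp is the one readers see.

```python
from dataclasses import dataclass

HOUR_MS = 60 * 60 * 1000
MINUTE_MS = 60 * 1000


@dataclass(frozen=True)
class Edit:
    """Hypothetical stand-in for one cell version; not an HBase class."""
    value: str
    timestamp_ms: int  # stamped by the originating cluster's own clock


def stamp(true_time_ms, clock_offset_ms, value):
    """Stamp an edit using a cluster clock that is off by clock_offset_ms."""
    return Edit(value, true_time_ms + clock_offset_ms)


def visible_value(edits):
    """Return the value a reader sees: the edit with the highest timestamp."""
    return max(edits, key=lambda e: e.timestamp_ms).value


# Assumed scenario: cluster B's clock runs 3 hours behind cluster A's.
t0 = 1_000_000_000_000
first = stamp(t0, 0, "written first on cluster A")
second = stamp(t0 + 2 * MINUTE_MS, -3 * HOUR_MS, "written second on cluster B")

# After master-master replication both clusters hold both edits.
result = visible_value([first, second])
print(result)
```

          With the clocks in sync (offset 0 on both sides) the same helper returns the second edit, which is why the two clusters need to agree on the time.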

          stack added a comment -

          Oh, one other thing: let's not use xdoc going forward. It was dumbass the day it was written and, going forward, is second class in maven w/ the apt format taking precedence. We'll figure this out better soon, but the goal is a format for articles that can easily be dumped into the hbase 'book'.

          stack added a comment -

          Great doc. Here are some comments:

          Awkward phrasing -> ".. and can contribute to enable high availability"
          Not really sure what you are trying to say so no suggested alternative

          Is this master cluster or hbase master -> "replication is master-push;..."

          I don't get this bit ->

          "it is much easier to keep track of what's currently being replicated since
          each region server has its own write-ahead-log (aka WAL or HLog), compared
          to other well known solutions like MySQL master/slave replication where
          there's only one bin log to keep track of."

          What's easier?  I'd think mysql is easier?

          Leave out the 'that' in the following: "and that rows inserted"

          Can you cite a link that explains mysql statement-based replication for this -> "MySQL's statement-based replication"?

          Something missing here.... "Put and Delete) are replicated in order maintain atomicity."

          Say who does the transform... "The key values are transformed into a WALEdit which is.."

          Say more what this means -> "(that is, that are part of a family scoped GLOBAL and non-catalog)."

          Reassure reader that this is being done for them by the server; they have to do nothing but config.
          
          Explain that RS is running a client and may be inserting across its cluster when you say this ->

          "Synchronously, the region server that receives the edits reads them
          sequentially and applies them on its local cluster using a pool of
          HTables. If consecutive rows belong to the same table, they are
          inserted together in order to leverage parallel insertions."

          I think you need to explain archiving.... else confusion ->

          "Logs that are archived will update their paths in the
          in-memory queue of the replicating thread."

          What is a "cluster key"?

          What is "available sinks"?

          You mean 'hosts for clients running in the slave cluster' when you say following?

          "For example, if a slave
          cluster has 150 machines, 15 will be chosen as potential sinks for this
          master RS."

          In above I think you have to say master cluster RS rather than just 'master RS' because master usually refers to something else in our parlance.

          I don't follow .... is this for case where many master clusters replicating into a single slave?

          "Since this is done by all master RSs, the probability that
          all slave RSs are used is very high, and this method works for clusters
          of any size."

          Should be 'these' instead of 'those' in following "and each of those contain "

          You could expand... saying that if multiple slave clusters, then we'll have a znode per when you say "znode per peer cluster"

          This needs fixup... hard to figure what is being said:

          "Each of those queues will track the HLogs created
          by that RS, but they can differ in size. For example, if one slave
          cluster becomes unavailable for some time then the HLogs cannot be,
          thus they need to stay in the queue (while the others are processed)."
          
          What's this ->  "slave cluster's znode just before it's made available."

          Does this mean we lose edits?

          "The queue items are deleted when the replication thread cannot read
          more entries from a file and that there are other files in the queue."

          What's this mean?  "or because there's too many of them)"

          When would we archive because too many?

          "it will notify the source threads that the path
          for that log changed."

          Why does it have to notify that log dir has changed?  If not in one location, can't we check archive area w/o requiring notification?

          GLOBAL means replicate?  Any provision to replicate only to cluster X and not to cluster Y?  or is that for later?

          Explain catalog table

          You need a bulk edit shipper?  Something that allows you transfer 64MB of edits in one go?

          Is it a mistake that WALEdit doesn't carry Put and Delete objects, that we have to reinstantiate not only replicating but when replaying edits?  Should we make an issue to fix?

          Say what these could be?

          "Note that if the master and slave cluster don't have the same
          time, time-related issues may occur."

          Why?  ain't the ts in the edit?
          
          Jean-Daniel Cryans made changes -
          Attachment HBASE-2808-v2.patch [ 12449141 ]
          Jean-Daniel Cryans added a comment -

          Fixed grammatical errors and such.

          Jean-Daniel Cryans made changes -
          Attachment HBASE-2808.patch [ 12449025 ]
          Attachment replication_overview.png [ 12449026 ]
          Jean-Daniel Cryans added a comment -

          First draft of the documentation; the page refers to the attached image.

          Jean-Daniel Cryans made changes -
          Field Original Value New Value
          Link This issue relates to HBASE-2223 [ HBASE-2223 ]
          Jean-Daniel Cryans created issue -

            People

            • Assignee: Jean-Daniel Cryans
            • Reporter: Jean-Daniel Cryans
            • Votes: 0
            • Watchers: 0
