HBASE-7280

TableNotFoundException thrown in the peer cluster incurs endless retry of shipEdits, which in turn blocks subsequent normal replication

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.94.2
    • Fix Version/s: None
    • Component/s: Replication
    • Labels: None

      Description

      In cluster replication, suppose the master cluster has two tables whose column families are declared with REPLICATION_SCOPE=1, and a peer cluster is added that has only one of those tables (with a matching name). The ReplicationSource (a thread in the master cluster) for this peer ships edits (logs) for both tables to the peer. The peer fails to apply the edits for the missing table with a TableNotFoundException, and this exception is returned to the original shipper (the ReplicationSource in the master cluster). The shipper then falls into an endless retry loop for the failed edits, never proceeding to read the remaining (newer) log files or to ship the subsequent edits, including the normal, expected edits for the table that does exist. The symptom is that the TableNotFoundException incurs endless retries and blocks normal table replication.

        Activity

        Jean-Daniel Cryans added a comment -

        This is "by design", if a source cannot replicate one edit then replication is blocked. Apart from better alerting, what do you think HBase should do?

        Jieshan Bean added a comment -

        Yes, this is the expected behavior. In the current implementation, the backup cluster has to create the tables itself.

        Honghua Feng added a comment -

        I understand the intent of the current design. A master cluster may have multiple tables with REPLICATION_SCOPE=1, but not all peer clusters want to replicate all of them, and the current design prevents replicating only selected table(s). In our scenario, I expect the peer cluster (sink) to omit edits for tables that don't exist in the peer cluster and only apply edits for the table(s) that do exist (the ones we really want to replicate). I made a minor change in ReplicationSink.java which just omits edits for non-existing table(s) in the peer cluster, and the behavior is what we want. Though this change doesn't reduce the needless network bandwidth, at least it doesn't block normal replication.
        The current replication's per-cluster granularity seems a bit coarse-grained for many real-world scenarios. In my opinion, adding a table- or column-family-list configuration for a peer when adding the peer would be more reasonable.
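        The sink-side change described above can be sketched with plain Java collections. The class, record, and method names here are illustrative only, not the actual ReplicationSink code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch of the described sink-side behavior: apply only edits whose
// target table exists in the peer cluster, and silently drop the rest
// instead of throwing TableNotFoundException (which would make the
// shipper retry the same batch forever).
public class SinkFilterSketch {
    // Hypothetical edit: just a (table, row) pair for illustration.
    record Edit(String table, String row) {}

    // Returns the edits that the sink would actually apply.
    static List<Edit> applyOnlyExisting(Set<String> existingTables, List<Edit> batch) {
        List<Edit> applied = new ArrayList<>();
        for (Edit e : batch) {
            if (existingTables.contains(e.table())) {
                applied.add(e);  // table exists in the peer: apply the edit
            }
            // else: drop the edit silently so the shipper is not blocked
        }
        return applied;
    }

    public static void main(String[] args) {
        Set<String> peerTables = Set.of("t1");
        List<Edit> batch = List.of(new Edit("t1", "r1"), new Edit("t2", "r2"));
        System.out.println(applyOnlyExisting(peerTables, batch).size()); // prints 1
    }
}
```

        As noted, this trades wasted network bandwidth for unblocked replication: the edits for the missing table still cross the wire, but they no longer stall the queue.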

        Jieshan Bean added a comment -

        I agree with your suggestion of adding a configuration list for each peer. So we need to maintain this list in ZooKeeper for each peer, e.g.
        peer-1 -> table1[fam1, fam2], table2[fam1]
        peer-2 -> table1[fam1]
        So the related properties in the table become useless, right? Hope I understand you correctly.
        But this will make things more difficult.

        The change in ReplicationSink seems simple, but the master cluster will send some unnecessary edits to the peers.
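        The per-peer mapping sketched above could be modeled as a nested map plus a shipping predicate. This is a minimal sketch with illustrative names; the real mapping would live in ZooKeeper, not in a static field:

```java
import java.util.Map;
import java.util.Set;

// Sketch of the proposed per-peer table/CF configuration, e.g.
//   peer-1 -> table1[fam1, fam2], table2[fam1]
//   peer-2 -> table1[fam1]
public class PeerConfigSketch {
    // peerId -> (table -> replicated column families)
    static final Map<String, Map<String, Set<String>>> PEER_CONFIG = Map.of(
        "peer-1", Map.of("table1", Set.of("fam1", "fam2"), "table2", Set.of("fam1")),
        "peer-2", Map.of("table1", Set.of("fam1")));

    // Should an edit for (table, cf) be shipped to this peer?
    static boolean shipsTo(String peerId, String table, String cf) {
        Map<String, Set<String>> tables = PEER_CONFIG.get(peerId);
        return tables != null && tables.getOrDefault(table, Set.of()).contains(cf);
    }

    public static void main(String[] args) {
        System.out.println(shipsTo("peer-1", "table2", "fam1")); // true
        System.out.println(shipsTo("peer-2", "table2", "fam1")); // false
    }
}
```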

        Honghua Feng added a comment -

        Yes, that's what I hope for with finer-grained cluster replication. In such a design, by default (without any table/CF configuration) a peer receives all the edits from the master cluster. In real-world scenarios we may have a master cluster and a backup cluster that needs to replicate a whole copy of the master cluster and so receives all edits, but at the same time there may be some experimental/down-stream clusters which just need a certain table, or even some CFs of a table, from the master cluster. By making the peer table/CF-configurable we can enable such scenarios.

        ReplicationSource needs to parse out the peer's table/CF configuration on creation, and filter the edits while reading the HLog files to determine which edits need to be shipped to the corresponding peer. Looks like no further change is needed on the peer side (ReplicationSink), right?

        Yes, my current change in ReplicationSink doesn't save the unnecessary edits to peers, but it's enough to unblock us. A wiser treatment would be in ReplicationSource, where we can filter out unnecessary edits before shipping to the peer cluster by checking, for each edit, whether the table exists in the peer cluster.
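        The source-side alternative described here can be sketched the same way: filter log entries against the peer's configured table list before shipping, so edits for non-replicated tables never cross the network. Names are again illustrative, not actual ReplicationSource code:

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch of source-side filtering: while reading log entries, keep only
// those whose table the peer is configured to replicate, and ship just
// that subset. Saves the bandwidth the sink-side skip cannot save.
public class SourceFilterSketch {
    // Hypothetical log entry: a (table, payload) pair for illustration.
    record Entry(String table, String payload) {}

    // Keep only entries whose table the peer replicates.
    static List<Entry> filterForPeer(Set<String> peerTables, List<Entry> logEntries) {
        return logEntries.stream()
                .filter(e -> peerTables.contains(e.table()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Entry> log = List.of(
            new Entry("t1", "a"), new Entry("t2", "b"), new Entry("t1", "c"));
        System.out.println(filterForPeer(Set.of("t1"), log).size()); // prints 2
    }
}
```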

        Jean-Daniel Cryans added a comment -

        To give some background about replication: the reason that REPLICATION_SCOPE is an integer and not a boolean is that it is meant to be used as a way to encode routing information, but this has not been implemented yet.

        Honghua Feng added a comment -

        Thanks Jean-Daniel

        But even if REPLICATION_SCOPE routing were implemented, I don't think it would be as flexible as a per-peer table/CF configuration. Let me know if I'm wrong in my understanding of how REPLICATION_SCOPE would be used as routing information: edits in the master cluster are shipped to all peer clusters whose peer_ids are less than or equal to the REPLICATION_SCOPE. But what if a newly added peer wants to replicate a table/CF with REPLICATION_SCOPE=A and another table/CF with REPLICATION_SCOPE=E, but doesn't want the tables/CFs with REPLICATION_SCOPE=B/C/D (A>B>C>D>E here)? Interpreting REPLICATION_SCOPE as a bit array and treating each bit as a peer_id has a similar problem. (At the least, we would need to change REPLICATION_SCOPE whenever the original value can't satisfy a later-added peer's replication requirement.)

        The reason REPLICATION_SCOPE isn't a rescue here is that in many cases the master cluster doesn't know exactly which peer clusters will want to replicate which table/CF from it when it creates the tables/CFs. By contrast, each peer cluster knows exactly which tables/CFs to replicate from the master cluster when it adds itself as a peer. By introducing a table/CF list configuration when adding a peer, we don't have to figure out in advance which (or how many) peers will replicate a table/CF when creating it in the master cluster, and we don't need to change REPLICATION_SCOPE later on. ReplicationSourceManager just listens on the peer ZK nodes and adds a new ReplicationSource for each new peer with its configured table/CF list, then reads/filters/ships edits of the configured tables/CFs to the corresponding peer.

        ReplicationSource also needs to listen on its peer ZK node for table/CF configuration changes, which in turn influence which edits to ship to the peer from then on.

        Any opinion?

        Lars Hofhansl added a comment -

        Closing as "Won't fix". This is working as designed.
        We can of course discuss another approach that can also ship meta edits.

        Honghua Feng added a comment -

        Jieshan Bean You can refer to HBASE-8751 for per-peer cf/table granularity replication


  People

    • Assignee: Unassigned
    • Reporter: Honghua Feng
    • Votes: 0
    • Watchers: 5

  Time Tracking

    • Original Estimate: 0.5h
    • Remaining Estimate: 0.5h
    • Time Spent: Not Specified