Bookkeeper
  1. Bookkeeper
  2. BOOKKEEPER-237 Automatic recovery of under-replicated ledgers and its entries
  3. BOOKKEEPER-417

Hierarchical zk underreplication manager should clean up its hierarchy when done to allow for fast acquisition of underreplicated entries

    Details

    • Type: Sub-task Sub-task
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.2.0
    • Labels:
      None

      Description

      As we traverse the hierarchy to find a underreplicated fragment, leaving the hierarchy will cause the traversal to search in many empty znodes.

      1. BOOKKEEPER-417.diff
        11 kB
        Ivan Kelly
      2. BOOKKEEPER-417.diff
        10 kB
        Ivan Kelly
      3. BOOKKEEPER-417.diff
        9 kB
        Ivan Kelly

        Activity

        Hide
        Uma Maheswara Rao G added a comment -

        Yeah, I have seen this while debugging. Thanks for filing the JIRA. I think, functionally no harm, but we have to fix this to avoid unnecessary traversals.

        Show
        Uma Maheswara Rao G added a comment - Yeah, I have seen this while debugging. Thanks for filing the JIRA. I think, functionally no harm, but we have to fix this to avoid unnecessary traversals.
        Hide
        Ivan Kelly added a comment -

        Yes, when I was running some benchmarks, this really slowed down recovery time if the number of ledgers being recovered was large (10,000+).

        Show
        Ivan Kelly added a comment - Yes, when I was running some benchmarks, this really slowed down recovery time if the number of ledgers being recovered was large (10,000+).
        Hide
        Ivan Kelly added a comment -

        Also, I already have this implemented, I just need to break out the patch.

        Show
        Ivan Kelly added a comment - Also, I already have this implemented, I just need to break out the patch.
        Hide
        Rakesh R added a comment -

        Thanks Ivan for introducing guava.
        Patch looks nice and just one thought, its safe and good to configure timeout parameter for testHierarchyCleanupInterference as it has many loops and join.

        Show
        Rakesh R added a comment - Thanks Ivan for introducing guava. Patch looks nice and just one thought, its safe and good to configure timeout parameter for testHierarchyCleanupInterference as it has many loops and join.
        Hide
        Rakesh R added a comment -

        I mean like following in tests and timeout could be an ideal value, so will not hang at all.

        @Test(timeout = 50000)
        public void testHierarchyCleanupInterference ()
        
        Show
        Rakesh R added a comment - I mean like following in tests and timeout could be an ideal value, so will not hang at all. @Test(timeout = 50000) public void testHierarchyCleanupInterference ()
        Hide
        Uma Maheswara Rao G added a comment -

        Hi Ivan. Patch looks great. I have one comment adding to rakesh comment.

          zkc.delete(getUrLedgerZnode(ledgerId), l.getLedgerZNodeVersion());
        +                // clean up the hierarchy
        +                String parts[] = getUrLedgerZnode(ledgerId).split("/");
        +                for (int i = 1; i <= 4; i++) {
        +                    String p[] = Arrays.copyOf(parts, parts.length - i);
        +                    String path = Joiner.on("/").join(p);
        +                    Stat s = zkc.exists(path, null);
        +                    if (s != null) {
        +                        zkc.delete(path, s.getVersion());
        +                    }
        +                }
        

        Currently NotEmptyException handle for even leaf node.
        Actually we should handle only for 'clean up the hierarchy' delete call.
        When we delete leaf node, expectation is that, that will be free. But for any reason/due to some bug, if leaf node has some child node introduced, then we will silently ignore here. Actually that is not expected. It should loudly throw exception out.
        So, better to handle NotEmptyException only for hierarchy cleanup delete call. What do you say?

        Show
        Uma Maheswara Rao G added a comment - Hi Ivan. Patch looks great. I have one comment adding to rakesh comment. zkc.delete(getUrLedgerZnode(ledgerId), l.getLedgerZNodeVersion()); + // clean up the hierarchy + String parts[] = getUrLedgerZnode(ledgerId).split( "/" ); + for ( int i = 1; i <= 4; i++) { + String p[] = Arrays.copyOf(parts, parts.length - i); + String path = Joiner.on( "/" ).join(p); + Stat s = zkc.exists(path, null ); + if (s != null ) { + zkc.delete(path, s.getVersion()); + } + } Currently NotEmptyException handle for even leaf node. Actually we should handle only for 'clean up the hierarchy' delete call. When we delete leaf node, expectation is that, that will be free. But for any reason/due to some bug, if leaf node has some child node introduced, then we will silently ignore here. Actually that is not expected. It should loudly throw exception out. So, better to handle NotEmptyException only for hierarchy cleanup delete call. What do you say?
        Hide
        Ivan Kelly added a comment -

        New patch addresses comments

        Show
        Ivan Kelly added a comment - New patch addresses comments
        Hide
        Uma Maheswara Rao G added a comment -

        +1 patch looks good to me. Thanks Ivan.

        Show
        Uma Maheswara Rao G added a comment - +1 patch looks good to me. Thanks Ivan.
        Hide
        Rakesh R added a comment -

        +1 New patch looks good. Thanks Ivan.

        Show
        Rakesh R added a comment - +1 New patch looks good. Thanks Ivan.
        Hide
        Ivan Kelly added a comment -

        Last patch had a race in TestReplicationWorker. Fixed in latest.

        Show
        Ivan Kelly added a comment - Last patch had a race in TestReplicationWorker. Fixed in latest.
        Hide
        Uma Maheswara Rao G added a comment -

        Yeah, +1 for TestReplicationWorker changes. Thanks

        Show
        Uma Maheswara Rao G added a comment - Yeah, +1 for TestReplicationWorker changes. Thanks
        Hide
        Ivan Kelly added a comment -

        Committed as r1398834. Thanks guys.

        Show
        Ivan Kelly added a comment - Committed as r1398834. Thanks guys.
        Hide
        Hudson added a comment -

        Integrated in bookkeeper-trunk #755 (See https://builds.apache.org/job/bookkeeper-trunk/755/)
        BOOKKEEPER-417: Hierarchical zk underreplication manager should clean up its hierarchy when done to allow for fast acquisition of underreplicated entries (ivank) (Revision 1398834)

        Result = UNSTABLE
        ivank :
        Files :

        • /zookeeper/bookkeeper/trunk/CHANGES.txt
        • /zookeeper/bookkeeper/trunk/bookkeeper-server/pom.xml
        • /zookeeper/bookkeeper/trunk/bookkeeper-server/src/main/java/org/apache/bookkeeper/meta/ZkLedgerUnderreplicationManager.java
        • /zookeeper/bookkeeper/trunk/bookkeeper-server/src/test/java/org/apache/bookkeeper/replication/TestLedgerUnderreplicationManager.java
        • /zookeeper/bookkeeper/trunk/bookkeeper-server/src/test/java/org/apache/bookkeeper/replication/TestReplicationWorker.java
        Show
        Hudson added a comment - Integrated in bookkeeper-trunk #755 (See https://builds.apache.org/job/bookkeeper-trunk/755/ ) BOOKKEEPER-417 : Hierarchical zk underreplication manager should clean up its hierarchy when done to allow for fast acquisition of underreplicated entries (ivank) (Revision 1398834) Result = UNSTABLE ivank : Files : /zookeeper/bookkeeper/trunk/CHANGES.txt /zookeeper/bookkeeper/trunk/bookkeeper-server/pom.xml /zookeeper/bookkeeper/trunk/bookkeeper-server/src/main/java/org/apache/bookkeeper/meta/ZkLedgerUnderreplicationManager.java /zookeeper/bookkeeper/trunk/bookkeeper-server/src/test/java/org/apache/bookkeeper/replication/TestLedgerUnderreplicationManager.java /zookeeper/bookkeeper/trunk/bookkeeper-server/src/test/java/org/apache/bookkeeper/replication/TestReplicationWorker.java

          People

          • Assignee:
            Ivan Kelly
            Reporter:
            Ivan Kelly
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development