Bookkeeper
  1. Bookkeeper
  2. BOOKKEEPER-212

Bookie stops responding when creating and deleting many ledgers

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.1.0
    • Component/s: None
    • Labels:
      None

      Description

      I have written down a short app to try to reproduce one problematic case reported on the user list. The app does the following:

      1. It creates initially a number of ledgers, say 2000;
      2. Once it reaches 2000, for each new ledger it creates, it deletes the one at the head of the list;
      3. Before closing the ledger, it adds 5 entries of 1k, just to generate some traffic for any given ledger.

      What I tried to achieve is to have thousands of active ledgers and delete new ledgers as I create new ones. I'll post a link to my test code later.

      At some point, one bookie stops responding. The bookie seems to be up, but it is not responsive. Looking at the logs, this is what I see:

      2012-04-06 12:22:05,765 - INFO  [SyncThread:LedgerCacheImpl@682] - Ledger 1726 is evicted from file info cache.
      2012-04-06 12:22:05,769 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - Ledger 1727 is evicted from file info cache.
      2012-04-06 12:22:05,772 - INFO  [SyncThread:LedgerCacheImpl@682] - Ledger 1728 is evicted from file info cache.
      2012-04-06 12:22:05,780 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - Ledger 1729 is evicted from file info cache.
      2012-04-06 12:22:05,787 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - Ledger 1730 is evicted from file info cache.
      2012-04-06 12:22:05,794 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - Ledger 1731 is evicted from file info cache.
      2012-04-06 12:22:05,801 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - Ledger 1732 is evicted from file info cache.
      2012-04-06 12:22:05,807 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - Ledger 1733 is evicted from file info cache.
      2012-04-06 12:22:05,822 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - Ledger 1734 is evicted from file info cache.
      2012-04-06 12:22:05,828 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - Ledger 1735 is evicted from file info cache.
      2012-04-06 12:22:05,842 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - Ledger 1736 is evicted from file info cache.
      2012-04-06 12:22:05,851 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - Ledger 1737 is evicted from file info cache.
      2012-04-06 12:22:05,856 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - Ledger 1738 is evicted from file info cache.
      2012-04-06 12:22:05,864 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - Ledger 1739 is evicted from file info cache.
      2012-04-06 12:22:05,874 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - Ledger 1740 is evicted from file info cache.
      2012-04-06 12:22:05,885 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - Ledger 1741 is evicted from file info cache.
      2012-04-06 12:22:05,894 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - Ledger 1742 is evicted from file info cache.
      2012-04-06 12:22:05,902 - INFO  [NIOServerFactory-3181:LedgerCacheImpl@682] - Ledger 1743 is evicted from file info cache.
      2012-04-06 12:22:05,987 - INFO  [GarbageCollectorThread:LedgerCacheImpl@682] - Ledger 1744 is evicted from file info cache.
      2012-04-06 12:22:05,987 - ERROR [GarbageCollectorThread:GarbageCollectorThread$1@244] - Exception when deleting the ledger index file on the Bookie: 
      java.io.IOException: /home/fpj/bk/current/1/b/10b.idx not found
              at org.apache.bookkeeper.bookie.FileInfo.checkOpen(FileInfo.java:118)
              at org.apache.bookkeeper.bookie.FileInfo.close(FileInfo.java:194) 
              at org.apache.bookkeeper.bookie.LedgerCacheImpl.deleteLedger(LedgerCacheImpl.java:619) 
              at org.apache.bookkeeper.bookie.GarbageCollectorThread$1.gc(GarbageCollectorThread.java:242)
              at org.apache.bookkeeper.meta.AbstractZkLedgerManager.doGc(AbstractZkLedgerManager.java:274)
              at org.apache.bookkeeper.meta.FlatLedgerManager.garbageCollectLedgers(FlatLedgerManager.java:168)
              at org.apache.bookkeeper.bookie.GarbageCollectorThread.doGcLedgers(GarbageCollectorThread.java:237)
              at org.apache.bookkeeper.bookie.GarbageCollectorThread.run(GarbageCollectorThread.java:206) 
      2012-04-06 12:22:05,987 - INFO  [GarbageCollectorThread:LedgerCacheImpl@682] - Ledger 1745 is evicted from file info cache.
      

      There are lots of exceptions like that, but otherwise I don't see anything that could make the bookie unresponsive. I'll upload the logs as well.

      1. bookkeeper-server.log
        596 kB
        Flavio Junqueira
      2. BK-212.diff
        3 kB
        Sijie Guo

        Activity

        Flavio Junqueira created issue -
        Flavio Junqueira made changes -
        Field Original Value New Value
        Fix Version/s 4.1.0 [ 12319145 ]
        Flavio Junqueira made changes -
        Attachment bookkeeper-server.log [ 12521672 ]
        Sijie Guo made changes -
        Attachment BK-212.diff [ 12521920 ]
        Sijie Guo made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Flavio Junqueira made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Assignee Sijie Guo [ hustlmsp ]
        Resolution Fixed [ 1 ]
        Ivan Kelly made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            Sijie Guo
            Reporter:
            Flavio Junqueira
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development