HBase / HBASE-4853

HBASE-4789 does overzealous pruning of seqids

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.92.0, 0.94.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Working with J-D on a failing replication test turned up a hole in seqids introduced by the patch over in HBASE-4789. With that patch in place, we see many instances of the suspicious log message: 'Last sequenceid written is empty. Deleting all old hlogs'
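
The quoted message suggests that HLog chooses which old WAL files to delete from the set of outstanding sequence ids, and deletes all of them when that set is empty. Below is a minimal sketch of that rule. It assumes (as the diff below implies) that `lastSeqWritten` maps each region's encoded name to the oldest sequence id of its unflushed edits, and that an old WAL can go once its highest sequence id is below the oldest outstanding one. `WalPruner`, `archivableLogs`, and the map of old log names to their highest sequence id are hypothetical simplifications, not HLog's actual API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model of old-WAL pruning; not the real HLog code. */
class WalPruner {
  /**
   * @param lastSeqWritten encoded region name -> oldest unflushed seqid
   * @param oldLogs        already-rolled WAL name -> highest seqid in that file
   * @return names of old WALs whose edits have all been flushed
   */
  static List<String> archivableLogs(Map<String, Long> lastSeqWritten,
                                     Map<String, Long> oldLogs) {
    List<String> result = new ArrayList<>();
    if (lastSeqWritten.isEmpty()) {
      // Nothing looks outstanding, so every old log appears safe to delete
      // (the "Last sequenceid written is empty" case).
      result.addAll(oldLogs.keySet());
      return result;
    }
    long oldestOutstanding = Long.MAX_VALUE;
    for (long seq : lastSeqWritten.values()) {
      oldestOutstanding = Math.min(oldestOutstanding, seq);
    }
    for (Map.Entry<String, Long> log : oldLogs.entrySet()) {
      // Safe only if every edit in the file predates the oldest unflushed edit.
      if (log.getValue() < oldestOutstanding) {
        result.add(log.getKey());
      }
    }
    return result;
  }
}
```

With `regionA -> 150` outstanding and old logs ending at seqids 100 and 200, only the first log is archivable; with an empty map, both are. That is why a `lastSeqWritten` that is wrongly emptied leads to overzealous pruning.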

      At a minimum, these lines need removing:

      diff --git a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
      index 623edbe..a0bbe01 100644
      --- a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
      +++ b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
      @@ -1359,11 +1359,6 @@ public class HLog implements Syncable {
             // Cleaning up of lastSeqWritten is in the finally clause because we
             // don't want to confuse getOldestOutstandingSeqNum()
             this.lastSeqWritten.remove(getSnapshotName(encodedRegionName));
      -      Long l = this.lastSeqWritten.remove(encodedRegionName);
      -      if (l != null) {
      -        LOG.warn("Why is there a raw encodedRegionName in lastSeqWritten? name=" +
      -          Bytes.toString(encodedRegionName) + ", seqid=" + l);
      -       }
             this.cacheFlushLock.unlock();
           }
         }
      

      ...but the above is no good without first figuring out why WALs are not being rotated off.
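
One way to see why the removed lines are harmful: assume `startCacheFlush` moves a region's entry from its raw encoded name to `getSnapshotName(encodedRegionName)`, which the snapshot-name removal in the diff implies. Then a raw entry found in `completeCacheFlush` would come from edits appended after the flush began, and those edits are still unflushed. The single-threaded model below is a hypothetical sketch with string keys in place of byte arrays; it is not HLog code and ignores the locking HLog uses.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical single-threaded model of lastSeqWritten around a flush.
 * Names echo the HLog diff; this is not HBase code.
 */
class FlushBookkeeping {
  // encoded region name (or its snapshot name) -> oldest unflushed seqid
  final Map<String, Long> lastSeqWritten = new HashMap<>();
  // true models HLog with the lines added by HBASE-4789
  private final boolean removeRawName;

  FlushBookkeeping(boolean removeRawName) {
    this.removeRawName = removeRawName;
  }

  // Assumed key scheme for an in-progress flush; the real one may differ.
  static String getSnapshotName(String encodedRegionName) {
    return "snapshot_" + encodedRegionName;
  }

  /** An edit was appended; only the oldest unflushed seqid is kept. */
  void append(String encodedRegionName, long seqId) {
    lastSeqWritten.putIfAbsent(encodedRegionName, seqId);
  }

  /** Park the region's oldest seqid under its snapshot name while flushing. */
  void startCacheFlush(String encodedRegionName) {
    Long seq = lastSeqWritten.remove(encodedRegionName);
    if (seq != null) {
      lastSeqWritten.put(getSnapshotName(encodedRegionName), seq);
    }
  }

  /** The flushed snapshot is durable, so its entry can go. */
  void completeCacheFlush(String encodedRegionName) {
    lastSeqWritten.remove(getSnapshotName(encodedRegionName));
    if (removeRawName) {
      // Models the removed lines: this drops the seqid of edits appended
      // after startCacheFlush, which are still only in the new memstore.
      lastSeqWritten.remove(encodedRegionName);
    }
  }
}
```

In the model, appending seqid 10, starting a flush, appending seqid 20 and completing the flush leaves `lastSeqWritten` empty when the raw name is also removed, so every old WAL looks deletable; without that removal the map keeps `r1 -> 20`.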

      Attachments

      1. 4853-v10.txt
        7 kB
        stack
      2. 4853-v9.txt
        8 kB
        stack
      3. 4853-v9.txt
        8 kB
        stack
      4. 4853-v8.txt
        8 kB
        stack
      5. 4853-v7.txt
        7 kB
        stack
      6. 4853-v6.txt
        4 kB
        stack
      7. 4853-v5.txt
        3 kB
        stack
      8. 4853-v4.txt
        9 kB
        stack
      9. 4853-trunk.txt
        2 kB
        stack
      10. 4853--no-prefix.txt
        2 kB
        stack
      11. 4853.txt
        2 kB
        stack

        Activity

        stack created issue -
        stack made changes -
        Field Original Value New Value
        Attachment 4853.txt [ 12504832 ]
        stack made changes -
        Attachment 4853--no-prefix.txt [ 12504841 ]
        stack made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        stack made changes -
        Attachment 4853-trunk.txt [ 12504846 ]
        stack made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        stack made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        stack made changes -
        Attachment 4853-v4.txt [ 12504860 ]
        stack made changes -
        Attachment 4853-v5.txt [ 12504938 ]
        stack made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Assignee stack [ stack ]
        stack made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        stack made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        stack made changes -
        Attachment 4853-v6.txt [ 12504941 ]
        stack made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Ted Yu made changes -
        Comment [ By increasing timeout to 6 seconds (Pardon me, N), I wasn't able to reproduce failure in TestGlobalMemStoreSize after 20 iterations:
        {code}
        Index: src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java
        ===================================================================
        --- src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java (revision 1205638)
        +++ src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java (working copy)
        @@ -100,11 +100,12 @@
               }
               LOG.info("Post flush on " + server.getServerName());
               long now = System.currentTimeMillis();
        -      long timeout = now + 3000;
        +      long timeout = now + 6000;
               while(server.getRegionServerAccounting().getGlobalMemstoreSize() != 0 &&
                   timeout < System.currentTimeMillis()) {
                 Threads.sleep(10);
               }
        +      LOG.info("About to check GlobalMemstoreSize");
               assertEquals("Server=" + server.getServerName() + ", i=" + i++, 0,
                 server.getRegionServerAccounting().getGlobalMemstoreSize());
             }
        {code} ]
        stack made changes -
        Attachment 4853-v7.txt [ 12504947 ]
        stack made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        stack made changes -
        Attachment 4853-v8.txt [ 12504949 ]
        stack made changes -
        Attachment 4853-v9.txt [ 12504957 ]
        stack made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        stack made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        stack made changes -
        Attachment 4853-v9.txt [ 12504960 ]
        stack made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        stack made changes -
        Attachment 4853-v10.txt [ 12504984 ]
        Jean-Daniel Cryans made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop Flags Reviewed [ 10343 ]
        Fix Version/s 0.92.0 [ 12314223 ]
        Fix Version/s 0.94.0 [ 12316419 ]
        Resolution Fixed [ 1 ]
        Lars Hofhansl made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
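
In the TestGlobalMemStoreSize diff from Ted Yu's comment in the activity log above, the wait loop's condition `timeout < System.currentTimeMillis()` is false on entry because the deadline still lies in the future, so the loop body does not run and the timeout value has no effect on the wait. A minimal sketch of the intended poll-until-zero-or-deadline loop follows; `WaitUtil.waitForZero` is a hypothetical helper, with a `LongSupplier` standing in for `getGlobalMemstoreSize()`.

```java
import java.util.function.LongSupplier;

/** Hypothetical test helper; not an HBase API. */
class WaitUtil {
  /**
   * Polls {@code value} until it reads zero or {@code timeoutMs} elapses.
   * Returns the last value observed, so 0 means the wait succeeded.
   */
  static long waitForZero(LongSupplier value, long timeoutMs, long sleepMs) {
    long deadline = System.currentTimeMillis() + timeoutMs;
    long current = value.getAsLong();
    // Loop while the value is non-zero AND the deadline has not passed yet;
    // the quoted test has this comparison the other way round.
    while (current != 0 && System.currentTimeMillis() < deadline) {
      try {
        Thread.sleep(sleepMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // keep the interrupt visible to callers
        break;
      }
      current = value.getAsLong();
    }
    return current;
  }
}
```

For example, `waitForZero(() -> server.getRegionServerAccounting().getGlobalMemstoreSize(), 6000, 10)` would mirror the test's 6-second timeout and 10 ms sleep before the `assertEquals` check.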

          People

          • Assignee:
            stack
            Reporter:
            stack
          • Votes:
            0
            Watchers:
            1

            Dates

            • Created:
              Updated:
              Resolved:
