HBase / HBASE-1718

Reuse of KeyValue during log replay could cause the wrong data to be used

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.20.0
    • Fix Version/s: 0.20.0
    • Component/s: regionserver
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Our meta table ended up with a row key of METAROW in it. It is hard to explain how that happened, but under code inspection stack found that we are reusing the same KV instance for each replayed key.

      1. HBASE-1718-v1.patch
        0.6 kB
        Jonathan Gray

        Activity

        stack made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        stack added a comment -

        So, at a minimum, this patch fixes replaying of logs:

        Below I added logging. Replay had 17k edits. Only one was added, the last one.

        ....
        2009-07-29 01:27:15,274 [regionserver/208.76.44.139:60020.worker] INFO org.apache.hadoop.hbase.regionserver.Store: ADDED TO RECON CACHE: \x00\x00\x00\x03\x00\x01\x07\x02\x08\x05/info:data/1248830494586/Put/vlen=1000
        2009-07-29 01:27:15,274 [regionserver/208.76.44.139:60020.worker] DEBUG org.apache.hadoop.hbase.regionserver.Store: Applied 17286, skipped 0 because sequence id <= 22830124
        2009-07-29 01:27:15,274 [regionserver/208.76.44.139:60020.worker] DEBUG org.apache.hadoop.hbase.regionserver.Store: flushing reconstructionCache: 1
        2009-07-29 01:27:15,274 [regionserver/208.76.44.139:60020.worker] INFO org.apache.hadoop.hbase.regionserver.Store: DUMP \x00\x00\x00\x03\x00\x01\x07\x02\x08\x05/info:data/1248830494586/Put/vlen=1000
        

        I thought this was HBASE-1483, but that was in splitLog. This is something else.

        Jonathan Gray made changes -
        Assignee Jonathan Gray [ streamy ]
        Jonathan Gray made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Hadoop Flags [Reviewed]
        Resolution Fixed [ 1 ]
        Jonathan Gray added a comment -

        Committed. Upon further thought, this could create the cross-family scenario we've seen in HBASE-1638 / HBASE-1715 (but apurtell says there was no replay). It does, however, seem to be the case that this bug completely broke log replay, as you would always end up with only the last seen KV.

        stack added a comment -

        Single bad value.

        stack added a comment -

        +1 on patch.

        It's hard to test.

        Here is my supposition as to why this might be responsible for HBASE-1638.

        A server crashes and its logs are split. The crashed server's regions are opened anew, and there are logs for them to replay.

        The inner loop playing the reconstruction logs is this:

             while (logReader.next(key, val)) {
                maxSeqIdInLog = Math.max(maxSeqIdInLog, key.getLogSeqNum());
                .....
                // Check this edit is for me. Also, guard against writing the special
                // METACOLUMN info such as HBASE::CACHEFLUSH entries
                if (/* commented out for now - stack via jgray key.isTransactionEntry() || */
                    val.matchingFamily(HLog.METAFAMILY) ||
                    !Bytes.equals(key.getRegionName(), regioninfo.getRegionName()) ||
                    !val.matchingFamily(family.getName())) {
                  continue;
                }
                // Add anything as value as long as we use same instance each time.
                reconstructedCache.add(val);
                ....
              }
        

        So, a value might clear the family checks and get added to the reconstructionCache.

        We call next again. The 'val' instance is already in the reconstructionCache, and the next call deserializes a new KV into that same instance. The deserialized value might not make it past the family checks, but it's already in the reconstructionCache.

        This would account for our adding a single value.
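        The failure mode described above can be illustrated with plain Java (this is a hedged sketch, not HBase code; the `Val` class and `replay` method are hypothetical stand-ins for KeyValue and the reconstruction loop): reusing one mutable instance across reader iterations means every cached entry aliases the same object.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the aliasing bug: one mutable instance is reused for
// every deserialized record, so every cache entry is the SAME object and
// all of them end up holding whatever was deserialized last.
public class ReusedInstanceBug {
    // Hypothetical stand-in for KeyValue: a mutable holder the reader
    // deserializes into (in the spirit of Writable.readFields()).
    static class Val {
        byte[] bytes;
        void readFrom(byte[] next) { this.bytes = next; }
    }

    static List<Val> replay(byte[][] log) {
        List<Val> cache = new ArrayList<>();
        Val val = new Val();            // single reused instance
        for (byte[] record : log) {
            val.readFrom(record);       // overwrite the shared instance
            cache.add(val);             // cache yet another reference to it
        }
        return cache;
    }

    public static void main(String[] args) {
        for (Val v : replay(new byte[][] { {1}, {2}, {3} })) {
            System.out.println(v.bytes[0]); // every entry holds the last value, 3
        }
    }
}
```

        With three distinct records in the log, the cache ends up with three references to one object, all carrying the last record, which matches the observed "only one was added, the last one" behavior.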

        Jean-Daniel Cryans added a comment -

        +1 seems fine.

        Jonathan Gray made changes -
        Field Original Value New Value
        Attachment HBASE-1718-v1.patch [ 12414819 ]
        Jonathan Gray added a comment -

        Instantiates a new KeyValue at the end of the while loop, meaning we only re-instantiate once we've passed the KV forward (we can still reuse the instance in the cases where we do continue).
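        A hedged sketch of that pattern (plain Java with hypothetical names, not the actual patch): the instance is still reused across skipped edits, but a fresh one is allocated as soon as the current one has been handed to the cache.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the fix's pattern: keep reusing the instance while edits are
// skipped, but never mutate an instance after it has been cached.
public class FreshInstanceFix {
    // Hypothetical stand-in for KeyValue.
    static class Val {
        byte[] bytes;
        void readFrom(byte[] next) { this.bytes = next; }
    }

    static List<Val> replay(byte[][] log) {
        List<Val> cache = new ArrayList<>();
        Val val = new Val();
        for (byte[] record : log) {
            val.readFrom(record);
            if (record[0] == 2) {
                continue;               // skipped edit: safe to reuse 'val'
            }
            cache.add(val);
            val = new Val();            // handed off; allocate a fresh one
        }
        return cache;
    }

    public static void main(String[] args) {
        for (Val v : replay(new byte[][] { {1}, {2}, {3} })) {
            System.out.println(v.bytes[0]); // prints 1, then 3
        }
    }
}
```

        Allocating only after a successful hand-off keeps the per-edit allocation cost limited to edits that actually survive the family and region checks.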

        Jonathan Gray created issue -

          People

          • Assignee:
            Jonathan Gray
            Reporter:
            Jonathan Gray
          • Votes:
            0
            Watchers:
            1
