  1. HBase
  2. HBASE-17471

Region Seqid will be out of order in WAL if using mvccPreAssign

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.4.0, 2.0.0
    • Fix Version/s: 1.4.0, 2.0.0
    • Component/s: wal
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note:
      MVCCPreAssign was added by HBASE-16698, but the pre-assigned mvcc is only used in the put/delete path. Other write paths such as increment/append still assign mvcc in the ring buffer's consumer thread. If put and increment are used in parallel, the seqid in the WAL may not increase monotonically, and disorder in WALs can lead to data loss. This patch brings all mvcc/seqid events into wal.append and synchronizes WAL append with mvcc acquisition, so no disorder in the WAL can happen. Performance tests show no regression with this patch.

    Description

      mvccPreAssign was introduced by HBASE-16698, which truly improved write performance, especially in the ASYNC_WAL scenario. But mvccPreAssign is only used in doMiniBatchMutate, not in the Increment/Append path. If Increment/Append and batch put are used against the same region in parallel, the seqid of that region may not increase monotonically in the WAL, since one write path acquires the mvcc/seqid before the append, and the other acquires it in the append/sync consumer thread.
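
      To make the interleaving concrete, here is a minimal single-threaded sketch that replays one possible ordering of the two write paths. It is only a model under stated assumptions: WalOrderSketch, Entry, simulate and drain are hypothetical names, not HBase classes, and the ring buffer is reduced to a plain queue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified model of the two write paths; not HBase code.
class WalOrderSketch {
    /** A pending WAL entry; seqId == -1 means "not assigned yet". */
    static final class Entry {
        final String op;
        long seqId;
        Entry(String op, long seqId) { this.op = op; this.seqId = seqId; }
    }

    /**
     * Replays one fixed interleaving and returns the seqids in the order
     * they reach the WAL.
     * assignWithAppend == false: put pre-assigns its seqid, increment gets
     *                            its seqid from the consumer (mixed paths).
     * assignWithAppend == true:  every path takes its seqid and publishes
     *                            to the queue as one step.
     */
    static List<Long> simulate(boolean assignWithAppend) {
        AtomicLong seq = new AtomicLong(0);
        Queue<Entry> ringBuffer = new ArrayDeque<>();
        List<Long> wal = new ArrayList<>();

        if (!assignWithAppend) {
            // 1. The put handler takes seqid 1 up front, then stalls before publishing.
            Entry put = new Entry("put", seq.incrementAndGet());
            // 2. The increment handler publishes an entry without a seqid.
            ringBuffer.add(new Entry("increment", -1));
            // 3. The consumer assigns seqid 2 to the increment and writes it.
            drain(ringBuffer, seq, wal);
            // 4. The put finally publishes its entry, still carrying seqid 1.
            ringBuffer.add(put);
            drain(ringBuffer, seq, wal);
        } else {
            // Assignment and publish happen together, so queue order equals seqid order.
            ringBuffer.add(new Entry("put", seq.incrementAndGet()));
            ringBuffer.add(new Entry("increment", seq.incrementAndGet()));
            drain(ringBuffer, seq, wal);
        }
        return wal;
    }

    /** Consumer side: assign a seqid if the entry has none, then write to the WAL. */
    static void drain(Queue<Entry> ringBuffer, AtomicLong seq, List<Long> wal) {
        Entry e;
        while ((e = ringBuffer.poll()) != null) {
            if (e.seqId < 0) {
                e.seqId = seq.incrementAndGet();
            }
            wal.add(e.seqId);
        }
    }

    public static void main(String[] args) {
        System.out.println("mixed paths:        " + simulate(false)); // [2, 1]
        System.out.println("assign with append: " + simulate(true));  // [1, 2]
    }
}
```

      With mixed paths the WAL receives seqid 2 before seqid 1; when every path takes its seqid at the moment it publishes, the order in the WAL matches the seqid order.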

      The out-of-order situation can easily be reproduced by a simple UT, which is included in the attachments. I modified the code to assert on the disorder:

          if(this.highestSequenceIds.containsKey(encodedRegionName)) {
            assert highestSequenceIds.get(encodedRegionName) < sequenceid;
          }
      

      I'd like to say that if we allow disorder in WALs, then this is not an issue.

      But as far as I know, if highestSequenceIds is not properly set, some WALs may not be archived to oldWALs correctly.
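
      As a rough illustration of why a wrongly recorded highest seqid matters for archiving, here is a minimal sketch. It assumes a simplified rule (a WAL file may be archived once the seqid recorded for every region in it has been flushed) and a plain overwrite on append; HighestSeqIdSketch, append and canArchive are hypothetical names, and this is not HBase's actual accounting code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified bookkeeping for a single WAL file; not HBase code.
class HighestSeqIdSketch {
    // region name -> seqid recorded by the most recent append (plain overwrite)
    private final Map<String, Long> highestSequenceIds = new HashMap<>();

    void append(String encodedRegionName, long sequenceId) {
        highestSequenceIds.put(encodedRegionName, sequenceId);
    }

    /**
     * Assumed rule: the file can be archived once, for every region in it,
     * the recorded seqid is not greater than that region's flushed seqid.
     */
    boolean canArchive(Map<String, Long> flushedSeqIds) {
        for (Map.Entry<String, Long> e : highestSequenceIds.entrySet()) {
            long flushed = flushedSeqIds.getOrDefault(e.getKey(), -1L);
            if (e.getValue() > flushed) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Long> flushed = new HashMap<>();
        flushed.put("region-a", 1L); // only seqid 1 has been flushed so far

        HighestSeqIdSketch ordered = new HighestSeqIdSketch();
        ordered.append("region-a", 1L);
        ordered.append("region-a", 2L);
        System.out.println(ordered.canArchive(flushed)); // false: seqid 2 is unflushed

        HighestSeqIdSketch disordered = new HighestSeqIdSketch();
        disordered.append("region-a", 2L);
        disordered.append("region-a", 1L); // late entry overwrites the higher seqid
        System.out.println(disordered.canArchive(flushed)); // true, although seqid 2 is unflushed
    }
}
```

      In this model the disordered file looks archivable even though it still holds an unflushed edit, which is the kind of incorrect archiving decision the concern above is about.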

      What I haven't figured out yet is whether disorder in the WAL will cause data loss when recovering from a disaster. If so, then it is a big problem that needs to be fixed.

      I have fixed this problem in our custom 1.1.x branch. My solution is to use mvccPreAssign everywhere, making it un-configurable, since mvccPreAssign is indeed a better way than assigning the seqid in the ring buffer thread while keeping handlers waiting for it.

      If anyone thinks it is doable, I will port it to branch-1 and the master branch and upload it.

      Attachments

        1. HBASE-17471.patch
          7 kB
          Allan Yang
        2. HBASE-17471.tmp
          7 kB
          Allan Yang
        3. HBASE-17471.v2.patch
          19 kB
          Allan Yang
        4. HBASE-17471.v3.patch
          19 kB
          Allan Yang
        5. HBASE-17471.v4.patch
          34 kB
          Allan Yang
        6. HBASE-17471.v5.patch
          42 kB
          Allan Yang
        7. HBASE-17471.v6.patch
          42 kB
          Allan Yang
        8. HBASE-17471-branch-1.v0.patch
          31 kB
          Allan Yang
        9. HBASE-17471-branch-1.v1.patch
          31 kB
          Allan Yang
        10. HBASE-17471-branch-1.v2.patch
          33 kB
          Allan Yang
        11. HBASE-17471-branch-1.v3.patch
          33 kB
          Allan Yang
        12. HBASE-17471-duo.patch
          25 kB
          Duo Zhang
        13. HBASE-17471-duo-v1.patch
          28 kB
          Duo Zhang
        14. HBASE-17471-duo-v2.patch
          35 kB
          Duo Zhang

            People

              Assignee: Allan Yang (allan163)
              Reporter: Allan Yang (allan163)
              Votes: 0
              Watchers: 16