Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-12319

Inconsistencies during region recovery due to close/open of a region during recovery

VotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Critical
    • Resolution: Fixed
    • 0.98.7, 0.99.1
    • 0.98.8, 0.99.2
    • None
    • None
    • Reviewed

    Description

      In one of my test runs, I saw the following:

      2014-10-14 13:45:30,782 DEBUG [StoreOpener-51af4bd23dc32a940ad2dd5435f00e1d-1] regionserver.HStore: loaded hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestIngest/51af4bd23dc32a940ad2dd5435f00e1d/test_cf/d6df5cfe15ca41d68c619489fbde4d04, isReference=false, isBulkLoadResult=false, seqid=141197, majorCompaction=true
      2014-10-14 13:45:30,788 DEBUG [RS_OPEN_REGION-hor9n01:60020-1] regionserver.HRegion: Found 3 recovered edits file(s) under hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestIngest/51af4bd23dc32a940ad2dd5435f00e1d
      .............
      .............
      2014-10-14 13:45:31,916 WARN  [RS_OPEN_REGION-hor9n01:60020-1] regionserver.HRegion: Null or non-existent edits file: hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestIngest/51af4bd23dc32a940ad2dd5435f00e1d/recovered.edits/0000000000000198080
      

      The above logs is from a regionserver, say RS2. From the initial analysis it seemed like the master asked a certain regionserver to open the region (let's say RS1) and for some reason asked it to close soon after. The open was still proceeding on RS1 but the master reassigned the region to RS2. This also started the recovery but it ended up seeing an inconsistent view of the recovered-edits files (it reports missing files as per the logs above) since the first regionserver (RS1) deleted some files after it completed the recovery. When RS2 really opens the region, it might not see the recent data that was written by flushes on hor9n10 during the recovery process. Reads of that data would have inconsistencies.

      Attachments

        1. HBASE-12319-v2.patch
          4 kB
          Jeffrey Zhong
        2. HBASE-12319.patch
          1.0 kB
          Jeffrey Zhong

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            jeffreyz Jeffrey Zhong
            ddas Devaraj Das
            Votes:
            0 Vote for this issue
            Watchers:
            8 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment