  HBase / HBASE-20792

info:servername and info:sn inconsistent for OPEN region


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0, 2.1.0
    • Component/s: Region Assignment
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      This is the next problem we've run into after HBASE-20752 and HBASE-20708.

      After a rolling restart of a cluster, we'll see situations where a collection of regions is simply never assigned out to the RegionServers. I was able to reproduce this by mimicking the restart pattern our tests use internally (ignore, for now, whether this is the best way to restart nodes). The general pattern is this:

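      # 1. Stop every RegionServer, then every Master.
      for rs in regionservers:
        stop(server, rs, RS)
      for master in masters:
        stop(server, master, MASTER)

      # 2. Leave the whole cluster down briefly.
      sleep(15)

      # 3. Start every Master first, then every RegionServer.
      for master in masters:
        start(server, master, MASTER)
      for rs in regionservers:
        start(server, rs, RS)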
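      The same ordering can be expressed as a small runnable Python sketch. It assumes the test harness's stop/start helpers can be passed in as callables taking (host, role); those signatures, the function name restart_cluster, and the host names rs1, rs2 and m1 are all hypothetical, not an HBase API. The dry run records actions instead of touching real hosts, which makes the key property easy to see: no RegionServer is running while the Masters come back up.

```python
import time
from typing import Callable, Iterable, List, Tuple

# Hypothetical signature: (host, role) -> None. The report's stop()/start()
# are test-harness helpers, not an HBase API, so the caller supplies them.
NodeAction = Callable[[str, str], None]


def restart_cluster(
    regionservers: Iterable[str],
    masters: Iterable[str],
    stop: NodeAction,
    start: NodeAction,
    pause_seconds: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Replay the reported pattern: all RegionServers stop before any Master,
    and all Masters start before any RegionServer."""
    regionservers = list(regionservers)
    masters = list(masters)
    for rs in regionservers:
        stop(rs, "RS")
    for master in masters:
        stop(master, "MASTER")
    sleep(pause_seconds)  # the whole cluster is down during this pause
    for master in masters:
        start(master, "MASTER")
    for rs in regionservers:
        start(rs, "RS")


# Dry run: record the order of actions instead of touching real hosts.
events: List[Tuple] = []
restart_cluster(
    ["rs1", "rs2"],
    ["m1"],
    stop=lambda host, role: events.append(("stop", host, role)),
    start=lambda host, role: events.append(("start", host, role)),
    sleep=lambda seconds: events.append(("sleep", seconds)),
)
for event in events:
    print(event)
```

      Injecting sleep as a parameter keeps the sketch testable without waiting 15 seconds; in a real run the default time.sleep applies.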

      Looking at meta, we can see why the Master is ignoring some regions:

       test                                                        column=table:state, timestamp=1529871718998, value=\x08\x00
       test,,1529871718122.0297f680df6dc0166a44f9536346268e.       column=info:regioninfo, timestamp=1529967103390, value={ENCODED => 0297f680df6dc0166a44f9536346268e, NAME => 'test,,1529871718122.0297f680df6dc0166a44f9536346268e.', STARTKEY => '', ENDKEY => ''}
       test,,1529871718122.0297f680df6dc0166a44f9536346268e.       column=info:seqnumDuringOpen, timestamp=1529967103390, value=\x00\x00\x00\x00\x00\x00\x00*
       test,,1529871718122.0297f680df6dc0166a44f9536346268e.       column=info:server, timestamp=1529967103390, value=ctr-e138-1518143905142-378097-02-000012.hwx.site:16020
       test,,1529871718122.0297f680df6dc0166a44f9536346268e.       column=info:serverstartcode, timestamp=1529967103390, value=1529966776248
       test,,1529871718122.0297f680df6dc0166a44f9536346268e.       column=info:sn, timestamp=1529967096482, value=ctr-e138-1518143905142-378097-02-000006.hwx.site,16020,1529966755170
       test,,1529871718122.0297f680df6dc0166a44f9536346268e.       column=info:state, timestamp=1529967103390, value=OPEN

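      The region is marked as OPEN, so the Master doesn't know any better. However, the interesting bit is that info:server and info:sn are inconsistent: info:server and info:serverstartcode point at ctr-e138-1518143905142-378097-02-000012.hwx.site:16020 (startcode 1529966776248), while info:sn, written slightly earlier (timestamp 1529967096482 vs. 1529967103390 for the other info: columns), points at ctr-e138-1518143905142-378097-02-000006.hwx.site,16020,1529966755170. According to the javadoc, this should not be possible for an OPEN region.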
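      A quick way to spot such rows is to rebuild the expected info:sn value from info:server and info:serverstartcode and compare. The Python sketch below assumes that for an OPEN region info:sn should equal the 'host,port,startcode' form of info:server plus info:serverstartcode (our reading of the javadoc mentioned above, with the format taken from the info:sn cell in the scan). It treats the cell values as plain strings exactly as the shell printed them; the helper names expected_sn and find_sn_mismatch are hypothetical.

```python
from typing import Dict, Optional


def expected_sn(server: str, startcode: str) -> str:
    """Combine info:server ('host:port') and info:serverstartcode into the
    'host,port,startcode' form that info:sn holds (assumed invariant)."""
    host, port = server.rsplit(":", 1)
    return f"{host},{port},{startcode}"


def find_sn_mismatch(row: Dict[str, str]) -> Optional[str]:
    """Describe the inconsistency for an OPEN region, or return None."""
    if row.get("info:state") != "OPEN":
        return None  # only OPEN regions are checked in this sketch
    from_server = expected_sn(row["info:server"], row["info:serverstartcode"])
    from_sn = row.get("info:sn")
    if from_sn != from_server:
        return f"info:server+serverstartcode={from_server} but info:sn={from_sn}"
    return None


# Values hand-copied from the hbase:meta scan in this report.
row = {
    "info:state": "OPEN",
    "info:server": "ctr-e138-1518143905142-378097-02-000012.hwx.site:16020",
    "info:serverstartcode": "1529966776248",
    "info:sn": "ctr-e138-1518143905142-378097-02-000006.hwx.site,16020,1529966755170",
}
print(find_sn_mismatch(row))
```

      For the row above the check reports a mismatch, since the two columns name different hosts with different startcodes.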

      This doesn't happen every time, but I caught it yesterday on the 2nd or 3rd attempt, so I'm hopeful it's not a bear to repro.


              People

              • Assignee: Josh Elser (elserj)
              • Reporter: Josh Elser (elserj)
              • Votes: 0
              • Watchers: 8
