HBase / HBASE-6310

-ROOT- corruption when .META. is using the old encoding scheme

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Blocker
    • Resolution: Invalid
    • Affects Version/s: 0.94.0
    • None
    • None
    • None

    Description

      We're still working on the root cause here, but after the leap second armageddon we had a hard time getting our 0.94 cluster back up. This is what we saw in the logs until the master died by itself:

      2012-07-01 23:01:52,149 DEBUG
      org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
      locateRegionInMeta parentTable=-ROOT-,
      metaLocation={region=-ROOT-,,0.70236052, hostname=sfor3s28,
      port=10304}, attempt=16 of 100 failed; retrying after sleep of 32000
      because: HRegionInfo was null or empty in -ROOT-,
      row=keyvalues={.META.,,1259448304806/info:server/1341124914705/Put/vlen=14/ts=0,
      .META.,,1259448304806/info:serverstartcode/1341124914705/Put/vlen=8/ts=0}
      

      (it's strange that we retry this)

      This was really misleading because I could see the regioninfo in a scan:

      hbase(main):002:0> scan '-ROOT-'
      ROW                                           COLUMN+CELL
       .META.,,1                                    column=info:regioninfo,
      timestamp=1331755381142, value={NAME => '.META.,,1', STARTKEY => '',
      ENDKEY => '', ENCODED => 1028785192,}
       .META.,,1                                    column=info:server,
      timestamp=1341183448693, value=sfor3s40:10304
       .META.,,1
      column=info:serverstartcode, timestamp=1341183448693,
      value=1341183444689
       .META.,,1                                    column=info:v,
      timestamp=1331755419291, value=\x00\x00
       .META.,,1259448304806                        column=info:server,
      timestamp=1341124914705, value=sfor3s24:10304
       .META.,,1259448304806
      column=info:serverstartcode, timestamp=1341124914705,
      value=1341124455863
      

      Except that the devil is in the details: ".META.,,1" is not ".META.,,1259448304806". The scan shows the regioninfo under the old-format row ".META.,,1", while the row the client complains about, ".META.,,1259448304806", only has info:server and info:serverstartcode. Basically, something writes to .META. by building the row key directly, without checking whether the existing row uses the old format. I did a deleteall in the shell and it fixed the issue... until some time later, when it got stuck again because the edits had reappeared (still not sure why). This time the PostOpenDeployTasksThread threads were stuck in the RS trying to update .META., but there was no logging (I saw it with a jstack). I deleted the row again to make it work.
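
To make the mismatch easier to spot, here is a minimal, offline sketch in Python. It does not talk to HBase. The cell list is transcribed by hand from the scan output above, and the helper names (`split_region_name`, `rows_missing_regioninfo`) are hypothetical, made up for illustration. It also assumes a region name has the form `<table>,<start key>,<region id>`, with the table ending at the first comma and the region id starting after the last one.

```python
from collections import defaultdict

# (row, column) pairs transcribed by hand from the -ROOT- scan above.
ROOT_CELLS = [
    (".META.,,1", "info:regioninfo"),
    (".META.,,1", "info:server"),
    (".META.,,1", "info:serverstartcode"),
    (".META.,,1", "info:v"),
    (".META.,,1259448304806", "info:server"),
    (".META.,,1259448304806", "info:serverstartcode"),
]


def split_region_name(name):
    """Split '<table>,<start key>,<region id>' into its three parts.

    Assumption: the table name ends at the first comma and the region id
    starts after the last one, so a start key may itself contain commas.
    """
    table, rest = name.split(",", 1)
    start_key, region_id = rest.rsplit(",", 1)
    return table, start_key, region_id


def rows_missing_regioninfo(cells):
    """Return the rows that have cells but no info:regioninfo cell."""
    columns_by_row = defaultdict(set)
    for row, column in cells:
        columns_by_row[row].add(column)
    return sorted(
        row for row, columns in columns_by_row.items()
        if "info:regioninfo" not in columns
    )


if __name__ == "__main__":
    # Print each suspicious row together with its parsed region name.
    for row in rows_missing_regioninfo(ROOT_CELLS):
        print(row, split_region_name(row))
```

Both rows parse to the same table and the same empty start key and differ only in the region id, which is why they look alike in a scan while only one of them carries the regioninfo.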

      I'm marking this as a blocker against 0.94.2 since we're trying to get 0.94.1 out, but I wouldn't recommend upgrading to 0.94 if your cluster was created before 0.89.

          People

            Assignee: Unassigned
            Reporter: Jean-Daniel Cryans (jdcryans)
            Votes: 0
            Watchers: 6
