HBASE-8502

Eternally stuck Region after split

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Abandoned
    • Affects Version/s: 0.92.1
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

    Description

      Exact HBase version: 0.92.1-cdh4.1.2

      A couple of days ago I encountered a RIT (region in transition) problem
      with a single region. After an hbck run, the cluster started trying to
      assign the region, which then kept bouncing between
      OFFLINE/PENDING_OPEN/OPENING for the following two days.

      This was due to a split that went wrong in some way, which left several
      reference files in the region directory despite the two relevant HFiles
      having been copied successfully to the daughter.

      I will try to give as many details as possible, but unfortunately I was
      unable to find any information about the split itself.

      Short thread about this issue on the users-ML: http://mail-archives.apache.org/mod_mbox/hbase-user/201305.mbox/%3C5182758B.1060306@neofonie.de%3E

      ===

      Parent region: 5b9c16898a371de58f31f0bdf86b1f8b
      Daughter region in question: 79c619508659018ff3ef0887611eb8f7

      Rough sequence from the logs seems to be the following:

      ===

      • Received request to open region:
        documents,7128586022887322720,1363696791400.79c619508659018ff3ef0887611eb8f7.
      • Setting up tabledescriptor config now ...
      • Opening of region {NAME => 'documents,7128586022887322720,1363696791400.79c619508659018ff3ef0887611eb8f7.', STARTKEY => '7128586022887322720', ENDKEY => '7130716361635801616', ENCODED => 79c619508659018ff3ef0887611eb8f7,} failed, marking as FAILED_OPEN in ZK

      • File does not exist:
        /hbase/documents/5b9c16898a371de58f31f0bdf86b1f8b/d/0707b1ec4c6b41cf9174e0d2a1785fe9
        [...]
        ===

      What happened was that somehow (and that's the question here) the
      daughter's region folder contained some leftover reference files, which
      caused the RegionServer to look up the parent region, which had already
      been deleted.

      Original contents of /hbase/documents/79c619508659018ff3ef0887611eb8f7/d:
      ==
      0707b1ec4c6b41cf9174e0d2a1785fe9.5b9c16898a371de58f31f0bdf86b1f8b
      47511faae81b4452afd3ca206e28346f.5b9c16898a371de58f31f0bdf86b1f8b
      4f01ecd052ce464d81e79a62ea227d6b
      4f01ecd052ce464d81e79a62ea227d6b.5b9c16898a371de58f31f0bdf86b1f8b
      eb7dbb09701d4353be24ca82481c4a7e
      ==
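
      To make the listing above easier to reason about, here is a minimal Python sketch that separates plain HFiles from reference files and derives the parent-region path each reference would point at. It assumes that a store file named `<hfile>.<parent encoded region>` is a reference into the parent region, and that the path layout is `/hbase/<table>/<region>/<family>/<hfile>`, as the paths in this report suggest. The function names and defaults are hypothetical and not part of HBase.

```python
import re

# Assumption: a store file named "<hfile>.<parent encoded region>" is a
# reference into the parent region; a bare hex name is a regular HFile.
STORE_FILE_NAME = re.compile(r"^([0-9a-f]+)(?:\.([0-9a-f]+))?$")


def classify_store_files(names):
    """Split a directory listing into plain HFiles and (hfile, parent) references."""
    hfiles, references = [], []
    for name in names:
        match = STORE_FILE_NAME.match(name.strip())
        if match is None:
            continue  # not a store file name this sketch recognises
        hfile, parent = match.groups()
        if parent is None:
            hfiles.append(hfile)
        else:
            references.append((hfile, parent))
    return hfiles, references


def referenced_parent_paths(names, table="documents", family="d", root="/hbase"):
    """Paths in the parent region that each reference file would resolve to."""
    _, references = classify_store_files(names)
    return [f"{root}/{table}/{parent}/{family}/{hfile}" for hfile, parent in references]


# Listing copied from the daughter region directory in this report.
listing = [
    "0707b1ec4c6b41cf9174e0d2a1785fe9.5b9c16898a371de58f31f0bdf86b1f8b",
    "47511faae81b4452afd3ca206e28346f.5b9c16898a371de58f31f0bdf86b1f8b",
    "4f01ecd052ce464d81e79a62ea227d6b",
    "4f01ecd052ce464d81e79a62ea227d6b.5b9c16898a371de58f31f0bdf86b1f8b",
    "eb7dbb09701d4353be24ca82481c4a7e",
]

for path in referenced_parent_paths(listing):
    print(path)
```

      Under these assumptions the listing yields three reference files and two plain HFiles, and the first derived path is the one reported as missing in the log excerpt above.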

      I have attached the full FileNotFound exception.

      Please let me know if I can provide more information or help otherwise.

      Attachments

        1. stuck_region_exception.txt
          9 kB
          Dimitri Goldin
        2. hbase_run.log
          12 kB
          Dimitri Goldin
        3. hbase_lost_parent.txt
          50 kB
          Dimitri Goldin

            People

              Assignee: Unassigned
              Reporter: Dimitri Goldin (goldin)