HBASE-241

[hbase] Scan of .META. does socket timeout over and over again (rather than

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      A code version mismatch on the cluster revealed an infinite loop. The .META. scanner hits a socket timeout trying to contact a borked region server (the borked server was having trouble contacting HDFS because of the code version mismatch; it was sort-of-working). We retry the timeout up to the retry limit, but then, rather than try to redeploy the unreachable .META., we just drop back into scanning at the old location. I'll attach a log that illustrates the goings-on.

      I think this is likely a trivial issue since it shouldn't really ever happen.
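
      To make the failure mode concrete, here is a minimal sketch of a retry loop that hands back a "reassign" outcome once the retry limit is exhausted, instead of dropping back into the same scan. Every name in it (MetaScanRetrySketch, ScanAttempt, Outcome, scanMeta) is hypothetical and made up for illustration; none of it is taken from the HBase source.

```java
import java.net.SocketTimeoutException;

// Hypothetical sketch only: these names do not come from the HBase code base.
final class MetaScanRetrySketch {

    /** What the caller should do once the attempt loop finishes. */
    enum Outcome { SCANNED, REASSIGN_META }

    /** Stand-in for one scan attempt against the server currently hosting .META. */
    interface ScanAttempt {
        void run() throws SocketTimeoutException;
    }

    /**
     * Tries the scan up to maxRetries times. If every attempt times out,
     * returns REASSIGN_META so the caller stops using the old location
     * rather than starting the same scan over again.
     */
    static Outcome scanMeta(ScanAttempt attempt, int maxRetries) {
        for (int tries = 0; tries < maxRetries; tries++) {
            try {
                attempt.run();
                return Outcome.SCANNED;
            } catch (SocketTimeoutException e) {
                // Timed out: loop around and retry until the limit is reached.
            }
        }
        return Outcome.REASSIGN_META;
    }

    public static void main(String[] args) {
        // Simulates a region server that never answers.
        ScanAttempt alwaysTimesOut = () -> {
            throw new SocketTimeoutException("simulated timeout");
        };
        System.out.println(scanMeta(alwaysTimesOut, 3));
    }
}
```

      The point of returning an explicit outcome is that the caller cannot mistake "gave up" for "try again at the same address"; how the reassignment itself would be carried out is left open here.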

        Activity

        stack added a comment -

        Log excerpt illustrating the problem.

        Jim Kellerman added a comment -

        If a region server cannot contact HDFS, it should shut itself down. The master will then notice when the region server's lease times out and reassign the region.
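
        A rough sketch of the lease idea, under stated assumptions: servers renew a lease periodically, and any region hosted by a server whose lease has lapsed is reported for reassignment. The class and method names (LeaseMonitorSketch, renew, assign, regionsToReassign) are hypothetical, and the clock is passed in explicitly to keep the example deterministic; the real master's bookkeeping may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch only: not the HBase master's actual lease handling.
final class LeaseMonitorSketch {
    private final long leaseMillis;
    // Last time each region server renewed its lease.
    private final Map<String, Long> lastRenewal = new HashMap<>();
    // Which server currently hosts each region (sorted for stable output).
    private final Map<String, String> regionHost = new TreeMap<>();

    LeaseMonitorSketch(long leaseMillis) {
        this.leaseMillis = leaseMillis;
    }

    /** Called whenever a region server reports in. */
    void renew(String server, long nowMillis) {
        lastRenewal.put(server, nowMillis);
    }

    /** Records that a region is hosted by the given server. */
    void assign(String region, String server) {
        regionHost.put(region, server);
    }

    /** Regions whose host has not renewed within the lease period. */
    List<String> regionsToReassign(long nowMillis) {
        List<String> orphaned = new ArrayList<>();
        for (Map.Entry<String, String> e : regionHost.entrySet()) {
            Long renewed = lastRenewal.get(e.getValue());
            if (renewed == null || nowMillis - renewed > leaseMillis) {
                orphaned.add(e.getKey());
            }
        }
        return orphaned;
    }
}
```

        A server that shuts itself down simply stops calling renew, so its regions show up in regionsToReassign once the lease period has passed.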

        stack added a comment -

        I buy your rationale above, Jim.

        There may be other states a regionserver can get into, like the one described here, where it won't go down and keeps making achk, achk, achk noises like some wounded duck, but we can open a new issue to address those when we see them.

        Resolving. Fixed by HADOOP-1801.


          People

          • Assignee: Unassigned
          • Reporter: stack
          • Votes: 0
          • Watchers: 0
