HBASE-241

[hbase] Scan of .META. does socket timeout over and over again (rather than

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      A mismatch in the code on the cluster revealed an infinite loop. The .META. scanner hits a socket timeout trying to contact a borked region server (the borked server was having trouble contacting HDFS because of a code version mismatch; it was sort-of-working). We retry the timeout up to the retry limit, but then, rather than try to redeploy the unreachable .META., we just drop back into scanning at the old location. I'll attach a log that illustrates the goings-on.

      I think this is likely a trivial issue since it shouldn't really ever happen.
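
      The loop described above can be sketched roughly as follows. This is an illustrative model only, not the HBase master code: `scan_meta`, `scan_with_reassignment`, `contact_server`, `reassign`, and `max_reassignments` are hypothetical names introduced for the example. It shows retries bounded by a retry limit, followed by asking for a new location instead of dropping back into the scan at the old one.

```python
class RegionUnreachable(Exception):
    """Raised when every retry against the current location timed out."""


def scan_meta(contact_server, location, retry_limit):
    """Try to scan .META. at `location`, giving up after `retry_limit` timeouts."""
    for _ in range(retry_limit):
        try:
            return contact_server(location)
        except TimeoutError:
            continue  # socket timeout: retry against the same location
    raise RegionUnreachable(location)


def scan_with_reassignment(contact_server, reassign, location,
                           retry_limit, max_reassignments=3):
    """Hypothetical fix: once retries are exhausted, move to a new location
    rather than re-entering the scan at the old one (the infinite loop)."""
    for reassignments in range(max_reassignments + 1):
        try:
            return scan_meta(contact_server, location, retry_limit)
        except RegionUnreachable:
            if reassignments == max_reassignments:
                raise  # bounded: give up instead of looping forever
            location = reassign(location)  # redeploy .META. elsewhere
```

      The cap on reassignments is an assumption of the sketch; the point is only that exhausting the retry limit should change the location being scanned.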

        Activity

        stack created issue -
        stack added a comment -

        Log excerpt illustrating the problem.

        stack made changes -
        Field Original Value New Value
        Attachment excerpt.txt [ 12364811 ]
        Jim Kellerman made changes -
        Assignee Jim Kellerman [ jimk ]
        Jim Kellerman added a comment -

        If a region server cannot contact the HDFS, it should shut itself down. In this case the master will notice when the region server's lease times out and reassign the region.
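
        As a rough sketch of the lease mechanism mentioned here (a toy model, not the actual master implementation; `LeaseTable`, `reassign_regions`, and the lease period value are assumptions made for illustration): a server that shuts itself down stops renewing its lease, the master sees the lease lapse, and the regions that server held are handed to live servers.

```python
class LeaseTable:
    """Toy master-side lease table: a server that stops renewing is treated as dead."""

    def __init__(self, lease_period):
        self.lease_period = lease_period
        self.last_renewed = {}  # server name -> time of last renewal

    def renew(self, server, now):
        self.last_renewed[server] = now

    def expire(self, now):
        """Return servers whose lease has lapsed and forget them."""
        dead = [s for s, t in self.last_renewed.items()
                if now - t > self.lease_period]
        for s in dead:
            del self.last_renewed[s]
        return dead


def reassign_regions(assignments, dead_servers, live_servers):
    """Move each region held by a dead server to a live one, round-robin.
    Assumes at least one live server is available."""
    updated = dict(assignments)
    orphaned = sorted(r for r, s in assignments.items() if s in dead_servers)
    for i, region in enumerate(orphaned):
        updated[region] = live_servers[i % len(live_servers)]
    return updated
```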

        Jim Kellerman made changes -
        Assignee Jim Kellerman [ jimk ] stack [ stack ]
        stack added a comment -

        I buy your rationale above Jim.

        There may be other states a regionserver can get into, like the one described herein, where it wouldn't go down and kept making achk, achk, achk noises like some wounded duck, but we can open a new issue to address that when we see it.

        Resolving. Fixed by HADOOP-1801

        stack made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Fix Version/s 0.15.0 [ 12312565 ]
        Resolution Fixed [ 1 ]
        Doug Cutting made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Owen O'Malley made changes -
        Component/s contrib/hbase [ 12311752 ]
        Owen O'Malley made changes -
        Assignee stack [ stack ]
        Project Hadoop Core [ 12310240 ] Hadoop HBase [ 12310753 ]
        Key HADOOP-1816 HBASE-241
        Fix Version/s 0.15.0 [ 12312565 ]

          People

          • Assignee:
            Unassigned
            Reporter:
            stack