Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0
    • Component/s: service/hbase
    • Labels:
      None

      Description

      Occasionally the HBase 0.92 service fails to start. See WHIRR-525 for a description.

      1. master.log
        37 kB
        Tom White
      2. rs.log
        33 kB
        Tom White
      3. WHIRR-552.patch
        2 kB
        Andrew Bayer
      4. zk.log
        13 kB
        Tom White

        Activity

        tomwhite Tom White added a comment -

        Here are the log files from a time that the service failed.

        tomwhite Tom White added a comment -

        Patrick Hunt told me offline that the "Connected to an old server; r-o mode will be unavailable" message is not a problem. He also suggested trying telnet rather than nc, since nc sometimes has issues.

        I wonder if this is a timing issue, and simply making the RS wait a bit would help.
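
To make the "wait a bit" idea concrete, here is a minimal sketch of a readiness check that could run before the region server is started. It relies on ZooKeeper's `ruok` four-letter command, which a healthy server answers with `imok`. The function names, the retry count, and the delay are hypothetical and are not taken from the attached patch.

```python
import socket
import time


def zk_is_ok(host, port=2181, timeout=2.0):
    """Send ZooKeeper's 'ruok' command; return True only if the reply is 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            return sock.recv(16) == b"imok"
    except OSError:
        # Connection refused, timed out, or reset: treat as "not ready yet".
        return False


def wait_for_zk(host, port=2181, attempts=30, delay=2.0):
    """Poll ZooKeeper until it answers or the attempts run out.

    The default attempts and delay are illustrative guesses, not tuned values.
    """
    for _ in range(attempts):
        if zk_is_ok(host, port):
            return True
        time.sleep(delay)
    return False
```

A start script could call `wait_for_zk(zk_host)` and launch the RS only when it returns `True`. Note that this only confirms ZooKeeper is answering; it does not show that the master has finished initializing.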

        karel1980 Karel Vervaeke added a comment -

        Here's what I got:

        5:17:23 RS starting up, failure to connect to zk (logical since zk not started yet)
        5:17:42 Master starting up, same kind of failures
        5:17:47 ZK starting
        5:17:47 RS connects to zk
        5:17:47 Master connects to zk
        5:17:53 Master: found 1 replicas but expecting no less than 3
        5:17:54 - 5:18:02 Master 'waiting for rs to check in'
        5:17:58 RS: first signs of stopping (aborting, initialization of fs failed, bla bla)

        The highlights to me are:
        1) 5:17:53 Master: found 1 replicas but expecting no less than 3
        Wrong value for dfs.replication? Which value does it have on the datanodes?

        2) Why did the RS not check in between 5:17:48 and 5:17:58? No activity in the RS logs...

        3) The master logs appear truncated. Is it just part of the log, or does it really end suddenly?
        Memory issues?
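
One way to answer the dfs.replication question on a given node is to read the property straight from its Hadoop configuration XML. The sketch below assumes the standard `<configuration>/<property>/<name>/<value>` layout. The sample content and the helper name are hypothetical, and which file actually holds the setting on a datanode (commonly `hdfs-site.xml`) is also an assumption.

```python
import xml.etree.ElementTree as ET


def read_hadoop_property(xml_text, name):
    """Return the value of a named property from Hadoop-style config XML, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


# Hypothetical config content, for illustration only.
SAMPLE = """<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>"""

print(read_hadoop_property(SAMPLE, "dfs.replication"))
```

If the property is absent the helper returns `None`, in which case Hadoop's built-in default applies rather than a value set by the cluster.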

        amansk Amandeep Khurana added a comment -

        I ran into issues where the parent znode was not created because the master had not yet initialized. RS came up and died because of that.
        Related issue - https://issues.apache.org/jira/browse/HBASE-5666.

        abayer Andrew Bayer added a comment -

        Switching to HBase 0.92.2 seems to do the trick.


          People

          • Assignee:
            abayer Andrew Bayer
            Reporter:
            tomwhite Tom White
          • Votes:
            0
            Watchers:
            1