HBASE-3431

Regionserver is not using the name given it by the master; double entry in master listing of servers

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.90.0
    • Fix Version/s: 0.92.0
    • Component/s: None
    • Labels:
      None

      Description

      Our man Ted Dunning found the following: the RS checks in with one name, the master tells it to use another name, but the RS seems to go ahead and continue with its original name.

      In RS logs I see:

      2011-01-07 15:45:50,757 INFO  org.apache.hadoop.hbase.regionserver.HRegionServer [regionserver60020]: Master passed us address to use. Was=perfnode11:60020, Now=10.10.30.11:60020
      

      On master I see

      2011-01-07 15:45:38,613 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 0 on 60000]: Registering server=10.10.30.11,60020,1294443935414, regionCount=0, userLoad=false
      

      ....

      then later

      2011-01-07 15:45:44,247 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 2 on 60000]: Registering server=perfnode11,60020,1294443935414, regionCount=0, userLoad=true
      

      This might be because we started letting servers register by routes other than reportStartup.

      1. 3431.txt
        7 kB
        stack
      2. 3431-v2.txt
        4 kB
        stack
      3. 3431-v3.txt
        6 kB
        stack
      4. 3431-v3.txt
        6 kB
        stack
      5. 3431-v4.txt
        6 kB
        stack

        Activity

        Hudson added a comment -

        Integrated in HBase-TRUNK #1909 (See https://builds.apache.org/hudson/job/HBase-TRUNK/1909/)

        stack added a comment -

        Resolving. I added a section to the FAQ on what to do if you are seeing double (the regionservers). Also resolving because HBASE-1502 changes how this all works, so hopefully we won't see this type of issue anymore.

        stack added a comment -

        Moving out. J-D won't let me get away w/ my radical strip back of code; I'll break stuff that's currently 'working' (probably true). I can't change the RPC because we're on a point version. I can't change the way HSI and HSA work, because we are on a point version. The only invariant in the communication between Master and RS is the server startcode (and port). I could try to leverage this fact but it looks like a bunch of work – and this stuff is all going away in 0.92.

        Let me add to the FAQ in the book what to do if a user has double-vision. That'll do for 0.90.1.
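
For illustration only, here is a minimal sketch of what leaning on the startcode-and-port invariant mentioned above could look like. It is not what any of the attached patches do (the idea was set aside as too much work), and the class and method names are hypothetical. It assumes the `host,port,startcode` name form that appears in the master log in the description. Note that port plus startcode alone is not guaranteed unique across hosts, since the startcode is a timestamp.

```java
final class StartcodeKeySketch {
    /** Drops the host from a "host,port,startcode" name, leaving "port,startcode". */
    static String portAndStartcode(String serverName) {
        int firstComma = serverName.indexOf(',');
        if (firstComma < 0) {
            throw new IllegalArgumentException("Not host,port,startcode: " + serverName);
        }
        return serverName.substring(firstComma + 1);
    }

    /** Treats two names as the same server when port and startcode match. */
    static boolean sameServer(String a, String b) {
        return portAndStartcode(a).equals(portAndStartcode(b));
    }

    public static void main(String[] args) {
        // The two names the master registered for one regionserver in the description.
        String byIp = "10.10.30.11,60020,1294443935414";
        String byName = "perfnode11,60020,1294443935414";
        System.out.println("same server? " + sameServer(byIp, byName));
    }
}
```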

        stack added a comment -

        I can't use EnvironmentEdge to change addresses since the InetSocketAddress that is at root of our HServerAddress, etc., is taken from the socket down in RPC – I can't interject EnvironmentEdge inside Socket.getLocalSocketAddress, etc.

        I can't change how HSA or HSI serialize since this is a point release.

        All this is going to go away, or at least change radically, in 0.92 because we intend to drop the heartbeat.

        stack added a comment -

        Looking in hdfs, datanode generates a registration name – e.g. DS-198919343-10.20.20.187-10010-1291133524722 – and this is how it identifies itself to NN regardless. No messing w/ NN telling it what name to use.

        J-D points out that I'm reading this code lazily (i.e. wrong): on registration, the NN returns a DatanodeRegistration instance that the DN will use going forward.

        ryan rawson added a comment -

        One thing to consider is that a lot of the network code attempts to figure
        out what the 'primary IP' is and then binds to just that IP.

        Would it make sense to bind to * instead (i.e. 0.0.0.0)? Why not accept
        RPCs on all interfaces? If security is a concern, I think SASL and
        host-level firewall controls are a better way to address that than
        baking it into HBase. That way it won't really "matter" what our IP
        is; whatever IP the master 'sees' us as could be used as what to stuff
        in the META. Then we could use the registration name to identify dead
        hosts, etc.
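
As a rough illustration of the wildcard bind suggested above, the sketch below uses only standard `java.net` calls. The class and method names are hypothetical and this is not HBase's RPC server code. Constructing an `InetSocketAddress` from a port alone yields the wildcard address.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

final class WildcardBindSketch {
    /** Opens a listener on every local interface; port 0 asks the OS for a free port. */
    static ServerSocket bindAll(int port) throws IOException {
        ServerSocket server = new ServerSocket();
        // InetSocketAddress(int) uses the wildcard address rather than one chosen IP.
        server.bind(new InetSocketAddress(port));
        return server;
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = bindAll(0)) {
            System.out.println("wildcard=" + server.getInetAddress().isAnyLocalAddress()
                + " port=" + server.getLocalPort());
        }
    }
}
```

With a wildcard bind the server no longer has a single "own" address to volunteer, which is why the comment pairs it with using whatever address the master sees.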

        stack added a comment -

        Chatted w/ Jon and J-D on this. Jon suggests the EnvironmentEdgeManager utility as a means of intercepting lookups so we can do up tests returning different answers. Let me try it out. J-D rehearsed issues we have had in here over time and that this 'mess' was 'working' in 0.20.x and even unto 0.89.x (he remembers also that a RS can volunteer its address as 127.0.0.1 but actually bind to a real, non-localhost address somehow). He's wary about stripping it all out as the patch does. Let me try and put up unit tests that can mock the various scenarios.

        Looking at code w/ J-D, we turned up one problematic bit of code – HSA will create a new InetSocketAddress on deserialization which can result in a lookup.
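
The lookup-on-deserialization hazard noted above follows from standard `java.net` behaviour: the `InetSocketAddress(String, int)` constructor attempts to resolve the hostname, while `InetSocketAddress.createUnresolved` does not. The sketch below shows only that difference. The class and method names are hypothetical and this is not the HServerAddress code.

```java
import java.net.InetSocketAddress;

final class ResolveOnConstructSketch {
    /** Builds an address without triggering any name lookup. */
    static InetSocketAddress unresolved(String host, int port) {
        return InetSocketAddress.createUnresolved(host, port);
    }

    /** Builds an address the usual way; the constructor tries to resolve the host. */
    static InetSocketAddress resolving(String host, int port) {
        return new InetSocketAddress(host, port);
    }

    public static void main(String[] args) {
        InetSocketAddress a = unresolved("perfnode11", 60020);
        System.out.println(a.getHostString() + " unresolved=" + a.isUnresolved());

        // An IP literal needs no DNS, so this resolves anywhere.
        InetSocketAddress b = resolving("127.0.0.1", 60020);
        System.out.println(b.getHostString() + " unresolved=" + b.isUnresolved());
    }
}
```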

        Looking in hdfs, datanode generates a registration name – e.g. DS-198919343-10.20.20.187-10010-1291133524722 – and this is how it identifies itself to NN regardless. No messing w/ NN telling it what name to use. TT does something similar.

        ryan rawson added a comment -

        I'll have a look Monday.

        stack added a comment -

        Tested w/ name resolution broken on both ends. If I broke lookup badly enough, the server wouldn't start, complaining that it couldn't resolve the name (that's not new to my patch). If the name didn't resolve when it got to the server side, then the same thing again w/ a complaint that it couldn't resolve the regionserver name... again not new to my patch... more a commentary on how hbase will already complain loudly if resolution is mangled. Messages are pretty plain about what's wrong.

        I broke master resolve so the incoming RS did not resolve to a proper address – in the past we'd send back an IP and use that ever after and then you'd have double-vision after the next heartbeat – and then on the RS I broke it so it passed back a FQDN when the Master was dealing in host names only. That worked too.

        Review please. Unit tests are hard to do. Would have to somehow mock java dns lookup. Changing the dns doesn't seem to be possible (I can see providing alternate dns provider to jndi if you provide flags on JVM startup).
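
One way around mocking Java's DNS lookup, sketched under assumptions: route name lookups through an injectable function so a test can hand the master and the regionserver different answers. Nothing below exists in HBase. `FakeDnsSketch` and `tableResolver` are hypothetical names, and the error text only imitates the message in the stack trace quoted in this issue.

```java
import java.util.Collections;
import java.util.Map;
import java.util.function.Function;

final class FakeDnsSketch {
    /** Returns a lookup function backed by a fixed table; unknown names fail loudly. */
    static Function<String, String> tableResolver(Map<String, String> table) {
        return name -> {
            String answer = table.get(name);
            if (answer == null) {
                throw new IllegalArgumentException("Could not resolve the DNS name of " + name);
            }
            return answer;
        };
    }

    public static void main(String[] args) {
        // The master's view of the world: it only finds an IP for the regionserver.
        Function<String, String> masterDns =
            tableResolver(Collections.singletonMap("perfnode11", "10.10.30.11"));
        // The regionserver's view: it knows itself by hostname.
        Function<String, String> rsDns =
            tableResolver(Collections.singletonMap("perfnode11", "perfnode11"));

        System.out.println("master sees " + masterDns.apply("perfnode11"));
        System.out.println("RS sees     " + rsDns.apply("perfnode11"));
    }
}
```

The cost of this design is that every lookup in the code under test has to go through the seam, which the comments in this issue suggest is not possible for addresses taken from the socket down in RPC.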

        stack added a comment -

        If the RS passes 127.0.0.1, then that's what it's bound to and no (remote) client will be able to connect. It's broken.

        The fixup in the master would let this (broken) server successfully register. The master would call remoteIP on the connected socket to get the RS's address and it would then know the RS as this. This would happen only on startup, in reportForDuty, not subsequently during heartbeating; we only do the lookup of remoteip on reportForDuty.

        Heartbeating, the RS was supposed to be volunteering the HServerInfo that the Master had passed it back as response to the reportForDuty.

        Since 0.90.0, servers can register at heartbeat time. This is because masters can join an already running cluster. The RSs do not rerun the reportForDuty step. They just start heartbeating the new Master.

        We could, I suppose, add a lookup on the socket's remote IP to heartbeating too, with a reverse lookup.

        I'm thinking it's better to just strip all this crap out.

        Jean-Daniel Cryans added a comment -

        Instead Master just uses the ServerName the RS volunteered.

        So what happens if region server passes 127.0.0.1?

        stack added a comment -

        More DNS breaking turned up the fact that, on startup, the Master should not be setting the RS address into HSI. Still testing.

        stack added a comment -

        Should have stripped setting IP of RS on Master side when doing startup message. More breaking of DNS turned up this one. Still testing.

        stack added a comment -

        The issue is that if the master sees a RegionServer differently to how the RS sees itself – e.g. the master gets an IP when it does a lookup though the RS passed a name, or the RS passed a FQDN but the master has the hostname only – then the master will ask the RS to take on the name the Master sees by passing it back an HServerAddress. This does not work if the two servers are getting different answers from their respective DNSs. The Master knows RSs by their 'ServerName', which is hostname+port+startcode. If DNS is wonky, then the Master and RS will come up with different 'ServerName's even if the Master passes back its HSA (the HSA could be IP only; the RS does a lookup and comes up w/ a different hostname if DNS is broken). This patch removes the code that has the master trying to tell the RS the identity to use. Instead the Master just uses the ServerName the RS volunteered.

        So far in testing it seems to work when DNS is set up properly and when Master-side DNS is broken such that it's finding an IP only for the RS. Let me do some more testing.
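
To make the double-entry mechanism above concrete, here is a minimal sketch assuming the `host,port,startcode` form seen in the master log in the description. `ServerNameSketch` and `serverName` are hypothetical names, not HBase's own code, and a plain set stands in for the master's list of online servers.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class ServerNameSketch {
    /** Composes a name in the host,port,startcode form seen in the master log. */
    static String serverName(String host, int port, long startcode) {
        return host + "," + port + "," + startcode;
    }

    public static void main(String[] args) {
        long startcode = 1294443935414L;
        Set<String> onlineServers = new LinkedHashSet<>();

        // Name the master derived from its own lookup of the regionserver.
        onlineServers.add(serverName("10.10.30.11", 60020, startcode));
        // Name the regionserver keeps volunteering.
        onlineServers.add(serverName("perfnode11", 60020, startcode));

        // Same process, two different keys, so the master lists it twice.
        System.out.println(onlineServers.size() + " entries: " + onlineServers);
    }
}
```

If the master only ever stores the name the RS volunteers, both registration paths produce the same key and the set holds a single entry.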

        stack added a comment -

        If master can't find regionserver address, then master does this:

        Caused by: java.lang.IllegalArgumentException: Could not resolve the DNS name of sv2borg185:60020
            at org.apache.hadoop.hbase.HServerAddress.checkBindAddressCanBeResolved(HServerAddress.java:105)
            at org.apache.hadoop.hbase.HServerAddress.readFields(HServerAddress.java:168)
            at org.apache.hadoop.hbase.HServerInfo.readFields(HServerInfo.java:230)
            at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:521)
            ... 8 more
        

        ... which is kinda dumb but means no progress unless server can get an address.

        If DNS is wrong, e.g. on the master, so that a lookup on the passed name comes up w/ a different address, then we'll tell the regionserver to go forward with the IP.

        At the moment you'll see two entries for this badly configured server. The regionserver will show by its name and by its bad IP.

        Symptom is you can't shutdown because master is waiting on the ghost server to finish its close up (this is what was happening for mr oracle.com).

        I manufactured Ted's prob. by changing hosts on master to have different subnet for a server. Then I got this in RS log:

        2011-02-05 00:33:49,409 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Master passed us address to use. Was=sv2borg185:60020, Now=10.20.20.185:60020
        

        Let me dig in.

        stack added a comment -

        Up on IRC we just had a case where the RS was reporting hostname only but reverse lookup was returning the FQDN.

        stack added a comment -

        Workaround is to make reverse DNS on master produce same hostname as that which the RegionServer reports (RS hostname lookup).
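
A quick way to check whether a host satisfies the workaround above is to compare the name the RS reports with what a forward-then-reverse lookup returns on the master machine. The sketch below uses standard `java.net.InetAddress` calls. It only approximates whatever lookup the master actually performs, which this issue does not spell out, and the class and method names are hypothetical.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class ReverseDnsCheckSketch {
    /** True when the reverse-lookup result equals the name the RS reports. */
    static boolean namesAgree(String reportedHost, String reverseLookupResult) {
        return reportedHost.equalsIgnoreCase(reverseLookupResult);
    }

    public static void main(String[] args) throws UnknownHostException {
        // Pass the regionserver's reported hostname; defaults to this machine's name.
        String reported = args.length > 0 ? args[0] : InetAddress.getLocalHost().getHostName();

        InetAddress addr = InetAddress.getByName(reported);  // forward lookup
        String reverse = addr.getCanonicalHostName();        // reverse lookup; may be the IP text

        System.out.println(reported + " -> " + addr.getHostAddress() + " -> " + reverse
            + (namesAgree(reported, reverse) ? "  OK" : "  MISMATCH"));
    }
}
```

Run it on the master host with the regionserver's reported name as the argument. A MISMATCH line corresponds to cases described in this issue, such as a bare hostname coming back as a FQDN or as an IP.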

        stack added a comment -

        We are seeing lots of permutations on this issue up in the mailing lists. Let's fix this in a 0.90.1.

        stack added a comment -

        A unit test that has the master setting an address for the regionserver to use and then verifying that the regionserver subsequently volunteers what we told it to use. It passes on TRUNK, which would seem to say this stuff should be working fine.

        I compared 0.89 and 0.90 HRS. Nothing jumps out.

        stack added a comment -

        Seems like this is a regression since 0.89. Ted says 0.89 works on his cluster. The master is seeing RS as an IP then subsequently the RS is giving the IP back as its 'name'. Ted is also starting things a little odd... manually starting each daemon... with the RS saying that its NotReadyYet exception in 0.90.

        stack added a comment -

        Why is the RS not taking what the Master tells it to use when going to the Master?


          People

          • Assignee:
            stack
            Reporter:
            stack
          • Votes:
            0
            Watchers:
            3
