Slider / SLIDER-1121

Slider AM has a race condition in port allocation

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Slider 0.90.2
    • Fix Version/s: Slider 0.91
    • Component/s: None
    • Labels: None

    Description

      /cc Vinod Kumar Vavilapalli, Gour Saha

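      When two or more Slider AMs are launched on the same node, it appears that they can attempt to bind to the same port, causing one or more of the AMs to crash. The log excerpt below shows an example, in which the AM's web server fails to start with "Port in use: 0.0.0.0:1025".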
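The failure described above matches a classic check-then-bind (time-of-check to time-of-use) race. The Java sketch below is a hypothetical illustration, not Slider's actual port-allocation code: `PortRaceDemo`, `probeFreePort`, and `secondAmFailsToBind` are invented names, and the sketch assumes a port is "chosen" by briefly opening and closing a probe socket. Because the probe releases the port before the AM binds it, two AMs that probe in quick succession can both pick the same port, and the second bind fails with `java.net.BindException`, as in the log below.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

// Hypothetical illustration of a check-then-bind port race; not Slider code.
class PortRaceDemo {

    // Hypothetical helper: returns the first port in [from, to] that can be bound
    // at this instant. The probe socket is closed before returning, so the port
    // is NOT reserved for the caller.
    static int probeFreePort(int from, int to) throws IOException {
        for (int port = from; port <= to; port++) {
            try (ServerSocket probe = new ServerSocket(port)) {
                return port; // released again when the try block closes the socket
            } catch (IOException inUse) {
                // port already taken; try the next one
            }
        }
        throw new IOException("No free port in range " + from + "-" + to);
    }

    // Simulates two AMs on one node that both probe before either one binds.
    // Returns true if the second AM's bind fails.
    static boolean secondAmFailsToBind(int from, int to) throws IOException {
        int portA = probeFreePort(from, to); // AM 1 picks a port
        int portB = probeFreePort(from, to); // AM 2 picks before AM 1 binds: same port
        try (ServerSocket am1 = new ServerSocket(portA)) {
            try (ServerSocket am2 = new ServerSocket(portB)) {
                return false; // only reached if the two probes disagreed
            } catch (BindException e) {
                System.out.println("AM 2 failed: " + e.getMessage());
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        int base;
        try (ServerSocket s = new ServerSocket(0)) { // let the OS suggest a starting port
            base = s.getLocalPort();
        }
        int top = Math.min(base + 50, 65535);
        System.out.println("Second AM failed to bind: " + secondAmFailsToBind(base, top));
    }
}
```

The two probes agree because nothing holds the port between the check and the use; the window is small in a real deployment, which is why the crash shows up only when several AMs start on one node at nearly the same time.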

      16/05/13 02:34:29 INFO appmaster.SliderAppMaster: Cluster provider type is agent
      16/05/13 02:34:29 INFO appmaster.SliderAppMaster: RM is at y016.boo.hoo.com/172.26.32.116:8030
      16/05/13 02:34:29 INFO appmaster.SliderAppMaster: AM for ID 26
      16/05/13 02:34:29 INFO impl.NMClientAsyncImpl: Upper bound of the thread pool size is 500
      16/05/13 02:34:29 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
      16/05/13 02:34:29 INFO ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
      16/05/13 02:34:29 INFO ipc.Server: Starting Socket Reader #1 for port 1025
      16/05/13 02:34:29 INFO ipc.Server: IPC Server Responder: starting
      16/05/13 02:34:29 INFO ipc.Server: IPC Server listener on 1025: starting
      16/05/13 02:34:29 INFO appmaster.SliderAppMaster: AM Server is listening at y053.boo.hoo.com:1025
      16/05/13 02:34:29 INFO appmaster.SliderAppMaster: Starting Yarn registry
      16/05/13 02:34:29 INFO imps.CuratorFrameworkImpl: Starting
      16/05/13 02:34:29 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
      16/05/13 02:34:29 INFO zookeeper.ZooKeeper: Client environment:host.name=y053.boo.hoo.com
      
      16/05/13 02:34:29 INFO zookeeper.ZooKeeper: Client environment:java.version=1.8.0_60
      16/05/13 02:34:29 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
      16/05/13 02:34:29 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/jdk64/jdk1.8.0_60/jre
      
      at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:914)
      	... 10 more
      16/05/13 02:34:29 INFO appmaster.SliderAppMaster: Triggering shutdown of the AM: stop:  exit code = 56, FAILED: Port in use: 0.0.0.0:1025;
      16/05/13 02:34:29 INFO appmaster.SliderAppMaster: Process has exited with exit code 0 mapped to 0 -ignoring
      16/05/13 02:34:29 INFO workflow.WorkflowCompositeService: Child service completed Service RoleLaunchService in state RoleLaunchService: STOPPED
      16/05/13 02:34:29 INFO appmaster.SliderAppMaster: Setting stopInitiated flag to true
      16/05/13 02:34:29 INFO appmaster.SliderAppMaster: Container release timeout in millis = 0
      16/05/13 02:34:29 INFO state.AppState: Releasing 1 containers
      16/05/13 02:34:29 INFO appmaster.SliderAppMaster: Application completed. Signalling finish to RM
      16/05/13 02:34:29 INFO appmaster.SliderAppMaster: Unregistering AM status=FAILED message=Port in use: 0.0.0.0:1025
      16/05/13 02:34:29 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
      Exception: Port in use: 0.0.0.0:1025
      16/05/13 02:34:29 ERROR main.ServiceLauncher: Exception: Port in use: 0.0.0.0:1025
      java.net.BindException: Port in use: 0.0.0.0:1025
      	at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:920)
      	at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:856)
      	at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:274)
      	at org.apache.slider.server.appmaster.SliderAppMaster.deployWebApplication(SliderAppMaster.java:1106)
      	at org.apache.slider.server.appmaster.SliderAppMaster.createAndRunCluster(SliderAppMaster.java:992)
      	at org.apache.slider.server.appmaster.SliderAppMaster.runService(SliderAppMaster.java:580)
      	at org.apache.slider.core.main.ServiceLauncher.launchService(ServiceLauncher.java:188)
      	at org.apache.slider.core.main.ServiceLauncher.launchServiceRobustly(ServiceLauncher.java:475)
      	at org.apache.slider.core.main.ServiceLauncher.launchServiceAndExit(ServiceLauncher.java:403)
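A common way to close this kind of race is to let a successful bind, rather than an earlier probe, decide which port a process gets, and to move on to the next candidate port when the bind reports the port in use. The Java sketch below illustrates that idea under stated assumptions; it is not necessarily the approach taken in the attached patches. `BindWithRetry`, `bindFirstFree`, and `PortBinder` are hypothetical names, and the binder is assumed to throw `java.net.BindException` when the port is taken, as `HttpServer2.openListeners` does in the stack trace above.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

// Hypothetical sketch of bind-and-retry port selection; not Slider code.
class BindWithRetry {

    // Hypothetical hook: starts something on the given port and keeps it bound,
    // throwing BindException if another process already holds the port.
    interface PortBinder<T> {
        T bind(int port) throws IOException;
    }

    // Tries each port in [from, to] and returns the first binding that succeeds.
    // The port is claimed by the bind itself, so two callers cannot both win it.
    static <T> T bindFirstFree(int from, int to, PortBinder<T> binder) throws IOException {
        for (int port = from; port <= to; port++) {
            try {
                return binder.bind(port);
            } catch (BindException inUse) {
                // another process bound this port first; move on to the next one
            }
        }
        throw new BindException("No free port in range " + from + "-" + to);
    }

    public static void main(String[] args) throws IOException {
        int base;
        try (ServerSocket s = new ServerSocket(0)) { // let the OS suggest a starting port
            base = s.getLocalPort();
        }
        int top = Math.min(base + 50, 65535);
        // Two "AMs" on the same node; each keeps its socket open after binding.
        try (ServerSocket am1 = bindFirstFree(base, top, port -> new ServerSocket(port));
             ServerSocket am2 = bindFirstFree(base, top, port -> new ServerSocket(port))) {
            System.out.println("AM 1 port: " + am1.getLocalPort());
            System.out.println("AM 2 port: " + am2.getLocalPort());
        }
    }
}
```

Because each AM holds its binding from the moment the port is chosen, a second AM on the same node skips the taken port instead of crashing. For a server that binds internally, such as the web application started in `deployWebApplication`, the same loop could wrap the server's start call instead of a raw `ServerSocket`.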
      

      Attachments

        1. SLIDER-1121.1.patch (10 kB, Billie Rinaldi)
        2. SLIDER-1121.2.patch (12 kB, Billie Rinaldi)
        3. SLIDER-1121.3.patch (12 kB, Billie Rinaldi)
        4. SLIDER-1121.4.patch (12 kB, Billie Rinaldi)

          People

            Billie Rinaldi (billie)
            Sidharta Seethana (sidharta-s)
            Votes: 0
            Watchers: 6

            Dates

              Created:
              Updated:
              Resolved: