Uploaded image for project: 'Slider'
  1. Slider
  2. SLIDER-1236

Unnecessary 10 second sleep before installation

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • None
    • None
    • None

    Description

      Noticed when starting LLAP on a 2-node cluster. Slider AM logs:

      2017-05-22 22:04:33,047 [956937652@qtp-624693846-4] INFO  agent.AgentProviderService - Registration response: RegistrationResponse{response=OK, responseId=0, statusCommands=null}
      ...
      2017-05-22 22:04:34,946 [956937652@qtp-624693846-4] INFO  agent.AgentProviderService - Registration response: RegistrationResponse{response=OK, responseId=0, statusCommands=null}
      

      Then nothing useful goes on for a while, until:

      2017-05-22 22:04:43,099 [956937652@qtp-624693846-4] INFO  agent.AgentProviderService - Installing LLAP on container_1495490227300_0002_01_000002.
      

      If you look at the corresponding logs from both agents, you can see that they both have a gap that's pretty much exactly 10sec.
      After the gap, they talk back to AM; after ~30ms for each container (corresponding to the end of its gap), presumably after hearing from it, the AM starts installing LLAP.

      INFO 2017-05-22 22:04:33,055 Controller.py:180 - Registered with the server with {u'exitstatus': 0,
      INFO 2017-05-22 22:04:33,055 Controller.py:630 - Response from server = OK
      INFO 2017-05-22 22:04:43,065 AgentToggleLogger.py:40 - Queue result: {'componentStatus': [], 'reports': []}
      INFO 2017-05-22 22:04:43,065 AgentToggleLogger.py:40 - Sending heartbeat with response id: 0 and timestamp: 1495490683064. Command(s) in progress: False. Components mapped: False
      
      INFO 2017-05-22 22:04:34,948 Controller.py:180 - Registered with the server with {u'exitstatus': 0,
      INFO 2017-05-22 22:04:34,948 Controller.py:630 - Response from server = OK
      INFO 2017-05-22 22:04:44,959 AgentToggleLogger.py:40 - Queue result: {'componentStatus': [], 'reports': []}
      INFO 2017-05-22 22:04:44,960 AgentToggleLogger.py:40 - Sending heartbeat with response id: 0 and timestamp: 1495490684959. Command(s) in progress: False. Components mapped: False
      

      I've observed the same on multiple different clusters.

      Attachments

        Activity

          People

            gsaha Gour Saha
            sershe Sergey Shelukhin
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: