Bigtop / BIGTOP-782

'service hue status' still shows 'failed' after hue is started on SLES

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.5.0
    • Fix Version/s: 0.5.0
    • Component/s: RPM
    • Labels:
      None

      Description

      On SLES, run:

      pkgtest-sles64-11:~ # service hue start
      Starting Hue web server:                                                                   done
      
      pkgtest-sles64-11:~ # service hue status
      Hue web server is not running                                                              failed
      

      At this point the hue service has already started, but the status still shows "failed". We can verify that the service is running by opening its web GUI or with a curl command:

      # curl http://localhost:8888/accounts/login/
      
      <!DOCTYPE html>
      <html lang="en">
      <head>
      	<meta charset="utf-8">
      	<title>Hue Login</title>
      	<meta name="viewport" content="width=device-width, initial-scale=1.0">
      	<meta name="description" content="">
      	<meta name="author" content="">
      
      	<link href="/static/ext/css/bootstrap.min.css" rel="stylesheet">
      	<link href="/static/css/hue2.css" rel="stylesheet">
      
      	<!-- Le HTML5 shim, for IE6-8 support of HTML5 elements -->
      	<!--[if lt IE 9]>
      	<script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
      	<![endif]-->
      
      	<style type="text/css">
      		body {
      			padding-top: 100px;
      		}
      	</style>
      </head>
      
      <body>
      	<div class="navbar navbar-fixed-top">
      		<div class="navbar-inner">
      			<div class="container-fluid">
      				<a class="brand" href="#">Hue</a>
      			</div>
      		</div>
      	</div>
      
      	<div class="container">
      		<div class="row">
      			<div class="span4 offset4">
      				<form method="POST" action="/accounts/login/" class="well">
      					<label>Username
      						<input name="username" class="input-large" type="text" maxlength="30">
      					</label>
      					<label>Password
      						<input name="password" class="input-large" type="password" maxlength="30">
      					</label>
      
      						<input type="submit" class="btn primary" value="Sign in" />
      					<input type="hidden" name="next" value="/" />
      
      				</form>
      			</div>
      		</div>
      
      	</div>
      </body>
      </html>
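
The manual curl check above can be scripted. The following is a minimal sketch, assuming the default address shown in the transcript; the function name `hue_http_ok` and the 5-second timeout are illustrative and not part of any Hue or Bigtop script.

```shell
#!/bin/sh
# hue_http_ok [URL]: succeed only if the login page answers with HTTP 200.
# The function name and the 5-second timeout are illustrative assumptions.
hue_http_ok() {
    url=${1:-http://localhost:8888/accounts/login/}
    # -s: quiet, -o /dev/null: discard the body, -w: print only the status code
    code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$url") || return 1
    [ "$code" = "200" ]
}
```

Calling `hue_http_ok` with no argument probes the login URL from the report; a non-zero return means the page did not answer with 200, which is independent of what the init script's status action claims.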
      

      Roman, as we discussed before, this is an issue you are already aware of, but I couldn't find a specific JIRA for it, so I am opening this one.

          Activity

          Roman Shaposhnik made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Gavin made changes -
          Link This issue is related to BIGTOP-826 [ BIGTOP-826 ]
          Mark Grover added a comment -

          Created BIGTOP-829

          Mark Grover added a comment -

          It looks like the patch that was committed (patch2) only dealt with the "stop" functionality of the init.d script. The status functionality was missed: it still queries based on the DAEMON variable (which points to supervisor), when it should instead point to python. I will create another JIRA to fix the status.
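
One way to make a status check independent of the executable name is to rely on the PID file alone. The sketch below is only an illustration of that idea and is not the fix tracked in the follow-up JIRA; the function name `hue_status` and the PID file argument are assumptions. The return values follow the LSB convention for the status action (0 running, 1 dead but PID file exists, 3 not running).

```shell
#!/bin/sh
# hue_status PIDFILE: report status from the PID file only, ignoring the
# executable name. Hypothetical helper, not the committed init script.
hue_status() {
    pidfile=$1
    if [ ! -r "$pidfile" ]; then
        echo "Hue web server is not running"
        return 3                      # LSB: program is not running
    fi
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only tests whether the PID can be signalled
    # (init scripts normally run as root, so permissions are not an issue).
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "Hue web server is running"
        return 0                      # LSB: program is running
    fi
    echo "Hue web server is dead but PID file exists"
    return 1                          # LSB: dead, PID file left behind
}
```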

          Mark Grover made changes -
          Link This issue is related to BIGTOP-826 [ BIGTOP-826 ]
          Mark Grover added a comment -

          BTW, Johnny pointed out that the return code when stopping hue on RHEL5/RHEL6/SUSE is wrong (1 instead of 0). Created BIGTOP-826 for it.

          Mark Grover added a comment -

          Yes, indeed Johnny. We issue a terminate command and wait for a set timeout. If the process is still around, then we get all evil and kill -KILL (aka kill -9) the process.

          Roman Shaposhnik made changes -
          Status In Progress [ 3 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Roman Shaposhnik made changes -
          Status Open [ 1 ] In Progress [ 3 ]
          Johnny Zhang added a comment -

          Mark Grover, thanks for the reply. Is "kill -9" a way to kill it by force if the shutdown doesn't really work?

          Mark Grover added a comment -

          That's correct. Looking at the patch, it issues a kill command and then checks the status every second for the next 15 seconds. If the PID is still around after 15 seconds, it issues a kill -9.
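
The stop sequence described in this comment can be sketched as follows. This is a minimal illustration of the terminate, poll, then force-kill pattern, not the text of the committed patch; `HUE_SHUTDOWN_TIMEOUT` is the variable quoted elsewhere in this thread, while the function name `stop_with_timeout` and the PID file argument are assumptions.

```shell
#!/bin/sh
# Seconds to wait for a clean shutdown before forcing it.
HUE_SHUTDOWN_TIMEOUT=${HUE_SHUTDOWN_TIMEOUT:-15}

# stop_with_timeout PIDFILE: send TERM, poll once per second, then send KILL.
# Hypothetical helper; the real init script may differ in detail.
stop_with_timeout() {
    pidfile=$1
    [ -r "$pidfile" ] || return 0            # no PID file: already stopped
    pid=$(cat "$pidfile")
    [ -n "$pid" ] || return 0
    # If TERM cannot be delivered the process is gone; clean up the stale file.
    kill -TERM "$pid" 2>/dev/null || { rm -f "$pidfile"; return 0; }
    i=0
    while [ "$i" -lt "$HUE_SHUTDOWN_TIMEOUT" ]; do
        kill -0 "$pid" 2>/dev/null || break  # process has exited
        sleep 1
        i=$((i + 1))
    done
    # Still alive after the timeout: force it.
    if kill -0 "$pid" 2>/dev/null; then
        kill -KILL "$pid" 2>/dev/null
    fi
    rm -f "$pidfile"
    return 0
}
```

In this sketch the function returns 0 whenever the process ends up stopped, whether it exited on TERM or had to be killed.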

          Johnny Zhang added a comment -

          "HUE_SHUTDOWN_TIMEOUT=15": does this mean the shutdown process could take up to 15 seconds?

          Peter Linnell added a comment -

          +1 for the second patch

          Mark Grover added a comment -

          +1 (non-committer) to the second patch. I would rather not trust python behavior (and its forward compatibility) to kill hue.

          Sean Mackrory added a comment -

          +1 to the second patch - as concise as the first patch is, it's pretty reliant on several other aspects of the environment to be just right, as opposed to just relying on the PID file to be correct.

          Roman Shaposhnik made changes -
          Roman Shaposhnik added a comment -

          Attaching a second version of the patch that removes the dependency on killproc altogether.

          Romain Rigaux added a comment -

          argv[0] is apparently system dependent: http://docs.python.org/2/library/sys.html#sys.argv

          Roman Shaposhnik made changes -
          Priority Major [ 3 ] Blocker [ 1 ]
          Roman Shaposhnik made changes -
          Assignee Roman Shaposhnik [ rvs ]
          Roman Shaposhnik made changes -
          Field Original Value New Value
          Attachment 0001-BIGTOP-782.-service-hue-status-still-show-failed-aft.patch [ 12554128 ]
          Roman Shaposhnik added a comment -

          This seems to be related to pidofproc not being able to do its job, since the actual name of the executable (argv[0]) happens to be python, not supervisor. Attaching a patch that seems to fix the situation. If there are any python gurus out there who can help make python claim that argv[0] is the script name, I'd really appreciate the know-how.
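
The mismatch described here can be illustrated with a small Linux-only check. When a script is run through an interpreter, the kernel records the interpreter as the process's executable, so a lookup keyed on the script's path finds nothing. Whether pidofproc on SLES compares in exactly this way is an assumption; the helper name `exe_matches` is hypothetical and only demonstrates the comparison.

```shell
#!/bin/sh
# exe_matches PID PATH: succeed if the process's real executable is PATH.
# Linux-only: relies on the /proc/PID/exe symlink. Hypothetical helper.
exe_matches() {
    actual=$(readlink "/proc/$1/exe" 2>/dev/null) || return 1
    [ "$actual" = "$2" ]
}
```

For a process started through the python interpreter, a check against the supervisor script's path would fail under this comparison, while a check against the interpreter's own path would succeed.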

          Peter Linnell added a comment -

          As I commented in #783, I'm not really certain /sbin/service is the way to test this. If /etc/init.d/servicename does not work, then it is an initscript bug, something else, or a bug in the LSB functions. However, in my experience the SLES LSB implementation is pretty solid and even includes some tools that check scripts for LSB conformance when they are run.

          Johnny Zhang created issue -

            People

            • Assignee:
              Roman Shaposhnik
              Reporter:
              Johnny Zhang
            • Votes:
              0
              Watchers:
              6
