HCatalog: HCATALOG-389

hcat_ping (script to check if HCatalog server is running/reachable)

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.4.1
    • Fix Version/s: 0.5, 0.4.1
    • Component/s: client
    • Labels:
      None

      Description

      It would be nice to have a script that checks if hcat_server is running, whether it's reachable and if it's "healthy". The current definition of "healthy" implies:
      1. HCatalog responds to requests to list tables/databases.
      2. Tables can be created/dropped in HCatalog.
      3. When managed tables are used, table-directories are created/deleted on the DFS.
      4. (Future) ActiveMQ events are posted correctly when tables/partitions are created/dropped.

      Monitoring apps might use this script to detect when HCatalog is down.

      I'll post a trivial script that does 1-3, shortly.
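For illustration only (this is not the attached patch): a minimal Python sketch of how checks 1-3 could be sequenced. It assumes the `hcat -e` and `hadoop fs -test -d` command lines are available; the function names, the scratch database name, the `base_dir` location, and the exact DDL statements are all hypothetical choices made for this sketch.

```python
import subprocess
import uuid


def run_cli(args):
    """Run a command line and report whether it exited with status 0."""
    return subprocess.call(args) == 0


def check_hcat(run=run_cli, hcat="hcat", hadoop="hadoop", base_dir="/tmp"):
    """Return the name of the first failing step, or None if all steps pass."""
    db = "hcat_check_" + uuid.uuid4().hex[:8]  # hypothetical scratch database name
    db_dir = "%s/%s" % (base_dir, db)          # assumed DFS location for the database
    table_dir = db_dir + "/t"
    # (step name, command, whether the command is expected to succeed)
    steps = [
        ("list databases", [hcat, "-e", "SHOW DATABASES"], True),
        ("create database",
         [hcat, "-e", "CREATE DATABASE %s LOCATION '%s'" % (db, db_dir)], True),
        ("create table", [hcat, "-e", "CREATE TABLE %s.t (id INT)" % db], True),
        ("table directory created", [hadoop, "fs", "-test", "-d", table_dir], True),
        ("drop table", [hcat, "-e", "DROP TABLE %s.t" % db], True),
        ("table directory deleted", [hadoop, "fs", "-test", "-d", table_dir], False),
        ("drop database", [hcat, "-e", "DROP DATABASE %s" % db], True),
    ]
    for name, args, expect_ok in steps:
        if run(args) != expect_ok:
            return name
    return None
```

The `run` parameter is injectable so the sequence can be exercised without a live server; the sketch does not clean up after a failed step.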

      1. HCATALOG-389.patch
        5 kB
        Mithun Radhakrishnan

        Activity

        Vandana Ayyalasomayajula added a comment -

        Committed the patch to the 0.4 branch.

        Sushanth Sowmyan added a comment -

        +1, committed. Thanks!

        Mithun Radhakrishnan added a comment -

        I've changed the shebang (#!) to use /usr/bin/env, and changed the variable names.

        I've also suppressed the logging from the cleanup-function, since the messages could be confusing.

        Mithun Radhakrishnan added a comment -

        Incorporating Sush's suggestions.

        Sushanth Sowmyan added a comment -

        Also, thinking aloud: now that the tool is called hcat_check instead of hcat_ping, could we rename the variables HCAT_PING_DATABASE, etc., to HCAT_CHECK_DATABASE, etc.?

        Sushanth Sowmyan added a comment -

        After some checking, the rest looks okay. I'm +1 on this with that one env path change. If you update the patch, I can commit. (Or should I just go ahead with that on my end?)

        Sushanth Sowmyan added a comment -

        Testing, but a quick and kind of funny note (from a "Quis custodiet ipsos custodes?" standpoint): it seems to be /usr/bin/env on a couple of systems I checked, not /bin/env.

        Mithun Radhakrishnan added a comment -

        Hello, chaps. Does this look ready/useful enough to go in? (Sushanth/Alan?)

        Mithun Radhakrishnan added a comment -

        Improved version, including comments from Travis, Ashutosh.

        Mithun Radhakrishnan added a comment -

        I've an improved version.

        Mithun Radhakrishnan added a comment -

        The second drop database is done as part of cleanup. It's for cases where there is an unforeseen exception. The return code from this operation can be safely ignored.

        Your second point is something that occurred to us yesterday as well. We might want to support using the hcat or hive cli to run the same. I'll introduce a variable and defaults.

        I'm working on Travis's comments as well.
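For illustration only: a minimal Python sketch of the cleanup behaviour described in this comment, where a final cleanup step always runs and its outcome is deliberately ignored. The function name and the callable-based interface are hypothetical, not taken from the patch.

```python
def run_checks_with_cleanup(checks, cleanup):
    """Run every check; always attempt cleanup and ignore how it turns out."""
    try:
        ok = all(check() for check in checks)
    except Exception:
        # An unforeseen exception in a check counts as a failed run.
        ok = False
    finally:
        try:
            cleanup()  # e.g. a second "drop database" after an aborted run
        except Exception:
            pass       # the result of cleanup does not affect the verdict
    return ok
```

Because the verdict is computed before cleanup runs, a failing cleanup (such as dropping a database that is already gone) cannot turn a healthy result into an error.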

        Ashutosh Chauhan added a comment -

        I did a bit more testing and have a couple of comments:

        • It looks like you are trying to drop the database twice. As a result, the second drop fails, but by then the script has already reported that the server is running fine. This looks like a bug.
        • It's assumed that hcat and hadoop are in the path. It would be good to support HCAT_COMMAND and HADOOP_COMMAND environment variables for them and, if they are not defined, to assume the commands are in the path. This could be done in a follow-up JIRA, though.
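For illustration only: a minimal Python sketch of the suggested fallback, where an environment variable such as HCAT_COMMAND or HADOOP_COMMAND overrides the command and the path is searched otherwise. The helper name `resolve_command` and its `environ` parameter are hypothetical.

```python
import os
import shutil


def resolve_command(env_var, default_name, environ=None):
    """Prefer the environment override; otherwise look the command up on PATH."""
    environ = os.environ if environ is None else environ
    override = environ.get(env_var)
    if override:
        return override
    # shutil.which returns None when the command is not on PATH; in that case
    # fall back to the bare name so the eventual failure names the command.
    return shutil.which(default_name) or default_name
```

A caller might use `resolve_command("HCAT_COMMAND", "hcat")` and `resolve_command("HADOOP_COMMAND", "hadoop")` once at startup.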
        Ashutosh Chauhan added a comment -

        This should live in scripts/hcat_check instead of bin/hcat_check. Other than that, it looks good to me. I agree with Travis's comments as well.

        Travis Crawford added a comment -

        In general this looks good! A few comments/suggestions:

        OPTION PARSING:

        Depending on the target Python version, perhaps use a more powerful option parsing library? Then you don't need to keep the help text in sync by hand, as "print_usage" requires, and I find these libraries easier to work with and easier to read.

        http://docs.python.org/library/optparse.html#module-optparse
        http://docs.python.org/library/argparse.html#module-argparse
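For illustration only: a minimal argparse sketch of the option parsing suggested here. The flag names (`--verbose`, `--log-file`) are hypothetical; they were chosen so that the parsed result exposes `verbose` and `log_file` attributes like the `options` object used in the logging snippet in this comment.

```python
import argparse


def build_parser():
    """Build the parser; argparse generates the usage/help text from these calls."""
    parser = argparse.ArgumentParser(
        prog="hcat_check",
        description="Check whether the HCatalog server is running and healthy.")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="log at DEBUG level instead of INFO")
    parser.add_argument("--log-file", metavar="PATH",
                        help="write log records to PATH instead of the console")
    return parser
```

Calling `build_parser().parse_args()` in the script's entry point would replace a hand-written usage function.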

        LOGGING:

        Perhaps use a file logger, and have a flag to enable stdout logging too? I find this very useful because if someone puts this in cron you'd want a record of past invocations to see when it started failing. When run with Nagios it could log to stdout. It might look something like:

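          import logging
          import logging.handlers  # RotatingFileHandler lives in this submodule
          import sys

          # "options" is the parsed command-line options object (optparse/argparse).
          logger = logging.getLogger()
          if options.verbose:
            logger.setLevel(logging.DEBUG)
          else:
            logger.setLevel(logging.INFO)
          formatter = logging.Formatter("%(asctime)s %(filename)s:%(lineno)d - %(message)s")
          if options.log_file:
            # Keep a rotating record of past invocations (useful under cron).
            file_handler = logging.handlers.RotatingFileHandler(options.log_file,
              maxBytes=10*1024*1024, backupCount=3)
            file_handler.setFormatter(formatter)
            logger.addHandler(file_handler)
          else:
            # StreamHandler defaults to stderr, so pass stdout explicitly.
            stream_handler = logging.StreamHandler(sys.stdout)
            stream_handler.setFormatter(formatter)
            logger.addHandler(stream_handler)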
        

        Then "verbose_log" is not needed. Just use logger.debug or logger.info.

        ERROR MESSAGES:

        Currently the check is a big list of booleans and if any fail it says there's an error. Perhaps have a list of checks and keep track of which ones failed, then print that out at the end? That would provide more actionable info.
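For illustration only: a minimal Python sketch of the "list of checks" idea, recording which named checks failed and summarizing them at the end. The function names and the message wording are hypothetical.

```python
def run_checks(checks):
    """Run (name, callable) pairs and return the names of the checks that failed."""
    failed = []
    for name, check in checks:
        try:
            ok = check()
        except Exception as exc:
            # Record the exception text so the report says why the check failed.
            failed.append("%s (%s)" % (name, exc))
            continue
        if not ok:
            failed.append(name)
    return failed


def summarize(failed):
    """Turn the list of failed check names into a one-line report."""
    if not failed:
        return "OK: all checks passed"
    return "CRITICAL: failed checks: " + ", ".join(failed)
```

Unlike a single combined boolean, this keeps running after the first failure, so the report can name every check that failed.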

        Mithun Radhakrishnan added a comment -

        David (Capwell) suggested that we throw in checks for connectivity to HBase, ActiveMQ (or equivalent), etc. This makes sense to add, albeit later on.

        Mithun Radhakrishnan added a comment -

        Updated as per Nagios plugin-guidelines (Thanks, Travis). Renamed to hcat_check (Thanks, Sush).

        Mithun Radhakrishnan added a comment -

        @Travis: Thanks for the nagios plugin-guideline-link. I'll take this into account.

        I was trying to keep hcat_ping as thin as possible, with no output save the error code. It looks like there might be value in writing out a message for alerts.

        Travis Crawford added a comment -

        Nagios is an alerting tool many sites use that defines a pretty simple plugin system for external checks.

        Summary:

        • Return code indicates if the service is ok, warning, critical or unknown.
        • Verbosity levels since the output may end up as a text message, email, etc.

        Most of this stuff is pretty straightforward, and not everything needs to be implemented. If this sounds interesting, more details are available at http://nagiosplug.sourceforge.net/developer-guidelines.html
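For illustration only: a minimal Python sketch of the Nagios plugin convention summarized above, using the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function name `nagios_result`, the "HCAT" message prefix, and the `errored` flag are hypothetical.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def nagios_result(failed, errored=False):
    """Map check results to a (exit code, one-line message) pair."""
    if errored:
        status = UNKNOWN     # the check itself could not be carried out
    elif failed:
        status = CRITICAL    # at least one health check failed
    else:
        status = OK
    message = "HCAT " + STATUS_NAMES[status]
    if failed:
        message += " - failed: " + ", ".join(failed)
    return status, message
```

A script would typically print the message on a single line and pass the status to `sys.exit`, so that the alerting tool reads both.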

        Mithun Radhakrishnan added a comment -

        Sushanth suggests that this be called hcat_check instead (since "hcat_ping" implies a const-operation, while this does create/drop objects in the metastore).

        Mithun Radhakrishnan added a comment -

        Corrected typo. Retested.

        Mithun Radhakrishnan added a comment -

        And now, as a patch.

        Mithun Radhakrishnan added a comment -

        The previous version depended on there being a default-db. This one doesn't.

        Mithun Radhakrishnan added a comment -

        In python.


          People

          • Assignee:
            Mithun Radhakrishnan
            Reporter:
            Mithun Radhakrishnan
          • Votes:
            0
            Watchers:
            4
