Hive / HIVE-5924

Save operation logs in per operation directories in HiveServer2

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.13.0
    • Fix Version/s: None
    • Component/s: HiveServer2
    • Labels:
      None

        Activity

        Jaideep Dhok added a comment -

        I've created a reviewboard request - https://reviews.apache.org/r/16285/

        Jaideep Dhok added a comment -

        Vaibhav Gumashta, Prasad Mujumdar: Please have a look at the patch.
        How do I submit a review request? It seems that the Phabricator documentation on the wiki is a bit outdated.

        Jaideep Dhok added a comment -

        First version of the patch

        Prasad Mujumdar added a comment -

        Jaideep Dhok Thanks for bringing this up. This is certainly useful functionality.
        The proposed directory layout (1-3) sounds reasonable.

        You might want to take a look at HIVE-4629. There's a patch attached that keeps a copy of the current session's log in an in-memory buffer and allows the client to retrieve it. Most of that implementation would be useful for this work.

        Combined, these two patches will enable clients to retrieve query-specific logs, which would be highly useful for troubleshooting.

        Jaideep Dhok added a comment -

        "we can close the session"

        We will not actually close the session, just delete the log files.

        Jaideep Dhok added a comment -

        Vaibhav Gumashta Thanks for looking at the issue.

        "1. Would enabling the per session/operation log config mean that there will be no consolidated log?"

        HiveServer2 logs such as session open and session close will continue to be consolidated. Only the query logs, such as job client, driver, and task logs, will be redirected. Turning off log redirection would again consolidate everything into a single log file, as is done currently.

        "I'd be curious to hear what your method of detecting abandoned sessions is."

        For detecting abandoned sessions w.r.t. log purging, I can check the last modified time of an operation log file. If that is older than a configured value, we can close the session.
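A minimal sketch of the last-modified check described above, following the clarification elsewhere in this thread that only the log files are deleted and the session itself is left alone. The class `StaleLogPurger`, its method names, and the `maxAge` parameter are hypothetical and are not taken from the patch; the sketch uses only standard `java.nio.file` calls.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration; not part of Hive.
class StaleLogPurger {

    // A log file counts as stale when its last modification is older than maxAge.
    static boolean isStale(Path file, Duration maxAge, Instant now) throws IOException {
        Instant modified = Files.getLastModifiedTime(file).toInstant();
        return modified.plus(maxAge).isBefore(now);
    }

    // Deletes stale regular files under logRoot and returns how many were removed.
    // Directories are left in place; only log files are deleted.
    static int purge(Path logRoot, Duration maxAge, Instant now) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(logRoot)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        int deleted = 0;
        for (Path file : files) {
            if (isStale(file, maxAge, now)) {
                Files.delete(file);
                deleted++;
            }
        }
        return deleted;
    }
}
```

A periodic task could call `StaleLogPurger.purge(logRoot, Duration.ofHours(24), Instant.now())`; the 24-hour value is only an example, since the thread does not state a default.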

        Vaibhav Gumashta added a comment -

        Thanks Jaideep Dhok. A couple of questions:
        1. Would enabling the per session/operation log config mean that there will be no consolidated log?
        2. Regarding 6.), there is an open JIRA, HIVE-5268, which has some overlap. A different approach, taken in HIVE-5799, is also being discussed. I'd be curious to hear what your method of detecting abandoned sessions is.

        Looking forward to the patch. Thanks!

        Jaideep Dhok added a comment -

        I am ready to put in a patch, but before that I wanted to present the approach so that I could get some feedback.
        The changes are as follows:

        1. A new conf setting for the location of query logs (queryLogDir), and a flag to indicate whether log redirection should be enabled. The flag will be false by default.
        2. For each session there will be a directory under queryLogDir with name = session id. The directory will contain a session.out and a session.err for session-level logs.
        3. Similarly, for each operation in the session there will be a directory with name = operation id under queryLogDir/sessionDir/. Each directory will contain an operationid.err and an operationid.out.
        4. Change LogHelper in SessionState.java so that all streams can be set externally. The getters check whether an instance stream (for out or err) is set and return it; only if the instance stream is not set do they return the System.out and System.err streams.
        5. Pass the LogHelper objects created in the operation to Driver and further down to Tasks, so that the output of Tasks and child processes can be redirected back. Currently this is done only for SQLOperation.
        6. A query log purger executor that periodically checks whether a session has been closed for a sufficient duration and, if so, deletes its log files.
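The getter fallback in item 4 can be sketched as follows. This is an illustrative stand-in, not the actual LogHelper from SessionState.java: the class name `RedirectableStreams` and its method names are assumptions made for the example.

```java
import java.io.PrintStream;

// Hypothetical stand-in for the stream handling described in item 4;
// it does not reproduce Hive's LogHelper API.
class RedirectableStreams {
    // null means "no per-operation stream has been set".
    private PrintStream out;
    private PrintStream err;

    void setOut(PrintStream out) {
        this.out = out;
    }

    void setErr(PrintStream err) {
        this.err = err;
    }

    // Return the instance stream if one was set, otherwise fall back to System.out.
    PrintStream getOut() {
        return out != null ? out : System.out;
    }

    // Return the instance stream if one was set, otherwise fall back to System.err.
    PrintStream getErr() {
        return err != null ? err : System.err;
    }
}
```

With this shape, an operation that wants its own log would call `setOut` and `setErr` with streams opened on its log files before handing the object to the code that does the printing; callers that never set a stream keep the current behavior.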
        Jaideep Dhok added a comment -

        Currently in HiveServer2, all query logs are mixed with the HiveServer2 process's stdout and stderr. It would be useful if logs were stored separately at the per-session and per-operation level. This would help users quickly retrieve their query logs for debugging.

        A directory structure similar to the one used by the Hadoop JobTracker (job_id/task_id/task.out|err) can be used here:

        HS2_LOG_DIR/session_id/operation_id/operation.out|err
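As a rough illustration of the layout above, the paths could be derived as in the sketch below. The class `OperationLogLayout` and its methods are hypothetical names chosen for the example; only the directory structure and the operation.out / operation.err file names come from the description.

```java
import java.nio.file.Path;

// Hypothetical helper for illustration; not part of Hive.
class OperationLogLayout {
    private final Path logRoot; // stands in for HS2_LOG_DIR

    OperationLogLayout(Path logRoot) {
        this.logRoot = logRoot;
    }

    // HS2_LOG_DIR/session_id/operation_id
    Path operationDir(String sessionId, String operationId) {
        return logRoot.resolve(sessionId).resolve(operationId);
    }

    // HS2_LOG_DIR/session_id/operation_id/operation.out
    Path outFile(String sessionId, String operationId) {
        return operationDir(sessionId, operationId).resolve("operation.out");
    }

    // HS2_LOG_DIR/session_id/operation_id/operation.err
    Path errFile(String sessionId, String operationId) {
        return operationDir(sessionId, operationId).resolve("operation.err");
    }
}
```

Keeping each operation under its session directory means that removing a session's logs is a single recursive delete of one directory.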


          People

          • Assignee: Jaideep Dhok
          • Reporter: Jaideep Dhok
          • Votes: 0
          • Watchers: 7
