HIVE-2127

Improve stats gathering reliability by retries on failures with hive.stats.retries.max and hive.stats.retries.wait

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Component/s: Query Processor, Statistics
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      Stats publishing and aggregation each try only once; if any exception occurs, they fail and return. When many mappers/reducers update stats at the same time, lock timeouts are very common. We should make stats gathering more reliable by retrying when there is an SQLException.
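
      The retry behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code from the attached patches: the class StatsRetry and the method withRetries are hypothetical names, and the parameters maxRetries and waitMillis are only assumed to correspond to hive.stats.retries.max and hive.stats.retries.wait (the millisecond unit is also an assumption). With maxRetries = 0 the operation runs once and is never retried.

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

/** Hypothetical helper; not part of Hive. Illustrates retry-on-SQLException only. */
public class StatsRetry {

    /**
     * Runs op once, then retries up to maxRetries more times if it throws SQLException.
     * maxRetries is assumed to mirror hive.stats.retries.max and waitMillis
     * hive.stats.retries.wait; both mappings are assumptions for illustration.
     */
    public static <T> T withRetries(Callable<T> op, int maxRetries, long waitMillis)
            throws Exception {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        SQLException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return op.call();
            } catch (SQLException e) {
                // Only SQLException triggers a retry; other exceptions propagate.
                last = e;
                if (attempt < maxRetries) {
                    // Fixed wait between attempts (an assumption; the patch may differ).
                    Thread.sleep(waitMillis);
                }
            }
        }
        // All attempts failed: surface the last SQLException to the caller.
        throw last;
    }
}
```

      A caller would wrap the publishing or aggregation step in a Callable and pass it to withRetries. Only SQLException is retried in this sketch; any other exception propagates immediately.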

      1. HIVE-2127.2.patch
        30 kB
        Ning Zhang
      2. HIVE-2127.patch
        19 kB
        Ning Zhang

        Activity

        Ning Zhang added a comment -

        Review board: https://reviews.apache.org/r/664/

        Ning Zhang added a comment -

        Paul commented offline that it cannot handle Connection exceptions. I'm working on a new patch and will update it soon.

        Ning Zhang added a comment -

        Updated the review board.

        Namit Jain added a comment -

        Comments in review-board

        Namit Jain added a comment -

        Also add the new configuration variables to the title of the JIRA.

        Ning Zhang added a comment -

        @Namit, what does the new configuration variable do? Do you mean defining a variable to disable retry? If so, setting hive.stats.retries.max = 0 will do.

        Namit Jain added a comment -

        What I meant was:

        Change the subject of the jira:
        Improve stats gathering reliability by retries on failures

        for better searching

        Namit Jain added a comment -

        Looks good otherwise

        Ning Zhang added a comment -

        Changed the JIRA subject

        Namit Jain added a comment -

        Committed. Thanks Ning


          People

          • Assignee: Ning Zhang
          • Reporter: Ning Zhang
          • Votes: 0
          • Watchers: 2
