Hive / HIVE-24294

TezSessionPool sessions can throw AssertionError

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0.0
    • Component/s: None

      Description

      Whenever default TezSessionPool sessions are reopened for some reason, we set dagResources to null before close and set it back in open:
      https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java#L498-L503
      If sessionState.close() throws an exception, we do not restore dagResources, but we still move the session back to TezSessionPool. For example, here is the exception trace from a case where sessionState.close() failed:

      2020-10-15T09:20:28,749 INFO  [HiveServer2-Background-Pool: Thread-25451]: client.TezClient (:()) - Failed to shutdown Tez Session via proxy
      org.apache.tez.dag.api.SessionNotRunning: Application not running, applicationId=application_1602093123456_12345, yarnApplicationState=FINISHED, finalApplicationStatus=SUCCEEDED, trackingUrl=http://localhost:8088/proxy/application_1602093123456_12345/, diagnostics=Session timed out, lastDAGCompletionTime=1602997683786 ms, sessionTimeoutInterval=600000 ms
      Session stats:submittedDAGs=2, successfulDAGs=2, failedDAGs=0, killedDAGs=0
              at org.apache.tez.client.TezClientUtils.getAMProxy(TezClientUtils.java:910) 
              at org.apache.tez.client.TezClient.getAMProxy(TezClient.java:1060) 
              at org.apache.tez.client.TezClient.stop(TezClient.java:743) 
              at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.closeClient(TezSessionState.java:789) 
              at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.close(TezSessionState.java:756) 
              at org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolSession.close(TezSessionPoolSession.java:111) 
              at org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.reopenInternal(TezSessionPoolManager.java:496) 
              at org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.reopen(TezSessionPoolManager.java:487) 
              at org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolSession.reopen(TezSessionPoolSession.java:228) 
              at org.apache.hadoop.hive.ql.exec.tez.TezTask.getNewTezSessionOnError(TezTask.java:531) 
              at org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:546) 
              at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:221)

      Because of this, all new queries that use one of these corrupted sessions fail with the exception below:

      Caused by: java.lang.AssertionError: Ensure called on an unitialized (or closed) session 41774265-b7da-4d58-84a8-1bedfd597aec
              at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.ensureLocalResources(TezSessionState.java:685)
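
      The failure mode described above can be illustrated with a small, self-contained sketch. It is not the Hive code and not necessarily the committed fix: ReopenSketch, Session, reopenUnsafe and reopenGuarded are hypothetical names, and the way a failed close() is swallowed before the session returns to the pool is an assumption made for illustration. The sketch only shows why restoring the saved resources in a finally block keeps a pooled session usable even when close() throws.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified stand-ins; these are NOT the real Hive classes.
public class ReopenSketch {

    /** Minimal session model: resources plus a close() that may fail. */
    public static class Session {
        Object resources = new Object();   // plays the role of dagResources
        final boolean failOnClose;

        public Session(boolean failOnClose) { this.failOnClose = failOnClose; }

        void close() {
            if (failOnClose) {
                throw new IllegalStateException("Application not running");
            }
        }

        void open(Object restored) { resources = restored; }

        // Mirrors the check that fails in the report when resources are missing.
        public void ensureLocalResources() {
            if (resources == null) {
                throw new AssertionError("Ensure called on an unitialized (or closed) session");
            }
        }
    }

    // Stand-in for the session pool.
    public static final Deque<Session> POOL = new ArrayDeque<>();

    /** Failure mode: when close() throws, open() is skipped and resources stay null. */
    public static void reopenUnsafe(Session s) {
        Object saved = s.resources;
        s.resources = null;
        try {
            s.close();
            s.open(saved);             // never reached if close() throws
        } catch (RuntimeException e) {
            // Assumption: the failure is only logged here.
        }
        POOL.add(s);                   // session is pooled without its resources
    }

    /** One possible guard: always restore the saved resources in finally. */
    public static void reopenGuarded(Session s) {
        Object saved = s.resources;
        s.resources = null;
        try {
            s.close();
        } catch (RuntimeException e) {
            // Assumption: the failure is only logged here.
        } finally {
            s.open(saved);             // runs whether or not close() threw
        }
        POOL.add(s);
    }

    public static void main(String[] args) {
        Session bad = new Session(true);
        reopenUnsafe(bad);
        try {
            bad.ensureLocalResources();
        } catch (AssertionError e) {
            System.out.println("unsafe reopen: " + e.getMessage());
        }

        Session good = new Session(true);
        reopenGuarded(good);
        good.ensureLocalResources();   // does not throw
        System.out.println("guarded reopen: session still usable");
    }
}
```

      Whether a session whose close() failed should be returned to the pool at all is a separate design question; the sketch only addresses restoring the saved state.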

              People

              • Assignee: Naresh P R (nareshpr)
              • Reporter: Naresh P R (nareshpr)
              • Votes: 0
              • Watchers: 1

                  Time Tracking

                  • Original Estimate: Not Specified
                  • Remaining Estimate: 0h
                  • Time Spent: 40m