HIVE-9665: Parallel move task optimization causes race condition

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2.0
    • Component/s: None
    • Labels:
      None

      Description

      The change in HIVE-8042 does not work as intended. Running it at scale produces race conditions that lead to broken Thrift messages and OOMs. E.g.:

      java.lang.OutOfMemoryError: Java heap space
      	at org.apache.thrift.protocol.TBinaryProtocol.readStringBody(TBinaryProtocol.java:353)
      	at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:215)
      	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
      	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table(ThriftHiveMetastore.java:1122)
      	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_table(ThriftHiveMetastore.java:1108)
      	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:1091)
      	at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.getTable(SessionHiveMetaStoreClient.java:131)
      	at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:90)
      	at com.sun.proxy.$Proxy9.getTable(Unknown Source)
      	at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1064)
      	at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1019)
      	at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1006)
      	at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:250)
      	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
      	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
      	at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:72)
      java.lang.OutOfMemoryError: Java heap space
      	at org.apache.thrift.protocol.TBinaryProtocol.readStringBody(TBinaryProtocol.java:353)
      	at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:215)
      	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
      	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table(ThriftHiveMetastore.java:1122)
      	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_table(ThriftHiveMetastore.java:1108)
      	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:1091)
      	at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.getTable(SessionHiveMetaStoreClient.java:131)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:90)
      	at com.sun.proxy.$Proxy9.getTable(Unknown Source)
      	at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1064)
      	at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1019)
      	at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1006)
      	at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:250)
      	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
      	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
      	at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:72)
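
      The traces above fail while reading the start of a response message. One plausible mechanism, sketched below under stated assumptions, is that two threads sharing one connection leave a reader positioned in the middle of someone else's message, so payload bytes get interpreted as a length and a huge buffer is requested. The framing here (a 4-byte length followed by a string) is a simplification chosen for illustration, not Thrift's actual wire format, and the class and method names are hypothetical.

      ```java
      import java.nio.ByteBuffer;
      import java.nio.charset.StandardCharsets;

      public class MisalignedFrameSketch {

          // Encode a frame as a 4-byte big-endian length followed by the UTF-8 payload.
          // (Simplified framing for illustration only.)
          static byte[] encode(String payload) {
              byte[] body = payload.getBytes(StandardCharsets.UTF_8);
              return ByteBuffer.allocate(4 + body.length).putInt(body.length).put(body).array();
          }

          // Interpret the 4 bytes starting at 'offset' as a big-endian length prefix.
          static int readLength(byte[] stream, int offset) {
              return ByteBuffer.wrap(stream, offset, 4).getInt();
          }

          public static void main(String[] args) {
              byte[] frame = encode("get_table");

              // A reader that starts at the frame boundary sees the real length.
              System.out.println("aligned length:    " + readLength(frame, 0));

              // A reader that starts 4 bytes late (because another thread already
              // consumed the prefix) treats the bytes "get_" as a length.
              System.out.println("misaligned length: " + readLength(frame, 4));
          }
      }
      ```

      In this sketch the misaligned read yields a value above one billion, so a reader that allocates a buffer of that size without a sanity check would likely exhaust the heap.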
      
      1. HIVE-9665.1.patch
        1 kB
        Gunther Hagleitner

        Issue Links

          Activity

          thejas Thejas M Nair added a comment -

          +1

          There are two issues that we need to fix before MoveTask can be used in parallel.

          1. A thread-local Hive object should be used. Adding "db = Hive.get(conf);" in MoveTask.execute will fix that.
          2. SessionState is examined by the ACID code in MoveTask. If MoveTask runs in a different thread, that thread will not have the SessionState object available.
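
          The two points above can be sketched without any Hive classes. This is a minimal illustration, not the patch itself: FakeClient and FakeSession are hypothetical stand-ins for a non-thread-safe client object and for per-session state, and the ThreadLocal fields only mimic the general idea of per-thread lookup.

          ```java
          import java.util.Set;
          import java.util.concurrent.ConcurrentHashMap;
          import java.util.concurrent.CountDownLatch;
          import java.util.concurrent.ExecutorService;
          import java.util.concurrent.Executors;
          import java.util.concurrent.TimeUnit;

          public class ThreadLocalClientSketch {

              // Hypothetical stand-in for a client that must not be shared across threads.
              static final class FakeClient {}

              // Hypothetical stand-in for per-session state.
              static final class FakeSession {
                  final String name;
                  FakeSession(String name) { this.name = name; }
              }

              // Each thread lazily gets its own client instance.
              private static final ThreadLocal<FakeClient> CLIENT = ThreadLocal.withInitial(FakeClient::new);

              // Session state is per thread and is NOT inherited by pool threads.
              private static final ThreadLocal<FakeSession> SESSION = new ThreadLocal<>();

              static FakeClient getClient() { return CLIENT.get(); }

              // Runs 'workers' tasks on 'workers' threads and counts the distinct clients they saw.
              static int distinctClientsSeen(int workers) throws InterruptedException {
                  Set<FakeClient> seen = ConcurrentHashMap.newKeySet();
                  ExecutorService pool = Executors.newFixedThreadPool(workers);
                  CountDownLatch allStarted = new CountDownLatch(workers);
                  for (int i = 0; i < workers; i++) {
                      pool.submit(() -> {
                          seen.add(getClient());      // look the client up inside the worker
                          allStarted.countDown();
                          try {
                              allStarted.await();     // keep every pool thread busy at once
                          } catch (InterruptedException e) {
                              Thread.currentThread().interrupt();
                          }
                      });
                  }
                  pool.shutdown();
                  pool.awaitTermination(10, TimeUnit.SECONDS);
                  return seen.size();
              }

              // Returns the session name visible in a worker thread, or null if none is set there.
              static String sessionSeenByWorker(boolean handOff) throws Exception {
                  SESSION.set(new FakeSession("query-1"));
                  FakeSession parent = SESSION.get();
                  ExecutorService pool = Executors.newSingleThreadExecutor();
                  try {
                      return pool.submit(() -> {
                          if (handOff) {
                              SESSION.set(parent);    // explicit hand-off from the submitting thread
                          }
                          FakeSession s = SESSION.get();
                          return s == null ? null : s.name;
                      }).get();
                  } finally {
                      pool.shutdown();
                  }
              }

              public static void main(String[] args) throws Exception {
                  System.out.println("distinct clients: " + distinctClientsSeen(4));
                  System.out.println("session without hand-off: " + sessionSeenByWorker(false));
                  System.out.println("session with hand-off: " + sessionSeenByWorker(true));
              }
          }
          ```

          Looking the client up inside the worker gives each thread its own instance, while the session is only visible in the worker when the submitting thread passes it along explicitly.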
          hiveqa Hive QA added a comment -

          Overall: +1 all checks pass

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12698333/HIVE-9665.1.patch

          SUCCESS: +1 7542 tests passed

          Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2783/testReport
          Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2783/console
          Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2783/

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          

          This message is automatically generated.

          ATTACHMENT ID: 12698333 - PreCommit-HIVE-TRUNK-Build

          hagleitn Gunther Hagleitner added a comment -

          Committed to trunk. Thanks Thejas M Nair.

          sushanth Sushanth Sowmyan added a comment -

          This issue has been fixed and released as part of the 1.2.0 release. If you find an issue that seems to be related to this one, please create a new JIRA and link it to this one.


            People

            • Assignee:
              hagleitn Gunther Hagleitner
            • Reporter:
              hagleitn Gunther Hagleitner
            • Votes:
              0
            • Watchers:
              5
