Hive
  HIVE-1539

Concurrent metastore threading problem

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.7.0
    • Fix Version/s: None
    • Component/s: Metastore
    • Labels: None

      Description

      When running Hive as a service and executing a high number of queries concurrently, I end up with multiple threads running at 100% CPU without making any progress.

      Looking at these threads I notice this thread (484e):
      at org.apache.hadoop.hive.metastore.ObjectStore.getMTable(ObjectStore.java:598)

      But on a different thread (63a2):
      at org.apache.hadoop.hive.metastore.model.MStorageDescriptor.jdoReplaceField(MStorageDescriptor.java)

      Attachments

      1. thread_dump_hanging.txt
        30 kB
        Bennie Schut
      2. ClassLoaderResolver.patch
        26 kB
        Bennie Schut
      3. HIVE-1539.patch
        0.5 kB
        Bennie Schut
      4. HIVE-1539-1.patch
        3 kB
        Bennie Schut

        Issue Links

          Activity

          Bennie Schut added a comment -

          Are we getting errors like these on HIVE-1862?
          [junit] Exception: java.lang.RuntimeException: The table default_show_idx_full_idx_comment_ is an index table. Please do drop index instead.
          [junit] org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: The table default_show_idx_full_idx_comment_ is an index table. Please do drop index instead.

          Or is this something else?

          Bennie Schut added a comment -

          Thanks Ashutosh, I was waiting for the test result and, as you predicted, just got a build failure.

          Ashutosh Chauhan added a comment -

          @Bennie,

          Since this bug was filed, we upgraded from 2.0.3 to 2.1.1 (in HIVE-1609). Some bugs then showed up which were attributed to that jdo lib upgrade (HIVE-1853), so it was rolled back to 2.0.3. Investigation into them is still ongoing in HIVE-1862. So, before committing to an even higher JDO version, 2.2.0, we need to make sure it doesn't result in the test case failures listed in HIVE-1862.

          Bennie Schut added a comment -

          Patch to go from datanucleus 2.0 to 2.2.0-release. I had to make a small change to the Utilities class since some package names changed. I also had to add some extra excludes to metastore/ivy.xml.

          John Sichi added a comment -

          According to the datanucleus website, 2.2.0 is scheduled for 10-Dec, so let's wait for it.

          Bennie Schut added a comment -

          2.2.0-m2 can now be found on Maven, so the previous patch can be discarded and you can simply update the ivy/libraries.properties file to use datanucleus 2.2.0-m2.

          Since -m2 is a milestone release, I'm not sure if we would want to include this or wait for the 2.2.0 release.

          Bennie Schut added a comment -

          The patch got committed on datanucleus 2.2.0-m2.
          We could consider moving from 2.0 to 2.2.
          The temporary fix of loading your own classloader also worked nicely.

          Bennie Schut added a comment -

          OK, still testing it, but as a temporary fix we add our own synchronized version of the classloader:

          Just make sure you add this property and it should work:
          <property>
            <name>datanucleus.classLoaderResolverName</name>
            <value>syncloader</value>
          </property>
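
          The idea behind such a "syncloader" resolver is just to make the class-name-to-Class cache safe for concurrent use. A rough, generic sketch of that idea in plain Java (it does not implement the real org.datanucleus.ClassLoaderResolver interface, and the name SyncClassCache is made up; the full implementation is presumably what the attached ClassLoaderResolver.patch provides):

          import java.util.Map;
          import java.util.concurrent.ConcurrentHashMap;

          // Generic sketch only: a class-lookup cache that is safe to share between
          // threads; it only illustrates the caching/synchronization idea.
          public class SyncClassCache {
              private final Map<String, Class<?>> cache = new ConcurrentHashMap<String, Class<?>>();

              public Class<?> classForName(String name, ClassLoader primary) {
                  Class<?> found = cache.get(name);          // safe concurrent read
                  if (found == null) {
                      found = load(name, primary);
                      Class<?> prior = cache.putIfAbsent(name, found);
                      if (prior != null) {
                          found = prior;                     // another thread loaded it first
                      }
                  }
                  return found;
              }

              private Class<?> load(String name, ClassLoader primary) {
                  ClassLoader loader = (primary != null)
                          ? primary : Thread.currentThread().getContextClassLoader();
                  try {
                      return Class.forName(name, true, loader);
                  } catch (ClassNotFoundException e) {
                      throw new RuntimeException("Could not resolve class " + name, e);
                  }
              }
          }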

          Bennie Schut added a comment -

          JDOClassLoaderResolver doesn't seem thread-safe. That's a bit of a surprise. I filed a bug with datanucleus:
          http://www.datanucleus.org/servlet/jira/browse/NUCCORE-559
          I just made my own thread-safe version of the JDOClassLoaderResolver and am loading it to see if that fixes it. It will probably take a few days to be sure it got fixed.

          Bennie Schut added a comment -

          OK, I just had another few instances of hanging threads. Last time I used a SynchronizedFactory to eliminate any threading issues in the metastore client, but as Paul correctly saw, that doesn't solve it.

          "pool-1-thread-224" prio=10 tid=0x00007f717d62c800 nid=0x7b4d runnable [0x000000004557c000]
          java.lang.Thread.State: RUNNABLE
          at java.util.HashMap.get(HashMap.java:303)
          at org.datanucleus.util.ReferenceValueMap.get(ReferenceValueMap.java:186)
          at org.datanucleus.JDOClassLoaderResolver.classForName(JDOClassLoaderResolver.java:185)
          at org.datanucleus.JDOClassLoaderResolver.classForName(JDOClassLoaderResolver.java:415)
          at org.datanucleus.store.mapped.MappedTypeManager.isSupportedMappedType(MappedTypeManager.java:84)
          at org.datanucleus.store.rdbms.query.legacy.AbstractIteratorStatement.<init>(AbstractIteratorStatement.java:83)
          at org.datanucleus.store.rdbms.query.legacy.UnionIteratorStatement.<init>(UnionIteratorStatement.java:147)
          at org.datanucleus.store.rdbms.query.legacy.ClassTableExtent.newQueryStatement(ClassTableExtent.java:204)
          at org.datanucleus.store.rdbms.query.legacy.QueryCompiler.executionCompile(QueryCompiler.java:323)
          at org.datanucleus.store.rdbms.query.legacy.JDOQLQueryCompiler.compile(JDOQLQueryCompiler.java:225)
          at org.datanucleus.store.rdbms.query.legacy.JDOQLQuery.compileInternal(JDOQLQuery.java:175)
          at org.datanucleus.store.query.Query.executeQuery(Query.java:1628)
          at org.datanucleus.store.rdbms.query.legacy.JDOQLQuery.executeQuery(JDOQLQuery.java:245)
          at org.datanucleus.store.query.Query.executeWithArray(Query.java:1499)
          at org.datanucleus.jdo.JDOQuery.execute(JDOQuery.java:243)
          at org.apache.hadoop.hive.metastore.ObjectStore.getMDatabase(ObjectStore.java:323)
          at org.apache.hadoop.hive.metastore.ObjectStore.convertToMTable(ObjectStore.java:640)
          at org.apache.hadoop.hive.metastore.ObjectStore.alterTable(ObjectStore.java:946)
          at org.apache.hadoop.hive.metastore.HiveAlterHandler.alterTable(HiveAlterHandler.java:177)
          at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler$23.run(HiveMetaStore.java:1329)
          at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler$23.run(HiveMetaStore.java:1326)
          at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.executeWithRetry(HiveMetaStore.java:229)
          at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_table(HiveMetaStore.java:1326)
          at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.alter_table(HiveMetaStoreClient.java:143)
          at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          at java.lang.reflect.Method.invoke(Method.java:597)
          at org.apache.hadoop.hive.common.SynchronizedFactory$Handler.invoke(SynchronizedFactory.java:46)

          - locked <0x00000007ffc6fd00> (a org.apache.hadoop.hive.metastore.HiveMetaStoreClient)
            at $Proxy8.alter_table(Unknown Source)
            at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:267)
            at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:875)
            at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:174)
            at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:108)
            at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
            at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:609)
            at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:478)
            at org.apache.hadoop.hive.ql.Driver.run(Driver.java:356)
            at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:114)
            at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.process(ThriftHive.java:378)
            at org.apache.hadoop.hive.service.ThriftHive$Processor.process(ThriftHive.java:366)
            at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:252)
            at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
            at java.lang.Thread.run(Thread.java:619)
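
          A thread pegged at 100% CPU inside java.util.HashMap.get(), as in the top frames above, is the classic symptom of a plain HashMap being structurally modified by concurrent threads: a racing resize can leave a cycle in a bucket's linked list, after which get() loops forever without throwing anything. Independent of DataNucleus, the usual remedies for a shared lookup cache look like this (illustration only, hypothetical names):

          import java.util.Collections;
          import java.util.HashMap;
          import java.util.Map;
          import java.util.concurrent.ConcurrentHashMap;

          public class SharedCacheChoices {
              // Unsafe when written to by several threads: a racing resize can corrupt a
              // bucket chain, after which get() can spin forever exactly as seen above.
              Map<String, Class<?>> unsafe = new HashMap<String, Class<?>>();

              // Remedy 1: a coarse lock around every access.
              Map<String, Class<?>> locked =
                      Collections.synchronizedMap(new HashMap<String, Class<?>>());

              // Remedy 2: a map built for concurrent use.
              Map<String, Class<?>> concurrent = new ConcurrentHashMap<String, Class<?>>();
          }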
          Bennie Schut added a comment -

          Thanks Paul, I hadn't noticed that yet, so in theory it shouldn't be the cause. However, after synchronizing the metastore client I never saw this problem again, whereas I used to see it every day.
          Strange issue. I'll see if I can create something reproducible, but that might be tricky for this type of problem.

          Paul Yang added a comment -

          From looking at the Hive class, we are creating a ThreadLocal instance of Hive when we call get(). Since the HiveMetaStoreClient is a member of Hive, shouldn't the client be accessed by only 1 thread at a time?
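
          For reference, the pattern described here boils down to a per-thread instance held in a ThreadLocal. A small self-contained illustration (hypothetical names, not the actual Hive source):

          // Each thread that calls get() receives its own instance, so whatever that
          // instance holds (e.g. a metastore client) is normally touched by one thread only.
          public final class PerThreadHolder {
              private static final ThreadLocal<Object> INSTANCE = new ThreadLocal<Object>() {
                  @Override
                  protected Object initialValue() {
                      return new Object();   // stand-in for building the per-thread Hive object
                  }
              };

              public static Object get() {
                  return INSTANCE.get();
              }

              public static void main(String[] args) throws InterruptedException {
                  final Object mine = get();
                  Thread other = new Thread(new Runnable() {
                      public void run() {
                          System.out.println(get() == mine);   // false: different thread, different instance
                      }
                  });
                  other.start();
                  other.join();
                  System.out.println(get() == mine);           // true: same thread, same instance
              }
          }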

          Bennie Schut added a comment -

          That would be HIVE-1482, not 28.

          Bennie Schut added a comment -

          This is starting to look more and more like a similar problem we have on the JDBC driver (HIVE-1428). The HiveMetaStoreClient is making calls on the Thrift client, but the Thrift client isn't thread-safe (probably by design). So when the HiveMetaStoreClient is used, it should either be synchronized or a new HiveMetaStoreClient object should be created for each thread.

          From the "Hive" class:

          private IMetaStoreClient getMSC() throws MetaException {
            if (metaStoreClient == null) {
              metaStoreClient = createMetaStoreClient();
            }
            return metaStoreClient;
          }

          I could make the sync factory we use on HIVE-1428 a common thing and reuse it here. Any objections?
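
          For context, such a sync factory is essentially a dynamic proxy that serializes every call made through an interface; the org.apache.hadoop.hive.common.SynchronizedFactory$Handler.invoke frame in the stack trace above is that wrapper at work. A minimal sketch of the idea (hypothetical class name SyncProxy, not the actual HIVE-1428 code):

          import java.lang.reflect.InvocationHandler;
          import java.lang.reflect.InvocationTargetException;
          import java.lang.reflect.Method;
          import java.lang.reflect.Proxy;

          // Minimal sketch of a "synchronized factory": wrap any interface so that every
          // call on the returned proxy is serialized on a single lock.
          public final class SyncProxy {
              @SuppressWarnings("unchecked")
              public static <T> T wrap(final Class<T> iface, final T target) {
                  InvocationHandler handler = new InvocationHandler() {
                      public synchronized Object invoke(Object proxy, Method method, Object[] args)
                              throws Throwable {
                          try {
                              // Only one thread at a time gets past this synchronized handler,
                              // so the non-thread-safe client underneath is never used concurrently.
                              return method.invoke(target, args);
                          } catch (InvocationTargetException e) {
                              throw e.getCause();   // rethrow the target's original exception
                          }
                      }
                  };
                  return (T) Proxy.newProxyInstance(
                          iface.getClassLoader(), new Class<?>[] { iface }, handler);
              }
          }

          With such a wrapper, getMSC() could hand out something like SyncProxy.wrap(IMetaStoreClient.class, createMetaStoreClient()), so every metastore call funnels through one lock; alternatively, creating a client per thread avoids the lock entirely.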

          Bennie Schut added a comment -

          Thread dump.


            People

            • Assignee: Bennie Schut
            • Reporter: Bennie Schut
            • Votes: 0
            • Watchers: 8
