HCatalog
HCATALOG-541

The metastore client throws a timeout exception when ~1000 clients concurrently call listPartitions on the server

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Environment: Hadoop 0.23.4, HCatalog 0.4, Oracle

      Description

      Error on the client:

      2012-10-24 21:44:03,942 INFO [pool-12-thread-2] org.apache.hcatalog.hcatmix.load.tasks.Task: Error listing partitions
      org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
              at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
              at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
              at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:345)
              at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:422)
              at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:404)
              at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37)
              at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
              at org.apache.hadoop.hive.thrift.TFilterTransport.readAll(TFilterTransport.java:62)
              at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
              at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
              at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
              at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
              at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_partitions(ThriftHiveMetastore.java:1208)
              at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partitions(ThriftHiveMetastore.java:1193)
              at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.listPartitions(HiveMetaStoreClient.java:631)
              at org.apache.hcatalog.hcatmix.load.tasks.HCatListPartitionTask.doTask(HCatListPartitionTask.java:45)
              at org.apache.hcatalog.hcatmix.load.TaskExecutor.call(TaskExecutor.java:79)
              at org.apache.hcatalog.hcatmix.load.TaskExecutor.call(TaskExecutor.java:39)
              at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
              at java.util.concurrent.FutureTask.run(FutureTask.java:138)
              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
              at java.lang.Thread.run(Thread.java:619)
      Caused by: java.net.SocketTimeoutException: Read timed out
              at java.net.SocketInputStream.socketRead0(Native Method)
              at java.net.SocketInputStream.read(SocketInputStream.java:129)
              at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
      

      Error on the server:

      Exception in thread "pool-1-thread-3206" java.lang.OutOfMemoryError: unable to create new native thread
              at java.lang.Thread.start0(Native Method)
              at java.lang.Thread.start(Thread.java:597)
              at org.datanucleus.store.query.Query.performExecuteTask(Query.java:1891)
              at org.datanucleus.store.rdbms.query.JDOQLQuery.performExecute(JDOQLQuery.java:613)
              at org.datanucleus.store.query.Query.executeQuery(Query.java:1692)
              at org.datanucleus.store.query.Query.executeWithArray(Query.java:1527)
              at org.datanucleus.jdo.JDOQuery.execute(JDOQuery.java:266)
              at org.apache.hadoop.hive.metastore.ObjectStore.listMPartitions(ObjectStore.java:1521)
              at org.apache.hadoop.hive.metastore.ObjectStore.getPartitions(ObjectStore.java:1268)
              at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
              at java.lang.reflect.Method.invoke(Method.java:597)
              at org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:111)
              at $Proxy7.getPartitions(Unknown Source)
              at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions(HiveMetaStore.java:1468)
              at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions.getResult(ThriftHiveMetastore.java:5318)
              at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions.getResult(ThriftHiveMetastore.java:5306)
              at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
              at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
              at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:555)
              at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:552)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:396)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212)
              at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:552)
              at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody0(TThreadPoolServer.java:176)
              at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody1$advice(TThreadPoolServer.java:101)
              at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:1)
              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
              at java.lang.Thread.run(Thread.java:619)
      
      

      A graph of the concurrent listPartitions load test can be seen here:

      https://cwiki.apache.org/confluence/download/attachments/30740331/hcatmix_list_partition_loadtest_25min.html

      The table has 2000 partitions.

        Activity

        Travis Crawford added a comment -

        Good find, and thanks for writing the load tester! I haven't seen this specific error, but I do occasionally see queries fail entirely when a metastore RPC fails. This often wastes many hours of cluster slot time over a very minor issue.

        Adding retry support (with backoffs) to the HiveMetaStoreClient would be a very useful addition.
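
As a rough illustration of the retry idea, here is a minimal sketch of retrying a call with exponential backoff. The class name `RetryWithBackoff`, its `call` method, and all parameters are hypothetical and are not part of HiveMetaStoreClient or HCatalog. The sketch retries on any exception for brevity; a real client would presumably retry only transport-level failures (such as the read timeout above) and only for idempotent calls such as listPartitions.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper for illustration; not an existing Hive or HCatalog class. */
final class RetryWithBackoff {

    private RetryWithBackoff() {}

    /**
     * Runs the call up to maxAttempts times. After each failed attempt it
     * sleeps, starting at initialDelayMs and doubling up to maxDelayMs.
     * If every attempt fails, the last exception is rethrown.
     */
    static <T> T call(Callable<T> rpc, int maxAttempts,
                      long initialDelayMs, long maxDelayMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delayMs = initialDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return rpc.call();
            } catch (InterruptedException e) {
                // Do not retry when the calling thread has been interrupted.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt == maxAttempts) {
                    break; // out of attempts, rethrow below
                }
                Thread.sleep(delayMs);
                delayMs = Math.min(delayMs * 2, maxDelayMs);
            }
        }
        throw last;
    }
}
```

A caller could wrap the metastore call in a lambda, for example `RetryWithBackoff.call(() -> client.listPartitions(db, table, max), 5, 500L, 8000L)`, where `client`, `db`, `table`, and `max` are placeholders. Note that with ~1000 clients retrying at once, adding random jitter to the delay would likely be needed to avoid synchronized retry storms against an already overloaded server.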


          People

          • Assignee: Unassigned
          • Reporter: Arup Malakar
          • Votes: 0
          • Watchers: 2
