HCATALOG-541

The metastore client throws a timeout exception if ~1000 clients try to call listPartitions on the server

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Environment:

      Hadoop 0.23.4
      Hcatalog 0.4
      Oracle

      Description

      Error on the client:

      2012-10-24 21:44:03,942 INFO [pool-12-thread-2] org.apache.hcatalog.hcatmix.load.tasks.Task: Error listing partitions
      org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
              at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
              at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
              at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:345)
              at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:422)
              at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:404)
              at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37)
              at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
              at org.apache.hadoop.hive.thrift.TFilterTransport.readAll(TFilterTransport.java:62)
              at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
              at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
              at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
              at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
              at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_partitions(ThriftHiveMetastore.java:1208)
              at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partitions(ThriftHiveMetastore.java:1193)
              at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.listPartitions(HiveMetaStoreClient.java:631)
              at org.apache.hcatalog.hcatmix.load.tasks.HCatListPartitionTask.doTask(HCatListPartitionTask.java:45)
              at org.apache.hcatalog.hcatmix.load.TaskExecutor.call(TaskExecutor.java:79)
              at org.apache.hcatalog.hcatmix.load.TaskExecutor.call(TaskExecutor.java:39)
              at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
              at java.util.concurrent.FutureTask.run(FutureTask.java:138)
              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
              at java.lang.Thread.run(Thread.java:619)
      Caused by: java.net.SocketTimeoutException: Read timed out
              at java.net.SocketInputStream.socketRead0(Native Method)
              at java.net.SocketInputStream.read(SocketInputStream.java:129)
              at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
      

      Error on the server:

      Exception in thread "pool-1-thread-3206" java.lang.OutOfMemoryError: unable to create new native thread
              at java.lang.Thread.start0(Native Method)
              at java.lang.Thread.start(Thread.java:597)
              at org.datanucleus.store.query.Query.performExecuteTask(Query.java:1891)
              at org.datanucleus.store.rdbms.query.JDOQLQuery.performExecute(JDOQLQuery.java:613)
              at org.datanucleus.store.query.Query.executeQuery(Query.java:1692)
              at org.datanucleus.store.query.Query.executeWithArray(Query.java:1527)
              at org.datanucleus.jdo.JDOQuery.execute(JDOQuery.java:266)
              at org.apache.hadoop.hive.metastore.ObjectStore.listMPartitions(ObjectStore.java:1521)
              at org.apache.hadoop.hive.metastore.ObjectStore.getPartitions(ObjectStore.java:1268)
              at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
              at java.lang.reflect.Method.invoke(Method.java:597)
              at org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:111)
              at $Proxy7.getPartitions(Unknown Source)
              at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions(HiveMetaStore.java:1468)
              at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions.getResult(ThriftHiveMetastore.java:5318)
              at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions.getResult(ThriftHiveMetastore.java:5306)
              at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
              at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
              at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:555)
              at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:552)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:396)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212)
              at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:552)
              at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody0(TThreadPoolServer.java:176)
              at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody1$advice(TThreadPoolServer.java:101)
              at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:1)
              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
              at java.lang.Thread.run(Thread.java:619)
      
      

      The graph for concurrent usage of listPartitions can be seen here:

      https://cwiki.apache.org/confluence/download/attachments/30740331/hcatmix_list_partition_loadtest_25min.html

      The table has 2000 partitions.
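
      A minimal sketch of the load pattern that triggers this (not the hcatmix harness itself), assuming the standard HiveMetaStoreClient API; the metastore URI, database name, and table name are placeholders:

      // Hypothetical load generator, not the hcatmix code: N threads each open a
      // metastore connection and call listPartitions, which is the pattern that
      // produced the timeouts above. URI, database, and table names are placeholders.
      import java.util.concurrent.ExecutorService;
      import java.util.concurrent.Executors;
      import org.apache.hadoop.hive.conf.HiveConf;
      import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;

      public class ListPartitionsLoad {
          public static void main(String[] args) throws Exception {
              final int clients = 1000;                  // concurrency level from this report
              ExecutorService pool = Executors.newFixedThreadPool(clients);
              for (int i = 0; i < clients; i++) {
                  pool.submit(() -> {
                      try {
                          HiveConf conf = new HiveConf();
                          conf.set("hive.metastore.uris", "thrift://metastore-host:9083"); // placeholder URI
                          HiveMetaStoreClient client = new HiveMetaStoreClient(conf);
                          client.listPartitions("default", "load_test_table", (short) -1); // fetch all ~2000 partitions
                          client.close();
                      } catch (Exception e) {
                          e.printStackTrace();           // the SocketTimeoutException shows up here under load
                      }
                  });
              }
              pool.shutdown();
          }
      }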

        Activity

        Manish Malhotra added a comment -

        Hi Travis and Arup,

        I'm also facing a similar problem while using the Hive Thrift server, but without HCatalog.
        However, I didn't see an OOM error in the Thrift server logs.

        The pattern is that when the load on the Hive Thrift server is high (mostly when most of the Hive ETL jobs are running), it sometimes gets into a state where it doesn't respond in time and throws a socket timeout.

        This happens for different operations, not only for listing partitions.

        Please post any update on this ticket; it might help my situation as well.

        Regards,
        Manish

        Stack Trace:

        at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
        at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_database(ThriftHiveMetastore.java:412)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_database(ThriftHiveMetastore.java:399)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:736)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:74)
        at $Proxy7.getDatabase(Unknown Source)
        at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1110)
        at org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1099)
        at org.apache.hadoop.hive.ql.exec.DDLTask.showTables(DDLTask.java:2206)
        at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:334)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1336)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1122)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:935)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:347)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:706)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
        Caused by: java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:150)
        at java.net.SocketInputStream.read(SocketInputStream.java:121)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
        ... 34 more
        2015-01-20 22:44:12,978 ERROR exec.Task (SessionState.java:printError(401)) - FAILED: Error in metadata: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
        org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
        at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1114)
        at org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1099)
        at org.apache.hadoop.hive.ql.exec.DDLTask.showTables(DDLTask.java:2206)
        at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:334)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1336)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1122)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:935)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)

        Travis Crawford added a comment -

        Good find, and thanks for writing the load tester! I haven't seen these specific errors, but I do occasionally see queries fail entirely when a metastore RPC fails. Often this wastes many hours of cluster slot time for a very minor issue.

        Adding retry support (with backoffs) to the HiveMetaStoreClient would be a very useful addition.
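
        A rough illustration of the retry-with-backoff idea: Hive's RetryingMetaStoreClient (visible in the stack trace above) is the real mechanism in later versions; the helper below is only a sketch with illustrative names.

        // Sketch only: exponential backoff around listPartitions. This is not the
        // RetryingMetaStoreClient implementation, just the shape of the idea.
        import java.util.List;
        import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
        import org.apache.hadoop.hive.metastore.api.Partition;
        import org.apache.thrift.TException;

        public class BackoffExample {
            public static List<Partition> listPartitionsWithRetry(
                    HiveMetaStoreClient client, String db, String table,
                    int maxAttempts) throws Exception {
                long backoffMs = 1000;                       // initial delay, doubled after each failure
                for (int attempt = 1; ; attempt++) {
                    try {
                        return client.listPartitions(db, table, (short) -1);
                    } catch (TException e) {                 // covers TTransportException read timeouts
                        if (attempt >= maxAttempts) {
                            throw e;                         // retries exhausted, surface the error
                        }
                        Thread.sleep(backoffMs);
                        backoffMs *= 2;                      // exponential backoff between attempts
                    }
                }
            }
        }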


          People

          • Assignee:
            Unassigned
          • Reporter:
            Arup Malakar
          • Votes:
            0
          • Watchers:
            3
