Hive / HIVE-9469

Hive Thrift Server throws SocketTimeoutException: Read timed out

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.10.0
    • Fix Version/s: None
    • Component/s: Metastore
    • Labels:
      None
    • Environment:

      4-core CPU, 15 GB memory; 2 Thrift servers behind a load balancer

      Description

      Hi all,

      Please review the following problem. I also posted it to the hive-user group but haven't received any response yet.
      This is happening quite frequently in our environment,
      so it would be great if somebody could take a look and advise.

      I'm using the Hive Thrift Server in production, where it handles around 500 requests/min at peak.
      Beyond a certain point the Thrift Server goes into a non-responsive state and throws the
      following exception:
      "org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out"

      We are using MySQL as the metastore database behind the Thrift server.
      The design / architecture is:

      Oozie --> Hive Action --> ELB (AWS) --> Hive Thrift (2 servers) --> MySQL (Master) --> MySQL (Slave)

      Software versions:

      Hive version : 0.10.0
      Hadoop: 1.2.1

      It looks like once the load goes beyond some threshold for certain operations, the server has trouble responding.
      Because Hive jobs sometimes fail due to this issue, we also run an auto-restart check: if the Thrift server is not responding, it stops/kills and restarts the service.
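
      One client-side setting worth checking for this symptom is the metastore client read timeout (hive.metastore.client.socket.timeout; the value is in seconds in the 0.x releases). Raising it does not fix the underlying load problem, but it keeps short stalls on a loaded Thrift server from surfacing immediately as read timeouts. Below is a minimal, hedged sketch using the standard HiveMetaStoreClient API; the 600-second value and the "default" database are just example choices:

          import org.apache.hadoop.hive.conf.HiveConf;
          import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
          import org.apache.hadoop.hive.metastore.api.Database;

          public class MetastoreTimeoutCheck {
              public static void main(String[] args) throws Exception {
                  HiveConf conf = new HiveConf();
                  // Raise the Thrift read timeout before opening the connection.
                  // 600 seconds is an assumed example value, not a recommendation.
                  conf.set("hive.metastore.client.socket.timeout", "600");

                  HiveMetaStoreClient client = new HiveMetaStoreClient(conf);
                  try {
                      // Same metastore call (get_database) that fails in the stack trace below.
                      Database db = client.getDatabase("default");
                      System.out.println("Metastore reachable: " + db.getName());
                  } finally {
                      client.close();
                  }
              }
          }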

      Other tuning done:

      Thrift Server:

      Given an 11 GB heap and configured the CMS GC algorithm.

      MySQL:

      Tuned the innodb_buffer, tmp_table, and max_heap parameters.

      So, could somebody please help me understand the root cause, or share their experience if they have faced a similar issue?

      I found one related JIRA: https://issues.apache.org/jira/browse/HCATALOG-541

      That JIRA, however, describes the Hive Thrift Server hitting an OOM error, whereas I did not see any OOM error in my case.

      Regards,
      Manish

      Full Exception Stack:

      at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
      at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
      at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
      at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
      at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_database(ThriftHiveMetastore.java:412)
      at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_database(ThriftHiveMetastore.java:399)
      at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:736)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:601)
      at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:74)
      at $Proxy7.getDatabase(Unknown Source)
      at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1110)
      at org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1099)
      at org.apache.hadoop.hive.ql.exec.DDLTask.showTables(DDLTask.java:2206)
      at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:334)
      at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
      at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
      at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1336)
      at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1122)
      at org.apache.hadoop.hive.ql.Driver.run(Driver.java:935)
      at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
      at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
      at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
      at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:347)
      at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:706)
      at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:601)
      at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
      Caused by: java.net.SocketTimeoutException: Read timed out
      at java.net.SocketInputStream.socketRead0(Native Method)
      at java.net.SocketInputStream.read(SocketInputStream.java:150)
      at java.net.SocketInputStream.read(SocketInputStream.java:121)
      at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
      at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
      at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
      at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
      ... 34 more
      2015-01-20 22:44:12,978 ERROR exec.Task (SessionState.java:printError(401)) - FAILED: Error in metadata: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
      org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
      at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1114)
      at org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1099)
      at org.apache.hadoop.hive.ql.exec.DDLTask.showTables(DDLTask.java:2206)
      at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:334)
      at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
      at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
      at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1336)
      at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1122)
      at org.apache.hadoop.hive.ql.Driver.run(Driver.java:935)
      at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
      at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)

      1. Before_JMV_Profiling_Tuning.jpg (80 kB, Manish Malhotra)
      2. After_JMV_Profiling_Tuning.jpg (86 kB, Manish Malhotra)

        Activity

        manish.hadoop.work@gmail.com Manish Malhotra added a comment -

        Continuing on the same thread.
        The following is the work I did for load testing and for fixing some of the issues.
        It would be great if somebody could review this and point out anything I'm missing.
        I still see the SocketTimeoutException occasionally, but the ETL jobs are no longer failing.

        I am currently running a load test with the following operations / load, using the Hive client APIs:

        a. Create partition - 10 threads
        b. List partitions - 30 threads
        c. Show tables - 100 threads

        Load on the server was around 1200 requests per minute,
        and for this test the Thrift Server + MySQL looked good. (A rough sketch of the kind of load driver used is included below.)
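
        Such a driver might look roughly like the following against the metastore API. This is an illustration only, not the actual test harness: the database and table names are placeholders, it assumes a Java 8+ JVM for the lambdas, and it exercises only the "show tables" and "list partitions" calls via HiveMetaStoreClient:

            import java.util.concurrent.ExecutorService;
            import java.util.concurrent.Executors;

            import org.apache.hadoop.hive.conf.HiveConf;
            import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;

            public class MetastoreLoadTest {

                // Small functional interface so each worker can run a different metastore call.
                interface MetastoreCall {
                    void run(HiveMetaStoreClient client) throws Exception;
                }

                public static void main(String[] args) {
                    ExecutorService pool = Executors.newFixedThreadPool(130);

                    // 100 "show tables" workers and 30 "list partitions" workers,
                    // mirroring the thread mix described above.
                    for (int i = 0; i < 100; i++) {
                        pool.submit(() -> loop(client -> client.getAllTables("default")));
                    }
                    for (int i = 0; i < 30; i++) {
                        pool.submit(() -> loop(client ->
                                client.listPartitions("default", "big_partitioned_table", (short) 100)));
                    }
                    pool.shutdown();
                }

                // Each worker opens its own client and issues the call in a loop.
                static void loop(MetastoreCall call) {
                    try {
                        HiveMetaStoreClient client = new HiveMetaStoreClient(new HiveConf());
                        for (int i = 0; i < 1000; i++) {
                            call.run(client);
                        }
                        client.close();
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            }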

        The tuning and findings are:

        Thrift Server

        1. JVM tuning (JVM profiling showed that with the default settings, full GCs were happening too frequently):

        Young Generation GC Algo: Parallel

        Old Generation GC Algo: CMS

        Max_Heap: 11 Gb

        SurvivorRatio : 6
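
        (On a HotSpot JVM these settings presumably correspond to flags along the lines of -Xmx11g, -XX:+UseParNewGC, -XX:+UseConcMarkSweepGC, and -XX:SurvivorRatio=6; the ticket does not show the actual launch options, so treat this mapping as an assumption.)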

        Graph before optimization: attached as a separate file (Before_JMV_Profiling_Tuning.jpg).

        Graph after optimization: attached as a separate file (After_JMV_Profiling_Tuning.jpg).

        2. Database connection pooling:

        The Thrift Server uses the DataNucleus framework for DB operations, and DataNucleus uses DBCP as the connection pooling tool.
        The default DBCP configuration allows a maximum of 10 connections; I changed it to 30,
        as that was the basic bottleneck to serving more requests.
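
        As an illustration of the change: the DataNucleus pool settings normally go into hive-site.xml on the metastore / Thrift server side, but the same keys can also be set on a HiveConf when the metastore runs embedded. The exact property names depend on the DataNucleus version bundled with Hive (newer Hive releases document datanucleus.connectionPool.maxPoolSize; older DBCP-based setups use different keys), so treat the names below as assumptions to verify rather than the definitive configuration:

            import org.apache.hadoop.hive.conf.HiveConf;
            import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;

            public class EmbeddedMetastorePoolSize {
                public static void main(String[] args) throws Exception {
                    HiveConf conf = new HiveConf();

                    // DataNucleus reads "datanucleus.*" properties from the Hive configuration.
                    // Both property names here are assumptions to check against the
                    // DataNucleus/DBCP documentation for the bundled version.
                    conf.set("datanucleus.connectionPoolingType", "DBCP");
                    conf.set("datanucleus.connectionPool.maxPoolSize", "30");

                    // With hive.metastore.uris unset this starts an embedded metastore; it assumes
                    // hive-site.xml on the classpath supplies the MySQL JDBC URL and credentials.
                    HiveMetaStoreClient client = new HiveMetaStoreClient(conf);
                    System.out.println("Tables in 'default': " + client.getAllTables("default"));
                    client.close();
                }
            }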

        Database:

        1. innodb_buffer = 8 GB; tmp_table_space and max_heap_space = 256 MB.
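
        (In MySQL terms these presumably map to innodb_buffer_pool_size, tmp_table_size, and max_heap_table_size.)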

        The other problem I unearthed was that one of the Hive tables had more than 1 million rows in the metastore's partition table, and in PROD the ListPartition call was being made against that table. When this happened it took a long time to get the response from the DB, because there were too many rows to fetch.
        That started blocking threads in the Thrift Server: each slow call kept holding one of the DB connections, and we eventually reached a state where all the DB connections were in use, new requests could not get a connection, and they started timing out.
        This problem was ultimately making our Hive queries fail and forcing restarts.

        Once this problem was solved, the throughput of the Thrift Server increased a lot, and the failure rate of Hive jobs dropped significantly.
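
        One way to keep a single huge table from tying up pool connections is to avoid fetching all of its partitions in one call; the HiveMetaStoreClient API allows bounding or filtering the listing. A small sketch (the table name and the "ds" partition key are placeholders):

            import java.util.List;

            import org.apache.hadoop.hive.conf.HiveConf;
            import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
            import org.apache.hadoop.hive.metastore.api.Partition;

            public class BoundedPartitionListing {
                public static void main(String[] args) throws Exception {
                    HiveMetaStoreClient client = new HiveMetaStoreClient(new HiveConf());
                    try {
                        // Bound the result size instead of pulling every partition of a table
                        // that has 1M+ rows in the metastore's partition tables.
                        List<Partition> bounded =
                                client.listPartitions("default", "big_partitioned_table", (short) 100);

                        // Or list only the partitions matching a filter on a (string) partition key.
                        List<Partition> filtered = client.listPartitionsByFilter(
                                "default", "big_partitioned_table", "ds = \"2015-01-20\"", (short) -1);

                        System.out.println(bounded.size() + " bounded, " + filtered.size() + " filtered");
                    } finally {
                        client.close();
                    }
                }
            }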

        So, please let me know whether these changes and the fix for the ListPartition problem on big tables look good, or whether there are other things we should take care of.

        Regards,
        Manish

        --------------------------------------------------------- Following are the details of the PROD Infrastructure ---------------------------------------------------------

        Load = 500 req/min.

        Exception: "org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out"

        We are using MySQL as the metastore database behind the Thrift server.
        The flow is:

        Oozie --> Hive Action --> ELB (AWS) --> Hive Thrift (2 servers) --> MySQL (Master) --> MySQL (Slave)

        Software versions:

        Hive version : 0.10.0
        Hadoop: 1.2.1

        I found one related JIRA: https://issues.apache.org/jira/browse/HCATALOG-541

        That JIRA, however, describes the Hive Thrift Server hitting an OOM error, whereas I did not see any OOM error in my case.

        Regards,
        Manish

        Full exception stack (the exception appears when the server is loaded and new requests are timing out): identical to the stack trace in the Description above.

        manish.hadoop.work@gmail.com Manish Malhotra added a comment -

        After HMS JVM Tuning

        manish.hadoop.work@gmail.com Manish Malhotra added a comment -

        Before JVM Profiling Tuning

        vgumashta Vaibhav Gumashta added a comment -

        Manish Malhotra Your Hive version is really old. Newer releases have fixed multiple bugs, added many new features, and have much lower latency. My suggestion would be to upgrade.

        manish.hadoop.work@gmail.com Manish Malhotra added a comment -

        Thanks Vaibhav Gumashta.
        But did the older version have this issue? I haven't seen one like it reported, apart from https://issues.apache.org/jira/browse/HCATALOG-541, which was an OOM.

        We are in the process of upgrading Hive to 0.12.

        Meanwhile the steps I have taken for better performance and to avoid this problem are:

        1. Database connection pooling tuning:
        the default is 10 connections; I made it 30 on each Thrift server.
        The DBCP connection pool (maximum connections) setting also needs to be thought through, as it has implications for MySQL resource usage.

        2. JVM GC tuning.

        3. Keeping the number of partitions under control.

        Do you have any other suggestions for production deployment?

        Plus I have another question: the Thrift Server uses the DataNucleus framework, an open-source persistence product that internally uses JDO.
        DataNucleus doesn't support all of the DBCP connection pooling configs,
        so should Hive either use another ORM tool or provide more hooks for the DBCP settings?

        Regards,
        Manish

        manish.hadoop.work@gmail.com Manish Malhotra added a comment -

        Vaibhav Gumashta
        Please see: based on the testing and the results I got, can I propose adding the connection pooling property to the config?

        That is, does it make sense to update the DataNucleus config to add the max-connections property in hive-site.xml?

        As things stand, the default of 10 DB connections is not good enough for most production systems.

        Thanks.

        Regards,
        Manish

        vgumashta Vaibhav Gumashta added a comment -

        Manish Malhotra Sorry about the late response. Yes, it makes sense to set the DataNucleus config to allow more DB connections. I would set the size based on how big the Thrift worker pool is. I don't think 10 DB connections are good enough for 500 worker threads.


          People

          • Assignee: Unassigned
          • Reporter: Manish Malhotra (manish.hadoop.work@gmail.com)
          • Votes: 1
          • Watchers: 5
