HIVE-24849

Create external table socket timeout when location has large number of files


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.4, 3.1.2, 4.0.0
    • Fix Version/s: 4.0.0-alpha-1
    • Component/s: Metastore
    • Environment: AWS EMR 5.23 with default Hive metastore and external location S3

       

    Description

      The create table API call times out during external table creation on a location where the number of files in S3 is large (i.e. ~10K objects).

      The default timeout `hive.metastore.client.socket.timeout` is `600s`; the current workaround is to increase it to a higher value (a minimal sketch follows the stack trace below).

      2021-03-04T01:37:42,761 ERROR [66b8024b-e52f-42b8-8629-a45383bcac0c main([])]: exec.DDLTask (DDLTask.java:failed(639)) - org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
       at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:873)
       at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:878)
       at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4356)
       at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:354)
       at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
       at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
       at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2183)
       at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1839)
       at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526)
       at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
       at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
       at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
       at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
       at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
       at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
       at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:474)
       at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:490)
       at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:793)
       at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
       at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:498)
       at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
       at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
      Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
       at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
       at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
       at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
       at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
       at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
       at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
       at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_create_table_with_environment_context(ThriftHiveMetastore.java:1199)
       at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.create_table_with_environment_context(ThriftHiveMetastore.java:1185)
       at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.create_table_with_environment_context(HiveMetaStoreClient.java:2399)
       at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.create_table_with_environment_context(SessionHiveMetaStoreClient.java:93)
       at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:752)
       at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:740)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:498)
       at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:173)
       at com.sun.proxy.$Proxy37.createTable(Unknown Source)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:498)
       at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2330)
       at com.sun.proxy.$Proxy37.createTable(Unknown Source)
       at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:863)
       ... 25 more
      Caused by: java.net.SocketTimeoutException: Read timed out
       at java.net.SocketInputStream.socketRead0(Native Method)
       at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
       at java.net.SocketInputStream.read(SocketInputStream.java:171)
       at java.net.SocketInputStream.read(SocketInputStream.java:141)
       at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
       at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
       at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
       at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
       ... 49 more
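
      A minimal sketch of the workaround, assuming the Hive CLI (the `3600s` value is an example, not from the report):

      -- Raise the metastore client read timeout so the create_table call can
      -- outlast the server-side file listing. This is a client-side setting,
      -- read when the Thrift connection is opened, so putting it in
      -- hive-site.xml or passing it via --hiveconf at CLI startup is the most
      -- reliable way to apply it; a SET inside an already-connected session
      -- may be too late.
      SET hive.metastore.client.socket.timeout=3600s;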

       

      The understanding is that create table should not check the existing files/partitions; validation is done when the table is read.

      The issue can be reproduced by executing the statement below in the Hive CLI, where the location contains ~10k objects:

       

      SET hive.stats.autogather=false;
      DROP TABLE IF EXISTS tTable;
      CREATE EXTERNAL TABLE tTable(
         id string,
         mId string,
         wd bigint,
         lang string,
         tcountry string,
         vId bigint
      )
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
      LOCATION 's3://<your bucket>/'
      TBLPROPERTIES ('serialization.null.format' = '');
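
      Note that client-side stats gathering is already disabled (`hive.stats.autogather=false`) in the reproduction, yet the timeout still occurs; since the stack trace shows the client blocking on the reply to `create_table_with_environment_context`, this suggests the expensive file listing happens on the metastore server side during table creation.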

       

People

  Assignee: Sungwoo Park (glapark)
  Reporter: Mithun Antony (maxmithun)
  Votes: 0
  Watchers: 4

Dates

  Created:
  Updated:
  Resolved:

Time Tracking

  Estimated: Not Specified
  Remaining: 0h
  Logged: 1h