Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-26633

Make thrift max message size configurable

    XMLWordPrintableJSON

Details

    Description

      Since thrift >= 0.14, thrift now enforces max message sizes through a TConfiguration object as described here:
      https://github.com/apache/thrift/blob/master/doc/specs/thrift-tconfiguration.md

      By default MaxMessageSize gets set to 100MB.

      As a result it is possible for HMS clients not to be able to retrieve certain metadata for tables with a large amount of partitions or other metadata.

      For example on a cluster configured with kerberos between hs2 and hms, querying a large table (10k partitions, 200 columns with names of 200 characters) results in this backtrace:

      org.apache.thrift.transport.TTransportException: MaxMessageSize reached
      at org.apache.thrift.transport.TEndpointTransport.countConsumedMessageBytes(TEndpointTransport.java:96) 
      at org.apache.thrift.transport.TMemoryInputTransport.read(TMemoryInputTransport.java:97) 
      at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:390) 
      at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:39) 
      at org.apache.thrift.transport.TTransport.readAll(TTransport.java:109) 
      at org.apache.hadoop.hive.metastore.security.TFilterTransport.readAll(TFilterTransport.java:63) 
      at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:464) 
      at org.apache.thrift.protocol.TBinaryProtocol.readByte(TBinaryProtocol.java:329) 
      at org.apache.thrift.protocol.TBinaryProtocol.readFieldBegin(TBinaryProtocol.java:273) 
      at org.apache.hadoop.hive.metastore.api.FieldSchema$FieldSchemaStandardScheme.read(FieldSchema.java:461) 
      at org.apache.hadoop.hive.metastore.api.FieldSchema$FieldSchemaStandardScheme.read(FieldSchema.java:454) 
      at org.apache.hadoop.hive.metastore.api.FieldSchema.read(FieldSchema.java:388) 
      at org.apache.hadoop.hive.metastore.api.StorageDescriptor$StorageDescriptorStandardScheme.read(StorageDescriptor.java:1269) 
      at org.apache.hadoop.hive.metastore.api.StorageDescriptor$StorageDescriptorStandardScheme.read(StorageDescriptor.java:1248) 
      at org.apache.hadoop.hive.metastore.api.StorageDescriptor.read(StorageDescriptor.java:1110) 
      at org.apache.hadoop.hive.metastore.api.Partition$PartitionStandardScheme.read(Partition.java:1270) 
      at org.apache.hadoop.hive.metastore.api.Partition$PartitionStandardScheme.read(Partition.java:1205) 
      at org.apache.hadoop.hive.metastore.api.Partition.read(Partition.java:1062) 
      at org.apache.hadoop.hive.metastore.api.PartitionsByExprResult$PartitionsByExprResultStandardScheme.read(PartitionsByExprResult.java:420) 
      at org.apache.hadoop.hive.metastore.api.PartitionsByExprResult$PartitionsByExprResultStandardScheme.read(PartitionsByExprResult.java:399) 
      at org.apache.hadoop.hive.metastore.api.PartitionsByExprResult.read(PartitionsByExprResult.java:335) 
      at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_by_expr_result$get_partitions_by_expr_resultStandardScheme.read(ThriftHiveMetastore.java)  

      Making this configurable (and defaulting to a higher value) would allow these tables to still be accessible.

      Attachments

        Issue Links

          Activity

            People

              jfs John Sherman
              jfs John Sherman
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 4h 50m
                  4h 50m