HIVE-28026

Reading proto data more than 2GB from multiple splits fails


Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 4.0.0-beta-1
    • Not Applicable
    • None

    Description

      Query: select * from <table_name>

      Explanation:

      Running the above query on a Hive proto table spawns multiple Tez containers to process the data. If a container processes multiple HDFS splits and the combined size of the decompressed data exceeds 2GB, the query fails with the following error:

      "While parsing a protocol message, the input ended unexpectedly in the middle of a field.  This could mean either that the input has been truncated or that an embedded message misreported its own length." 

       

      This happens because of the following statement in CodedInputStream: byteLimit += totalBytesRetired + pos;

      byteLimit hits an integer overflow because totalBytesRetired keeps a running count of all the bytes read so far, and the CodedInputStream is initialized only once per container: https://github.com/apache/hive/blob/564d7e54d2360488611da39d0e5f027a2d574fc1/ql/src/java/org/apache/tez/dag/history/logging/proto/ProtoMessageWritable.java#L96
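The wraparound can be illustrated with plain int arithmetic. This is only a minimal sketch that mirrors the quoted expression; the class and method names (ByteLimitOverflowDemo, absoluteLimit) are hypothetical and are not part of protobuf or Hive.

```java
public class ByteLimitOverflowDemo {

    // Mirrors "byteLimit += totalBytesRetired + pos" using plain ints.
    // Hypothetical helper for illustration; not the protobuf implementation.
    static int absoluteLimit(int messageSize, int totalBytesRetired, int pos) {
        int byteLimit = messageSize;
        byteLimit += totalBytesRetired + pos; // int addition silently wraps on overflow
        return byteLimit;
    }

    public static void main(String[] args) {
        int messageSize = 1024;

        // Early in the container's life the counter is small, so the limit is sane.
        System.out.println(absoluteLimit(messageSize, 1_000_000, 0));

        // Once the cumulative count approaches Integer.MAX_VALUE (about 2GB),
        // adding the next message size wraps to a negative limit.
        int wrapped = absoluteLimit(messageSize, Integer.MAX_VALUE - 100, 0);
        System.out.println(wrapped < 0);
    }
}
```

A negative limit makes the parser believe the input ends before the message does, which is consistent with the "input ended unexpectedly" error quoted above.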

       

      This differs from the issue reproduced in https://github.com/zabetak/protobuf-large-message: there, a single proto data file is larger than 2GB, whereas in my case multiple files together add up to more than 2GB.
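The multi-file case can be sketched with a simulated reader whose int counter survives across files. Everything here is an assumption made for illustration: CountingReader and readAll are hypothetical, pos is omitted, and the sizes are chosen only to cross 2GB. The per-message reset is modelled on protobuf's CodedInputStream.resetSizeCounter(), but the description above does not confirm that this is how the fix works.

```java
public class SharedCounterDemo {

    // Hypothetical stand-in for a reader that tracks consumed bytes in an int.
    static final class CountingReader {
        private int totalBytesRetired = 0;

        // Returns false when the computed limit has wrapped negative.
        boolean readMessage(int size) {
            int byteLimit = size + totalBytesRetired; // same shape as the quoted expression
            if (byteLimit < 0) {
                return false; // a real parser would report a truncated message here
            }
            totalBytesRetired += size;
            return true;
        }

        void resetSizeCounter() {
            totalBytesRetired = 0;
        }
    }

    // Reads every message of every file with one shared reader and
    // returns how many messages were read before the first failure.
    static long readAll(int files, int messagesPerFile, int messageSize, boolean resetPerMessage) {
        CountingReader reader = new CountingReader();
        long read = 0;
        for (int f = 0; f < files; f++) {
            for (int m = 0; m < messagesPerFile; m++) {
                if (resetPerMessage) {
                    reader.resetSizeCounter();
                }
                if (!reader.readMessage(messageSize)) {
                    return read;
                }
                read++;
            }
        }
        return read;
    }

    public static void main(String[] args) {
        int messageSize = 64 * 1024 * 1024; // 64 MiB per message
        // 5 files x 8 messages x 64 MiB = 2.5 GiB in total; each file is only 512 MiB.
        System.out.println(readAll(5, 8, messageSize, false)); // stops early once the counter wraps
        System.out.println(readAll(5, 8, messageSize, true));  // reads all 40 messages
    }
}
```

No single file in this sketch is anywhere near 2GB; the failure comes only from the counter being shared across files.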

      CC zabetak 

      Limitation:

      This fix does not resolve the issue described in https://github.com/protocolbuffers/protobuf/issues/11729

      Here is the beeline command used:

       

      beeline  -u 'jdbc:hive2://hostnames/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;thrift.client.max.message.size=2147483647' --showHeader=false --outputformat=tsv2 -e "select * from raaggarw.proto_hive_query_data where executionmode='MR' and otherinfo['CONF'] != 'NULL'" >> ./output 

       

            People

              Aggarwal_Raghav Raghav Aggarwal