Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-5546

A change in ORCInputFormat made by HIVE-4113 was reverted by HIVE-5391

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 0.13.0
    • 0.13.0
    • None
    • None

    Description

      2013-10-15 10:49:49,386 INFO org.apache.hadoop.hive.ql.io.orc.OrcInputFormat: included column ids = 
      2013-10-15 10:49:49,386 INFO org.apache.hadoop.hive.ql.io.orc.OrcInputFormat: included columns names = 
      2013-10-15 10:49:49,386 INFO org.apache.hadoop.hive.ql.io.orc.OrcInputFormat: No ORC pushdown predicate
      2013-10-15 10:49:49,834 INFO org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file hdfs://localhost:54310/user/hive/warehouse/web_sales_orc/000000_0
      2013-10-15 10:49:49,834 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 1
      2013-10-15 10:49:49,840 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 100
      2013-10-15 10:49:49,968 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
      2013-10-15 10:49:49,994 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
      2013-10-15 10:49:49,994 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName yhuai for UID 1000 from the native implementation
      2013-10-15 10:49:49,996 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space
      	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:949)
      	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
      	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
      	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:415)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
      	at org.apache.hadoop.mapred.Child.main(Child.java:249)
      

      If includedColumnIds is an empty list, we do not need to read any column. But, right now, in OrcInputFormat.findIncludedColumns, we have ...

      if (ColumnProjectionUtils.isReadAllColumns(conf) ||
            includedStr == null || includedStr.trim().length() == 0) {
            return null;
          } 
      

      If includedStr is an empty string, the code assumes that we need all columns, which is not correct.

      Attachments

        1. HIVE-5546.1.patch
          0.9 kB
          Yin Huai
        2. HIVE-5546.2.patch
          1 kB
          Yin Huai

        Issue Links

          Activity

            People

              yhuai Yin Huai
              yhuai Yin Huai
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: