Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
0.13.0
-
None
-
None
Description
2013-10-15 10:49:49,386 INFO org.apache.hadoop.hive.ql.io.orc.OrcInputFormat: included column ids = 2013-10-15 10:49:49,386 INFO org.apache.hadoop.hive.ql.io.orc.OrcInputFormat: included columns names = 2013-10-15 10:49:49,386 INFO org.apache.hadoop.hive.ql.io.orc.OrcInputFormat: No ORC pushdown predicate 2013-10-15 10:49:49,834 INFO org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file hdfs://localhost:54310/user/hive/warehouse/web_sales_orc/000000_0 2013-10-15 10:49:49,834 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 1 2013-10-15 10:49:49,840 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 100 2013-10-15 10:49:49,968 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1 2013-10-15 10:49:49,994 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds. 2013-10-15 10:49:49,994 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName yhuai for UID 1000 from the native implementation 2013-10-15 10:49:49,996 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:949) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136) at org.apache.hadoop.mapred.Child.main(Child.java:249)
If includedColumnIds is an empty list, we do not need to read any column. But, right now, in OrcInputFormat.findIncludedColumns, we have ...
if (ColumnProjectionUtils.isReadAllColumns(conf) || includedStr == null || includedStr.trim().length() == 0) { return null; }
If includedStr is an empty string, the code assumes that we need all columns, which is not correct.