[HIVE-5546] A change in ORCInputFormat made by HIVE-4113 was reverted by HIVE-5391 - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 0.13.0
Fix Version/s: 0.13.0
Component/s: None
Labels:
None

Description

2013-10-15 10:49:49,386 INFO org.apache.hadoop.hive.ql.io.orc.OrcInputFormat: included column ids = 
2013-10-15 10:49:49,386 INFO org.apache.hadoop.hive.ql.io.orc.OrcInputFormat: included columns names = 
2013-10-15 10:49:49,386 INFO org.apache.hadoop.hive.ql.io.orc.OrcInputFormat: No ORC pushdown predicate
2013-10-15 10:49:49,834 INFO org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file hdfs://localhost:54310/user/hive/warehouse/web_sales_orc/000000_0
2013-10-15 10:49:49,834 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 1
2013-10-15 10:49:49,840 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 100
2013-10-15 10:49:49,968 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-10-15 10:49:49,994 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
2013-10-15 10:49:49,994 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName yhuai for UID 1000 from the native implementation
2013-10-15 10:49:49,996 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:949)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
	at org.apache.hadoop.mapred.Child.main(Child.java:249)

If includedColumnIds is an empty list, we do not need to read any column. But, right now, in OrcInputFormat.findIncludedColumns, we have ...

if (ColumnProjectionUtils.isReadAllColumns(conf) ||
      includedStr == null || includedStr.trim().length() == 0) {
      return null;
    }

If includedStr is an empty string, the code assumes that we need all columns, which is not correct.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending

HIVE-5546.1.patch
15/Oct/13 15:31
0.9 kB
Yin Huai
HIVE-5546.2.patch
15/Oct/13 20:48
1 kB
Yin Huai

Issue Links

is broken by

HIVE-5391 make ORC predicate pushdown work with vectorization

Resolved

is duplicated by

HIVE-5563 Skip reading columns in ORC for count(*)

Resolved

Activity

People

Assignee:: Yin Huai

Reporter:: Yin Huai

Votes:: 0 Vote for this issue

Watchers:: 3 Start watching this issue

Dates

Created:: 15/Oct/13 15:06

Updated:: 17/Oct/13 06:15

Resolved:: 16/Oct/13 15:33