Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
select count(1) reads every column of every row when the table is stored as RCFile, even though the query needs no column values at all.
Running "select count(1) from store_sales_10_rc;" gives:
  Job 0: Map: 5 Reduce: 1 Cumulative CPU: 31.73 sec HDFS Read: 234914410 HDFS Write: 8 SUCCESS
whereas "select count(ss_sold_date_sk) from store_sales_10_rc;" reads far less:
  Job 0: Map: 5 Reduce: 1 Cumulative CPU: 29.75 sec HDFS Read: 28145994 HDFS Write: 8 SUCCESS
That is roughly 12% (28,145,994 / 234,914,410 bytes) of the data read by COUNT(1).
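The ratio can be checked directly from the two HDFS Read counters in the job summaries; it works out to just under 12%. A minimal Java sketch (the class and constant names are hypothetical, only the byte counts come from the logs):

```java
public class ReadRatio {
    // HDFS Read counters copied from the two job summaries (bytes).
    static final long COUNT_ONE_BYTES = 234_914_410L;    // select count(1)
    static final long COUNT_COLUMN_BYTES = 28_145_994L;  // select count(ss_sold_date_sk)

    // Fraction of the COUNT(1) read volume that the single-column count needed.
    static double ratio() {
        return (double) COUNT_COLUMN_BYTES / COUNT_ONE_BYTES;
    }

    public static void main(String[] args) {
        // Prints the ratio as a percentage with one decimal place.
        System.out.printf("%.1f%%%n", ratio() * 100);
    }
}
```

Running it prints 12.0%, i.e. reading the single column costs about one eighth of what COUNT(1) reads.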
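This was traced to the following branch in RCFile.java. Despite the TODO, the loop sets every entry of skippedColIDs to false, so when no column is specified (as with count(1)) no column is skipped and the reader handles the query exactly like select *: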
    } else {
      // TODO: if no column name is specified e.g, in select count(1) from tt;
      // skip all columns, this should be distinguished from the case:
      // select * from tt;
      for (int i = 0; i < skippedColIDs.length; i++) {
        skippedColIDs[i] = false;
      }
    }
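One way to make the distinction the TODO asks for is to carry an explicit "read all columns" signal instead of inferring it from an empty column list. The sketch below assumes a hypothetical readAllColumns flag and a hypothetical skippedColumns helper; it is not RCFile's actual API and not necessarily the committed fix. It only shows the intended semantics: select * reads everything, while an empty projection such as count(1) skips every column.

```java
import java.util.Arrays;
import java.util.List;

public class ColumnSkipSketch {
    /**
     * Returns one flag per column; true means the column is not read.
     * Hypothetical convention: readAllColumns == true models "select *";
     * otherwise only the listed column IDs are read, so an empty list
     * (as for "select count(1)") skips every column.
     */
    static boolean[] skippedColumns(int columnCount, boolean readAllColumns,
                                    List<Integer> neededColumnIds) {
        boolean[] skipped = new boolean[columnCount];
        if (readAllColumns) {
            // select *: boolean arrays default to false, so nothing is skipped.
            return skipped;
        }
        // Start by skipping everything, then un-skip the requested columns.
        Arrays.fill(skipped, true);
        for (int id : neededColumnIds) {
            skipped[id] = false;
        }
        return skipped;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(skippedColumns(4, true, List.of())));
        System.out.println(Arrays.toString(skippedColumns(4, false, List.of())));
        System.out.println(Arrays.toString(skippedColumns(4, false, List.of(0))));
    }
}
```

The three calls print [false, false, false, false], [true, true, true, true] and [false, true, true, true] respectively. Keeping "read everything" as a separate flag avoids overloading the empty list, which is exactly the ambiguity that makes count(1) fall into the select * path.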