HIVE-4113

Optimize select count(1) with RCFile and Orc

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: File Formats
    • Labels:
      None

      Description

      select count(1) reads every column of every row when used with RCFile, even though the query needs no column values.

      "select count(1) from store_sales_10_rc" gives

      Job 0: Map: 5  Reduce: 1   Cumulative CPU: 31.73 sec   HDFS Read: 234914410 HDFS Write: 8 SUCCESS
      

      Whereas "select count(ss_sold_date_sk) from store_sales_10_rc;" reads far less:

      Job 0: Map: 5  Reduce: 1   Cumulative CPU: 29.75 sec   HDFS Read: 28145994 HDFS Write: 8 SUCCESS
      

      That is about 12% of the data read by count(1) (28145994 of 234914410 bytes).
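
      As a quick check of that ratio, the two "HDFS Read" figures from the job summaries can be compared directly. This is only an illustrative sketch; the class and method names are hypothetical and not part of Hive.

```java
// Hypothetical helper for checking the ratio; the byte counts are the
// "HDFS Read" values from the two job summaries quoted in the description.
class ReadRatioSketch {

    // Returns part as a percentage of whole.
    static double percentOf(long part, long whole) {
        return 100.0 * part / whole;
    }

    public static void main(String[] args) {
        long countOneBytes = 234914410L;   // select count(1)
        long countColumnBytes = 28145994L; // select count(ss_sold_date_sk)
        // Prints the single-column read as a percentage of the count(1) read.
        System.out.println(percentOf(countColumnBytes, countOneBytes));
    }
}
```

      The result comes out a little under 12 percent, so the single-column query reads roughly one eighth of what count(1) reads.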

      This was tracked down to the following code in RCFile.java, which marks every column as not skipped when no column is specified:

            } else {
              // TODO: if no column name is specified e.g, in select count(1) from tt;
              // skip all columns, this should be distinguished from the case:
              // select * from tt;
              for (int i = 0; i < skippedColIDs.length; i++) {
                skippedColIDs[i] = false;
              }
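
      The TODO above asks for "no column needed" (select count(1)) to be told apart from "all columns needed" (select *). A minimal sketch of one way to do that is shown here, assuming the reader is handed an explicit read-all flag in addition to the list of needed column IDs. The class, method, and parameter names are hypothetical; this is not the code from RCFile.java or from the attached patches.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: not the actual RCFile.java logic or the HIVE-4113 fix.
class ColumnSkipSketch {

    // Builds per-column skip flags (true means the column is not read).
    // readAllColumns is an assumed explicit signal for "select * from tt";
    // an empty neededColumnIds list then unambiguously means "no column needed".
    static boolean[] buildSkipFlags(int columnCount, boolean readAllColumns,
                                    List<Integer> neededColumnIds) {
        boolean[] skipped = new boolean[columnCount]; // all false: read everything
        if (readAllColumns) {
            return skipped;
        }
        Arrays.fill(skipped, true); // start by skipping every column
        for (int id : neededColumnIds) {
            if (id >= 0 && id < columnCount) {
                skipped[id] = false; // read only the referenced columns
            }
        }
        return skipped;
    }

    // Counts the columns that would actually be loaded.
    static int countLoaded(boolean[] skipped) {
        int loaded = 0;
        for (boolean s : skipped) {
            if (!s) {
                loaded++;
            }
        }
        return loaded;
    }

    public static void main(String[] args) {
        int columnCount = 5; // illustrative table width
        List<Integer> none = Collections.emptyList();
        // select count(1): no column IDs, read-all flag off
        System.out.println(countLoaded(buildSkipFlags(columnCount, false, none)));
        // select *: read-all flag on
        System.out.println(countLoaded(buildSkipFlags(columnCount, true, none)));
        // select count(one column): a single column ID
        System.out.println(countLoaded(buildSkipFlags(columnCount, false, Arrays.asList(0))));
    }
}
```

      With a separate flag, an empty ID list no longer has to double as "read everything", which is the ambiguity the else branch above resolves in favour of loading all columns.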
      
      Attachments

      1. HIVE-4113.1.patch (55 kB, Yin Huai)
      2. HIVE-4113.10.patch (449 kB, Yin Huai)
      3. HIVE-4113.11.patch (458 kB, Yin Huai)
      4. HIVE-4113.2.patch (54 kB, Yin Huai)
      5. HIVE-4113.3.patch (60 kB, Yin Huai)
      6. HIVE-4113.4.patch (64 kB, Yin Huai)
      7. HIVE-4113.5.patch (60 kB, Yin Huai)
      8. HIVE-4113.6.patch (61 kB, Yin Huai)
      9. HIVE-4113.7.patch (77 kB, Yin Huai)
      10. HIVE-4113.8.patch (130 kB, Yin Huai)
      11. HIVE-4113.9.patch (449 kB, Yin Huai)
      12. HIVE-4113.patch (55 kB, Brock Noland)
      13. HIVE-4113.patch (51 kB, Brock Noland)
      14. HIVE-4113-0.patch (55 kB, Brock Noland)

            People

            • Assignee: Yin Huai
            • Reporter: Gopal V
            • Votes: 0
            • Watchers: 10
