HIVE-4113: Optimize select count(1) with RCFile and Orc

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: File Formats
    • Labels:
      None

      Description

      select count(1) reads every column of every row when used with RCFile, even though the query needs no column values.

      "select count(1) from store_sales_10_rc" gives

      Job 0: Map: 5  Reduce: 1   Cumulative CPU: 31.73 sec   HDFS Read: 234914410 HDFS Write: 8 SUCCESS
      

      Whereas "select count(ss_sold_date_sk) from store_sales_10_rc;" reads far less:

      Job 0: Map: 5  Reduce: 1   Cumulative CPU: 29.75 sec   HDFS Read: 28145994 HDFS Write: 8 SUCCESS
      

      That is roughly 12% of the bytes read by the count(1) query (28145994 of 234914410 HDFS bytes).

      This was tracked down to the following code in RCFile.java

            } else {
              // TODO: if no column name is specified e.g, in select count(1) from tt;
              // skip all columns, this should be distinguished from the case:
              // select * from tt;
              for (int i = 0; i < skippedColIDs.length; i++) {
                skippedColIDs[i] = false;
              }
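
The distinction the TODO asks for can be sketched as follows. This is a minimal illustration, not the code from the attached patches: the class `ColumnSkipSketch`, the method `computeSkippedColumns`, and the convention that a `null` projection means "read all columns" while an empty projection means "read no columns" are all assumptions made for this example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: none of these names come from RCFile.java.
final class ColumnSkipSketch {

    /**
     * Builds the skip mask for a reader.
     *
     * Assumed convention (not confirmed by the source):
     *   readColumnIds == null  -> no projection given, read everything (select *)
     *   readColumnIds is empty -> no column is needed (select count(1))
     *   otherwise              -> read only the listed columns
     */
    static boolean[] computeSkippedColumns(int columnCount, List<Integer> readColumnIds) {
        boolean[] skipped = new boolean[columnCount];
        if (readColumnIds == null) {
            // select *: nothing is skipped (array is already all false).
            return skipped;
        }
        // Start by skipping every column, then un-skip the requested ones.
        Arrays.fill(skipped, true);
        for (int id : readColumnIds) {
            if (id >= 0 && id < columnCount) {
                skipped[id] = false;
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        // select * from tt
        System.out.println(Arrays.toString(computeSkippedColumns(3, null)));
        // select count(1) from tt
        System.out.println(Arrays.toString(
                computeSkippedColumns(3, Collections.<Integer>emptyList())));
        // select count(col1) from tt
        System.out.println(Arrays.toString(
                computeSkippedColumns(3, Arrays.asList(1))));
    }
}
```

With a mask like this, the count(1) case skips every column instead of falling into the branch above that clears all skip flags.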
      

        Attachments

        1. HIVE-4113.1.patch
          55 kB
          Yin Huai
        2. HIVE-4113.10.patch
          449 kB
          Yin Huai
        3. HIVE-4113.11.patch
          458 kB
          Yin Huai
        4. HIVE-4113.2.patch
          54 kB
          Yin Huai
        5. HIVE-4113.3.patch
          60 kB
          Yin Huai
        6. HIVE-4113.4.patch
          64 kB
          Yin Huai
        7. HIVE-4113.5.patch
          60 kB
          Yin Huai
        8. HIVE-4113.6.patch
          61 kB
          Yin Huai
        9. HIVE-4113.7.patch
          77 kB
          Yin Huai
        10. HIVE-4113.8.patch
          130 kB
          Yin Huai
        11. HIVE-4113.9.patch
          449 kB
          Yin Huai
        12. HIVE-4113.patch
          55 kB
          Brock Noland
        13. HIVE-4113.patch
          51 kB
          Brock Noland
        14. HIVE-4113-0.patch
          55 kB
          Brock Noland

              People

              • Assignee:
                yhuai Yin Huai
                Reporter:
                gopalv Gopal V
              • Votes:
                0
                Watchers:
                10
