Apache Drill / DRILL-4324

Hive native reader is slow when the underlying Parquet file has many row groups

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels: None

      Description

      git.commit.id.abbrev=3d0b4b0

      TPCDS Query 84:

      SELECT c_customer_id   AS customer_id, 
                     c_last_name 
                     || ', ' 
                     || c_first_name AS customername 
      FROM   customer, 
             customer_address, 
             customer_demographics, 
             household_demographics, 
             income_band, 
             store_returns 
      WHERE  ca_city = 'Green Acres' 
             AND c_current_addr_sk = ca_address_sk 
             AND ib_lower_bound >= 54986 
             AND ib_upper_bound <= 54986 + 50000 
             AND ib_income_band_sk = hd_income_band_sk 
             AND cd_demo_sk = c_current_cdemo_sk 
             AND hd_demo_sk = c_current_hdemo_sk 
             AND sr_cdemo_sk = cd_demo_sk 
      ORDER  BY c_customer_id
      LIMIT 100;
      

      Execution times :

      Hive Plugin : 12.34 seconds
      Hive Native Reader : 360.866 seconds
      DFS Parquet Reader : 84.3 seconds
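
To put the timings above in proportion, here is a minimal sketch that computes how much slower the Hive native reader is than the other two paths. The numbers are copied from this report; the native reader figure is assumed to be in seconds like the others, and the `slowdown` helper is purely illustrative, not part of Drill.

```python
# Timings copied from the report. The native reader figure carries no unit
# in the report and is ASSUMED here to be seconds, like the other two.
timings_s = {
    "Hive Plugin": 12.34,
    "Hive Native Reader": 360.866,
    "DFS Parquet Reader": 84.3,
}


def slowdown(timings, subject, baseline):
    """Return how many times slower `subject` is than `baseline`."""
    return timings[subject] / timings[baseline]


for baseline in ("Hive Plugin", "DFS Parquet Reader"):
    ratio = slowdown(timings_s, "Hive Native Reader", baseline)
    print(f"Hive Native Reader vs {baseline}: {ratio:.1f}x slower")
```

Under that assumption, the native reader is roughly 29 times slower than the Hive plugin and roughly 4 times slower than the DFS Parquet reader for this query.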
      

      Note: These data sets were generated by Hive, and the underlying Parquet files have more than one row group (household_demographics has ~8000 row groups).

      The data files are larger than 10 MB, so they cannot be attached here. Reach out to me if you need anything else.

              People

              • Assignee: Unassigned
              • Reporter: Rahul Kumar Challapalli (rkins)