  1. Apache Drill
  2. DRILL-4324

Hive native reader is slow when the underlying Parquet file has many row groups


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels: None

      Description

      git.commit.id.abbrev=3d0b4b0

      TPCDS Query 84:

      SELECT c_customer_id   AS customer_id, 
                     c_last_name 
                     || ', ' 
                     || c_first_name AS customername 
      FROM   customer, 
             customer_address, 
             customer_demographics, 
             household_demographics, 
             income_band, 
             store_returns 
      WHERE  ca_city = 'Green Acres' 
             AND c_current_addr_sk = ca_address_sk 
             AND ib_lower_bound >= 54986 
             AND ib_upper_bound <= 54986 + 50000 
             AND ib_income_band_sk = hd_income_band_sk 
             AND cd_demo_sk = c_current_cdemo_sk 
             AND hd_demo_sk = c_current_hdemo_sk 
             AND sr_cdemo_sk = cd_demo_sk 
      ORDER  BY c_customer_id
      LIMIT 100;
      

      Execution times :

      Hive Plugin : 12.34 seconds
      Hive Native Reader : 360.866 seconds
      DFS Parquet Reader : 84.3 seconds
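
      To put the gap in perspective, here is a minimal sketch that computes how many times slower the Hive native reader was than the other two paths. It uses only the timings reported above and assumes the Hive Native Reader figure is in seconds like the others; the `times` dictionary and `slowdown` helper are illustrative names, not part of Drill.

```python
# Reported wall-clock times from this issue, in seconds.
# Assumption: the Hive Native Reader figure (360.866) is in seconds too.
times = {
    "Hive Plugin": 12.34,
    "Hive Native Reader": 360.866,
    "DFS Parquet Reader": 84.3,
}


def slowdown(reader, baseline):
    """Return how many times slower `reader` is than `baseline`."""
    return times[reader] / times[baseline]


for baseline in ("Hive Plugin", "DFS Parquet Reader"):
    ratio = slowdown("Hive Native Reader", baseline)
    print(f"Hive Native Reader vs {baseline}: {ratio:.1f}x slower")
```

      On these numbers the native reader is roughly 29x slower than the Hive plugin and roughly 4x slower than the DFS Parquet reader.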
      

      Note: These data sets were generated by Hive, and the underlying Parquet files have more than one row group (household_demographics has ~8000 row groups).

      The data files are larger than 10 MB, so they are too large to attach here. Reach out to me if you need anything else.

            People

            • Assignee: Unassigned
            • Reporter: Rahul Kumar Challapalli (rkins)
