  Apache Drill / DRILL-7136

Num_buckets for HashAgg in profile may be inaccurate


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.16.0
    • Fix Version/s: 1.17.0
    • Component/s: Tools, Build & Test
    • Labels:
      None

      Description

      I ran TPC-H query 17 at scale factor (SF) 1000. Here is the query:

      select
        sum(l.l_extendedprice) / 7.0 as avg_yearly
      from
        lineitem l,
        part p
      where
        p.p_partkey = l.l_partkey
        and p.p_brand = 'Brand#13'
        and p.p_container = 'JUMBO CAN'
        and l.l_quantity < (
          select
            0.2 * avg(l2.l_quantity)
          from
            lineitem l2
          where
            l2.l_partkey = p.p_partkey
        );
      

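      One of the hash agg operators resized 6 times. If the hash table starts at 64K (65,536) buckets and doubles on each resize, it should end up with 65,536 × 2^6 = 4,194,304 (4M) buckets. However, the profile still reports 64K buckets.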
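
      The expected bucket count can be checked with a short calculation. This is a minimal sketch, not Drill code: the function name `expected_buckets` is hypothetical, and both the 64K starting size and the doubling on each resize are assumptions inferred from the numbers in this report.

      ```python
      def expected_buckets(initial_buckets: int, num_resizes: int, growth_factor: int = 2) -> int:
          """Bucket count after `num_resizes` resizes.

          Assumes (not confirmed by the report) that every resize multiplies
          the table capacity by `growth_factor`.
          """
          if initial_buckets <= 0 or num_resizes < 0 or growth_factor < 1:
              raise ValueError("invalid hash table parameters")
          return initial_buckets * growth_factor ** num_resizes


      # Values taken from the profile row for minor fragment 04-00-02.
      reported_buckets = 65_536   # NUM_BUCKETS as shown in the profile
      num_resizing = 6            # NUM_RESIZING as shown in the profile

      expected = expected_buckets(reported_buckets, num_resizing)
      print(f"expected={expected:,} reported={reported_buckets:,}")
      # prints: expected=4,194,304 reported=65,536
      ```

      Under these assumptions the two metrics disagree by a factor of 64, which is what suggests that NUM_BUCKETS is not updated after a resize.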

      I have attached a sample profile. In this profile, the hash agg operator is (04-02).

      Operator Metrics
      Minor Fragment            04-00-02
      NUM_BUCKETS               65,536
      NUM_ENTRIES               748,746
      NUM_RESIZING              6
      RESIZING_TIME_MS          364
      NUM_PARTITIONS            1
      SPILLED_PARTITIONS        (empty)
      SPILL_MB                  582
      SPILL_CYCLE               0
      INPUT_BATCH_COUNT         813
      AVG_INPUT_BATCH_BYTES     582,653
      AVG_INPUT_ROW_BYTES       18
      INPUT_RECORD_COUNT        26,316,456
      OUTPUT_BATCH_COUNT        401
      AVG_OUTPUT_BATCH_BYTES    1,631,943
      AVG_OUTPUT_ROW_BYTES      25
      OUTPUT_RECORD_COUNT       26,176,350
      

            People

            • Assignee: Boaz Ben-Zvi (ben-zvi)
            • Reporter: Robert Hou (rhou)