Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-34120 Improve the statistics estimation
  3. SPARK-34119

Keep necessary stats after partition pruning

Log workAgile BoardRank to TopRank to BottomAttach filesAttach ScreenshotBulk Copy AttachmentsBulk Move AttachmentsVotersStop watchingWatchersConvert to IssueLinkCloneLabelsUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Sub-task
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 3.2.0
    • 3.2.0
    • SQL
    • None

    Description

      It missing stats if filter conditions contains dynamicpruning, we should keep these stats after partition pruning:

      == Optimized Logical Plan ==
      Project [i_item_sk#7 AS ss_item_sk#162], Statistics(sizeInBytes=8.07E+27 B)
      +- Join Inner, (((i_brand_id#14 = brand_id#159) AND (i_class_id#16 = class_id#160)) AND (i_category_id#18 = category_id#161)), Statistics(sizeInBytes=2.42E+28 B)
         :- Project [i_item_sk#7, i_brand_id#14, i_class_id#16, i_category_id#18], Statistics(sizeInBytes=8.5 MiB, rowCount=3.69E+5)
         :  +- Filter ((isnotnull(i_brand_id#14) AND isnotnull(i_class_id#16)) AND isnotnull(i_category_id#18)), Statistics(sizeInBytes=150.0 MiB, rowCount=3.69E+5)
         :     +- Relation[i_item_sk#7,i_item_id#8,i_rec_start_date#9,i_rec_end_date#10,i_item_desc#11,i_current_price#12,i_wholesale_cost#13,i_brand_id#14,i_brand#15,i_class_id#16,i_class#17,i_category_id#18,i_category#19,i_manufact_id#20,i_manufact#21,i_size#22,i_formulation#23,i_color#24,i_units#25,i_container#26,i_manager_id#27,i_product_name#28] parquet, Statistics(sizeInBytes=151.1 MiB, rowCount=3.72E+5)
         +- Aggregate [brand_id#159, class_id#160, category_id#161], [brand_id#159, class_id#160, category_id#161], Statistics(sizeInBytes=2.73E+21 B)
            +- Aggregate [brand_id#159, class_id#160, category_id#161], [brand_id#159, class_id#160, category_id#161], Statistics(sizeInBytes=2.73E+21 B)
               +- Join LeftSemi, (((brand_id#159 <=> i_brand_id#14) AND (class_id#160 <=> i_class_id#16)) AND (category_id#161 <=> i_category_id#18)), Statistics(sizeInBytes=2.73E+21 B)
                  :- Join LeftSemi, (((brand_id#159 <=> i_brand_id#14) AND (class_id#160 <=> i_class_id#16)) AND (category_id#161 <=> i_category_id#18)), Statistics(sizeInBytes=2.73E+21 B)
                  :  :- Project [i_brand_id#14 AS brand_id#159, i_class_id#16 AS class_id#160, i_category_id#18 AS category_id#161], Statistics(sizeInBytes=2.73E+21 B)
                  :  :  +- Join Inner, (ss_sold_date_sk#51 = d_date_sk#52), Statistics(sizeInBytes=3.83E+21 B)
                  :  :     :- Project [ss_sold_date_sk#51, i_brand_id#14, i_class_id#16, i_category_id#18], Statistics(sizeInBytes=387.3 PiB)
                  :  :     :  +- Join Inner, (ss_item_sk#30 = i_item_sk#7), Statistics(sizeInBytes=516.5 PiB)
                  :  :     :     :- Project [ss_item_sk#30, ss_sold_date_sk#51], Statistics(sizeInBytes=61.1 GiB)
                  :  :     :     :  +- Filter ((isnotnull(ss_item_sk#30) AND isnotnull(ss_sold_date_sk#51)) AND dynamicpruning#168 [ss_sold_date_sk#51]), Statistics(sizeInBytes=580.6 GiB)
                  :  :     :     :     :  +- Project [d_date_sk#52], Statistics(sizeInBytes=8.6 KiB, rowCount=731)
                  :  :     :     :     :     +- Filter ((((d_year#58 >= 1999) AND (d_year#58 <= 2001)) AND isnotnull(d_year#58)) AND isnotnull(d_date_sk#52)), Statistics(sizeInBytes=175.6 KiB, rowCount=731)
                  :  :     :     :     :        +- Relation[d_date_sk#52,d_date_id#53,d_date#54,d_month_seq#55,d_week_seq#56,d_quarter_seq#57,d_year#58,d_dow#59,d_moy#60,d_dom#61,d_qoy#62,d_fy_year#63,d_fy_quarter_seq#64,d_fy_week_seq#65,d_day_name#66,d_quarter_name#67,d_holiday#68,d_weekend#69,d_following_holiday#70,d_first_dom#71,d_last_dom#72,d_same_day_ly#73,d_same_day_lq#74,d_current_day#75,... 4 more fields] parquet, Statistics(sizeInBytes=17.1 MiB, rowCount=7.30E+4)
                  :  :     :     :     +- Relation[ss_sold_time_sk#29,ss_item_sk#30,ss_customer_sk#31,ss_cdemo_sk#32,ss_hdemo_sk#33,ss_addr_sk#34,ss_store_sk#35,ss_promo_sk#36,ss_ticket_number#37L,ss_quantity#38,ss_wholesale_cost#39,ss_list_price#40,ss_sales_price#41,ss_ext_discount_amt#42,ss_ext_sales_price#43,ss_ext_wholesale_cost#44,ss_ext_list_price#45,ss_ext_tax#46,ss_coupon_amt#47,ss_net_paid#48,ss_net_paid_inc_tax#49,ss_net_profit#50,ss_sold_date_sk#51] parquet, Statistics(sizeInBytes=580.6 GiB)
                  :  :     :     +- Project [i_item_sk#7, i_brand_id#14, i_class_id#16, i_category_id#18], Statistics(sizeInBytes=8.5 MiB, rowCount=3.69E+5)
                  :  :     :        +- Filter (((isnotnull(i_brand_id#14) AND isnotnull(i_class_id#16)) AND isnotnull(i_category_id#18)) AND isnotnull(i_item_sk#7)), Statistics(sizeInBytes=150.0 MiB, rowCount=3.69E+5)
                  :  :     :           +- Relation[i_item_sk#7,i_item_id#8,i_rec_start_date#9,i_rec_end_date#10,i_item_desc#11,i_current_price#12,i_wholesale_cost#13,i_brand_id#14,i_brand#15,i_class_id#16,i_class#17,i_category_id#18,i_category#19,i_manufact_id#20,i_manufact#21,i_size#22,i_formulation#23,i_color#24,i_units#25,i_container#26,i_manager_id#27,i_product_name#28] parquet, Statistics(sizeInBytes=151.1 MiB, rowCount=3.72E+5)
                  :  :     +- Project [d_date_sk#52], Statistics(sizeInBytes=8.6 KiB, rowCount=731)
                  :  :        +- Filter ((((d_year#58 >= 1999) AND (d_year#58 <= 2001)) AND isnotnull(d_year#58)) AND isnotnull(d_date_sk#52)), Statistics(sizeInBytes=175.6 KiB, rowCount=731)
                  :  :           +- Relation[d_date_sk#52,d_date_id#53,d_date#54,d_month_seq#55,d_week_seq#56,d_quarter_seq#57,d_year#58,d_dow#59,d_moy#60,d_dom#61,d_qoy#62,d_fy_year#63,d_fy_quarter_seq#64,d_fy_week_seq#65,d_day_name#66,d_quarter_name#67,d_holiday#68,d_weekend#69,d_following_holiday#70,d_first_dom#71,d_last_dom#72,d_same_day_ly#73,d_same_day_lq#74,d_current_day#75,... 4 more fields] parquet, Statistics(sizeInBytes=17.1 MiB, rowCount=7.30E+4)
                  :  +- Aggregate [i_brand_id#14, i_class_id#16, i_category_id#18], [i_brand_id#14, i_class_id#16, i_category_id#18], Statistics(sizeInBytes=1414.2 EiB)
                  :     +- Project [i_brand_id#14, i_class_id#16, i_category_id#18], Statistics(sizeInBytes=1414.2 EiB)
                  :        +- Join Inner, (cs_sold_date_sk#113 = d_date_sk#52), Statistics(sizeInBytes=1979.9 EiB)
                  :           :- Project [cs_sold_date_sk#113, i_brand_id#14, i_class_id#16, i_category_id#18], Statistics(sizeInBytes=231.1 PiB)
                  :           :  +- Join Inner, (cs_item_sk#94 = i_item_sk#7), Statistics(sizeInBytes=308.2 PiB)
                  :           :     :- Project [cs_item_sk#94, cs_sold_date_sk#113], Statistics(sizeInBytes=36.2 GiB)
                  :           :     :  +- Filter ((isnotnull(cs_item_sk#94) AND isnotnull(cs_sold_date_sk#113)) AND dynamicpruning#169 [cs_sold_date_sk#113]), Statistics(sizeInBytes=470.5 GiB)
                  :           :     :     :  +- Project [d_date_sk#52], Statistics(sizeInBytes=8.6 KiB, rowCount=731)
                  :           :     :     :     +- Filter ((((d_year#58 >= 1999) AND (d_year#58 <= 2001)) AND isnotnull(d_year#58)) AND isnotnull(d_date_sk#52)), Statistics(sizeInBytes=175.6 KiB, rowCount=731)
                  :           :     :     :        +- Relation[d_date_sk#52,d_date_id#53,d_date#54,d_month_seq#55,d_week_seq#56,d_quarter_seq#57,d_year#58,d_dow#59,d_moy#60,d_dom#61,d_qoy#62,d_fy_year#63,d_fy_quarter_seq#64,d_fy_week_seq#65,d_day_name#66,d_quarter_name#67,d_holiday#68,d_weekend#69,d_following_holiday#70,d_first_dom#71,d_last_dom#72,d_same_day_ly#73,d_same_day_lq#74,d_current_day#75,... 4 more fields] parquet, Statistics(sizeInBytes=17.1 MiB, rowCount=7.30E+4)
                  :           :     :     +- Relation[cs_sold_time_sk#80,cs_ship_date_sk#81,cs_bill_customer_sk#82,cs_bill_cdemo_sk#83,cs_bill_hdemo_sk#84,cs_bill_addr_sk#85,cs_ship_customer_sk#86,cs_ship_cdemo_sk#87,cs_ship_hdemo_sk#88,cs_ship_addr_sk#89,cs_call_center_sk#90,cs_catalog_page_sk#91,cs_ship_mode_sk#92,cs_warehouse_sk#93,cs_item_sk#94,cs_promo_sk#95,cs_order_number#96L,cs_quantity#97,cs_wholesale_cost#98,cs_list_price#99,cs_sales_price#100,cs_ext_discount_amt#101,cs_ext_sales_price#102,cs_ext_wholesale_cost#103,... 10 more fields] parquet, Statistics(sizeInBytes=470.5 GiB)
                  :           :     +- Project [i_item_sk#7, i_brand_id#14, i_class_id#16, i_category_id#18], Statistics(sizeInBytes=8.5 MiB, rowCount=3.72E+5)
                  :           :        +- Filter isnotnull(i_item_sk#7), Statistics(sizeInBytes=151.1 MiB, rowCount=3.72E+5)
                  :           :           +- Relation[i_item_sk#7,i_item_id#8,i_rec_start_date#9,i_rec_end_date#10,i_item_desc#11,i_current_price#12,i_wholesale_cost#13,i_brand_id#14,i_brand#15,i_class_id#16,i_class#17,i_category_id#18,i_category#19,i_manufact_id#20,i_manufact#21,i_size#22,i_formulation#23,i_color#24,i_units#25,i_container#26,i_manager_id#27,i_product_name#28] parquet, Statistics(sizeInBytes=151.1 MiB, rowCount=3.72E+5)
                  :           +- Project [d_date_sk#52], Statistics(sizeInBytes=8.6 KiB, rowCount=731)
                  :              +- Filter ((((d_year#58 >= 1999) AND (d_year#58 <= 2001)) AND isnotnull(d_year#58)) AND isnotnull(d_date_sk#52)), Statistics(sizeInBytes=175.6 KiB, rowCount=731)
                  :                 +- Relation[d_date_sk#52,d_date_id#53,d_date#54,d_month_seq#55,d_week_seq#56,d_quarter_seq#57,d_year#58,d_dow#59,d_moy#60,d_dom#61,d_qoy#62,d_fy_year#63,d_fy_quarter_seq#64,d_fy_week_seq#65,d_day_name#66,d_quarter_name#67,d_holiday#68,d_weekend#69,d_following_holiday#70,d_first_dom#71,d_last_dom#72,d_same_day_ly#73,d_same_day_lq#74,d_current_day#75,... 4 more fields] parquet, Statistics(sizeInBytes=17.1 MiB, rowCount=7.30E+4)
                  +- Aggregate [i_brand_id#14, i_class_id#16, i_category_id#18], [i_brand_id#14, i_class_id#16, i_category_id#18], Statistics(sizeInBytes=650.5 EiB)
                     +- Project [i_brand_id#14, i_class_id#16, i_category_id#18], Statistics(sizeInBytes=650.5 EiB)
                        +- Join Inner, (ws_sold_date_sk#147 = d_date_sk#52), Statistics(sizeInBytes=910.6 EiB)
                           :- Project [ws_sold_date_sk#147, i_brand_id#14, i_class_id#16, i_category_id#18], Statistics(sizeInBytes=106.3 PiB)
                           :  +- Join Inner, (ws_item_sk#116 = i_item_sk#7), Statistics(sizeInBytes=141.7 PiB)
                           :     :- Project [ws_item_sk#116, ws_sold_date_sk#147], Statistics(sizeInBytes=16.6 GiB)
                           :     :  +- Filter ((isnotnull(ws_item_sk#116) AND isnotnull(ws_sold_date_sk#147)) AND dynamicpruning#170 [ws_sold_date_sk#147]), Statistics(sizeInBytes=216.4 GiB)
                           :     :     :  +- Project [d_date_sk#52], Statistics(sizeInBytes=8.6 KiB, rowCount=731)
                           :     :     :     +- Filter ((((d_year#58 >= 1999) AND (d_year#58 <= 2001)) AND isnotnull(d_year#58)) AND isnotnull(d_date_sk#52)), Statistics(sizeInBytes=175.6 KiB, rowCount=731)
                           :     :     :        +- Relation[d_date_sk#52,d_date_id#53,d_date#54,d_month_seq#55,d_week_seq#56,d_quarter_seq#57,d_year#58,d_dow#59,d_moy#60,d_dom#61,d_qoy#62,d_fy_year#63,d_fy_quarter_seq#64,d_fy_week_seq#65,d_day_name#66,d_quarter_name#67,d_holiday#68,d_weekend#69,d_following_holiday#70,d_first_dom#71,d_last_dom#72,d_same_day_ly#73,d_same_day_lq#74,d_current_day#75,... 4 more fields] parquet, Statistics(sizeInBytes=17.1 MiB, rowCount=7.30E+4)
                           :     :     +- Relation[ws_sold_time_sk#114,ws_ship_date_sk#115,ws_item_sk#116,ws_bill_customer_sk#117,ws_bill_cdemo_sk#118,ws_bill_hdemo_sk#119,ws_bill_addr_sk#120,ws_ship_customer_sk#121,ws_ship_cdemo_sk#122,ws_ship_hdemo_sk#123,ws_ship_addr_sk#124,ws_web_page_sk#125,ws_web_site_sk#126,ws_ship_mode_sk#127,ws_warehouse_sk#128,ws_promo_sk#129,ws_order_number#130L,ws_quantity#131,ws_wholesale_cost#132,ws_list_price#133,ws_sales_price#134,ws_ext_discount_amt#135,ws_ext_sales_price#136,ws_ext_wholesale_cost#137,... 10 more fields] parquet, Statistics(sizeInBytes=216.4 GiB)
                           :     +- Project [i_item_sk#7, i_brand_id#14, i_class_id#16, i_category_id#18], Statistics(sizeInBytes=8.5 MiB, rowCount=3.72E+5)
                           :        +- Filter isnotnull(i_item_sk#7), Statistics(sizeInBytes=151.1 MiB, rowCount=3.72E+5)
                           :           +- Relation[i_item_sk#7,i_item_id#8,i_rec_start_date#9,i_rec_end_date#10,i_item_desc#11,i_current_price#12,i_wholesale_cost#13,i_brand_id#14,i_brand#15,i_class_id#16,i_class#17,i_category_id#18,i_category#19,i_manufact_id#20,i_manufact#21,i_size#22,i_formulation#23,i_color#24,i_units#25,i_container#26,i_manager_id#27,i_product_name#28] parquet, Statistics(sizeInBytes=151.1 MiB, rowCount=3.72E+5)
                           +- Project [d_date_sk#52], Statistics(sizeInBytes=8.6 KiB, rowCount=731)
                              +- Filter ((((d_year#58 >= 1999) AND (d_year#58 <= 2001)) AND isnotnull(d_year#58)) AND isnotnull(d_date_sk#52)), Statistics(sizeInBytes=175.6 KiB, rowCount=731)
                                 +- Relation[d_date_sk#52,d_date_id#53,d_date#54,d_month_seq#55,d_week_seq#56,d_quarter_seq#57,d_year#58,d_dow#59,d_moy#60,d_dom#61,d_qoy#62,d_fy_year#63,d_fy_quarter_seq#64,d_fy_week_seq#65,d_day_name#66,d_quarter_name#67,d_holiday#68,d_weekend#69,d_following_holiday#70,d_first_dom#71,d_last_dom#72,d_same_day_ly#73,d_same_day_lq#74,d_current_day#75,... 4 more fields] parquet, Statistics(sizeInBytes=17.1 MiB, rowCount=7.30E+4)
      

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            yumwang Yuming Wang Assign to me
            yumwang Yuming Wang
            Votes:
            0 Vote for this issue
            Watchers:
            4 Stop watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment