Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-21934

Materialized view on top of Druid not pushing everything

Log workAgile BoardRank to TopRank to BottomVotersWatch issueWatchersCreate sub-taskConvert to sub-taskMoveLinkCloneLabelsUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

    Details

      Description

      The title is not very informative, but examples hopefully are.

      this is the plan with the view

      
      
      explain SELECT MONTH(`dates_n1`.`__time`) AS `mn___time_ok`,
      CAST((MONTH(`dates_n1`.`__time`) - 1) / 3 + 1 AS BIGINT) AS `qr___time_ok`,
      SUM(1) AS `sum_number_of_records_ok`,
      YEAR(`dates_n1`.`__time`) AS `yr___time_ok`
      FROM `mv_ssb_100_scale`.`lineorder_n0` `lineorder_n0`
      JOIN `mv_ssb_100_scale`.`dates_n1` `dates_n1` ON (`lineorder_n0`.`lo_orderdate` = `dates_n1`.`d_datekey`)
      JOIN `mv_ssb_100_scale`.`customer_n1` `customer_n1` ON (`lineorder_n0`.`lo_custkey` = `customer_n1`.`c_custkey`)
      JOIN `mv_ssb_100_scale`.`supplier_n0` `supplier_n0` ON (`lineorder_n0`.`lo_suppkey` = `supplier_n0`.`s_suppkey`)
      JOIN `mv_ssb_100_scale`.`ssb_part_n0` `ssb_part_n0` ON (`lineorder_n0`.`lo_partkey` = `ssb_part_n0`.`p_partkey`)
      GROUP BY MONTH(`dates_n1`.`__time`),
      CAST((MONTH(`dates_n1`.`__time`) - 1) / 3 + 1 AS BIGINT),
      YEAR(`dates_n1`.`__time`)
      INFO : Starting task [Stage-3:EXPLAIN] in serial mode
      INFO : Completed executing command(queryId=sbouguerra_20190627113101_1493ee87-0288-4e30-b53c-0ee729ce3977); Time taken: 0.005 seconds
      INFO : OK
      +----------------------------------------------------+
      | Explain |
      +----------------------------------------------------+
      | Plan optimized by CBO. |
      | |
      | Vertex dependency in root stage |
      | Reducer 2 <- Map 1 (SIMPLE_EDGE) |
      | |
      | Stage-0 |
      | Fetch Operator |
      | limit:-1 |
      | Stage-1 |
      | Reducer 2 vectorized, llap |
      | File Output Operator [FS_13] |
      | Select Operator [SEL_12] (rows=300018951 width=38) |
      | Output:["_col0","_col1","_col2","_col3"] |
      | Group By Operator [GBY_11] (rows=300018951 width=38) |
      | Output:["_col0","_col1","_col2","_col3"],aggregations:["sum(VALUE._col0)"],keys:KEY._col0, KEY._col1, KEY._col2 |
      | <-Map 1 [SIMPLE_EDGE] vectorized, llap |
      | SHUFFLE [RS_10] |
      | PartitionCols:_col0, _col1, _col2 |
      | Group By Operator [GBY_9] (rows=600037902 width=38) |
      | Output:["_col0","_col1","_col2","_col3"],aggregations:["sum(1)"],keys:_col0, _col1, _col2 |
      | Select Operator [SEL_8] (rows=600037902 width=38) |
      | Output:["_col0","_col1","_col2"] |
      | TableScan [TS_0] (rows=600037902 width=38) |
      | mv_ssb_100_scale@ssb_mv_druid_100,ssb_mv_druid_100,Tbl:COMPLETE,Col:NONE,Output:["vc"],properties:\{"druid.fieldNames":"vc","druid.fieldTypes":"timestamp","druid.query.json":"{\"queryType\":\"scan\",\"dataSource\":\"mv_ssb_100_scale.ssb_mv_druid_100\",\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"],\"virtualColumns\":[{\"type\":\"expression\",\"name\":\"vc\",\"expression\":\"\\\"__time\\\"\",\"outputType\":\"LONG\"}],\"columns\":[\"vc\"],\"resultFormat\":\"compactedList\"}","druid.query.type":"scan"} |
      | |
      +----------------------------------------------------+
      
       
      
      

      if i use a simple druid table without MV 

      
      
      explain SELECT MONTH(`__time`) AS `mn___time_ok`,
      CAST((MONTH(`__time`) - 1) / 3 + 1 AS BIGINT) AS `qr___time_ok`,
      SUM(1) AS `sum_number_of_records_ok`,
      YEAR(`__time`) AS `yr___time_ok`
      FROM `druid_ssb.ssb_druid_100`
      GROUP BY MONTH(`__time`),
      CAST((MONTH(`__time`) - 1) / 3 + 1 AS BIGINT),
      YEAR(`__time`);
      
      
      
      +----------------------------------------------------+
      | Explain |
      +----------------------------------------------------+
      | Plan optimized by CBO. |
      | |
      | Stage-0 |
      | Fetch Operator |
      | limit:-1 |
      | Select Operator [SEL_1] |
      | Output:["_col0","_col1","_col2","_col3"] |
      | TableScan [TS_0] |
      | Output:["extract_month","vc","$f3","extract_year"],properties:\{"druid.fieldNames":"extract_month,vc,extract_year,$f3","druid.fieldTypes":"int,bigint,int,bigint","druid.query.json":"{\"queryType\":\"groupBy\",\"dataSource\":\"druid_ssb.ssb_druid_100\",\"granularity\":\"all\",\"dimensions\":[{\"type\":\"extraction\",\"dimension\":\"__time\",\"outputName\":\"extract_month\",\"extractionFn\":{\"type\":\"timeFormat\",\"format\":\"M\",\"timeZone\":\"America/New_York\",\"locale\":\"en-US\"}},\{\"type\":\"default\",\"dimension\":\"vc\",\"outputName\":\"vc\",\"outputType\":\"LONG\"},\{\"type\":\"extraction\",\"dimension\":\"__time\",\"outputName\":\"extract_year\",\"extractionFn\":{\"type\":\"timeFormat\",\"format\":\"yyyy\",\"timeZone\":\"America/New_York\",\"locale\":\"en-US\"}}],\"virtualColumns\":[\{\"type\":\"expression\",\"name\":\"vc\",\"expression\":\"CAST(((CAST((timestamp_extract(\\\"__time\\\",'MONTH','America/New_York') - 1), 'DOUBLE') / CAST(3, 'DOUBLE')) + CAST(1, 'DOUBLE')), 'LONG')\",\"outputType\":\"LONG\"}],\"limitSpec\":\{\"type\":\"default\"},\"aggregations\":[\{\"type\":\"longSum\",\"name\":\"$f3\",\"expression\":\"1\"}],\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"]}","druid.query.type":"groupBy"} |
      | |
      +----------------------------------------------------+
      
      

        Attachments

        1. HIVE-21934.patch
          46 kB
          Jesus Camacho Rodriguez

          Activity

          $i18n.getText('security.level.explanation', $currentSelection) Viewable by All Users
          Cancel

            People

            • Assignee:
              jcamachorodriguez Jesus Camacho Rodriguez Assign to me
              Reporter:
              bslim Slim Bouguerra

              Dates

              • Created:
                Updated:
                Resolved:

              Time Tracking

              Estimated:
              Original Estimate - Not Specified
              Not Specified
              Remaining:
              Remaining Estimate - 0h
              0h
              Logged:
              Time Spent - 0.5h
              0.5h

                Issue deployment