Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-21934

Materialized view on top of Druid not pushing everything

    XMLWordPrintableJSON

Details

    Description

      The title is not very informative, but examples hopefully are.

      this is the plan with the view

      
      
      explain SELECT MONTH(`dates_n1`.`__time`) AS `mn___time_ok`,
      CAST((MONTH(`dates_n1`.`__time`) - 1) / 3 + 1 AS BIGINT) AS `qr___time_ok`,
      SUM(1) AS `sum_number_of_records_ok`,
      YEAR(`dates_n1`.`__time`) AS `yr___time_ok`
      FROM `mv_ssb_100_scale`.`lineorder_n0` `lineorder_n0`
      JOIN `mv_ssb_100_scale`.`dates_n1` `dates_n1` ON (`lineorder_n0`.`lo_orderdate` = `dates_n1`.`d_datekey`)
      JOIN `mv_ssb_100_scale`.`customer_n1` `customer_n1` ON (`lineorder_n0`.`lo_custkey` = `customer_n1`.`c_custkey`)
      JOIN `mv_ssb_100_scale`.`supplier_n0` `supplier_n0` ON (`lineorder_n0`.`lo_suppkey` = `supplier_n0`.`s_suppkey`)
      JOIN `mv_ssb_100_scale`.`ssb_part_n0` `ssb_part_n0` ON (`lineorder_n0`.`lo_partkey` = `ssb_part_n0`.`p_partkey`)
      GROUP BY MONTH(`dates_n1`.`__time`),
      CAST((MONTH(`dates_n1`.`__time`) - 1) / 3 + 1 AS BIGINT),
      YEAR(`dates_n1`.`__time`)
      INFO : Starting task [Stage-3:EXPLAIN] in serial mode
      INFO : Completed executing command(queryId=sbouguerra_20190627113101_1493ee87-0288-4e30-b53c-0ee729ce3977); Time taken: 0.005 seconds
      INFO : OK
      +----------------------------------------------------+
      | Explain |
      +----------------------------------------------------+
      | Plan optimized by CBO. |
      | |
      | Vertex dependency in root stage |
      | Reducer 2 <- Map 1 (SIMPLE_EDGE) |
      | |
      | Stage-0 |
      | Fetch Operator |
      | limit:-1 |
      | Stage-1 |
      | Reducer 2 vectorized, llap |
      | File Output Operator [FS_13] |
      | Select Operator [SEL_12] (rows=300018951 width=38) |
      | Output:["_col0","_col1","_col2","_col3"] |
      | Group By Operator [GBY_11] (rows=300018951 width=38) |
      | Output:["_col0","_col1","_col2","_col3"],aggregations:["sum(VALUE._col0)"],keys:KEY._col0, KEY._col1, KEY._col2 |
      | <-Map 1 [SIMPLE_EDGE] vectorized, llap |
      | SHUFFLE [RS_10] |
      | PartitionCols:_col0, _col1, _col2 |
      | Group By Operator [GBY_9] (rows=600037902 width=38) |
      | Output:["_col0","_col1","_col2","_col3"],aggregations:["sum(1)"],keys:_col0, _col1, _col2 |
      | Select Operator [SEL_8] (rows=600037902 width=38) |
      | Output:["_col0","_col1","_col2"] |
      | TableScan [TS_0] (rows=600037902 width=38) |
      | mv_ssb_100_scale@ssb_mv_druid_100,ssb_mv_druid_100,Tbl:COMPLETE,Col:NONE,Output:["vc"],properties:\{"druid.fieldNames":"vc","druid.fieldTypes":"timestamp","druid.query.json":"{\"queryType\":\"scan\",\"dataSource\":\"mv_ssb_100_scale.ssb_mv_druid_100\",\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"],\"virtualColumns\":[{\"type\":\"expression\",\"name\":\"vc\",\"expression\":\"\\\"__time\\\"\",\"outputType\":\"LONG\"}],\"columns\":[\"vc\"],\"resultFormat\":\"compactedList\"}","druid.query.type":"scan"} |
      | |
      +----------------------------------------------------+
      
       
      
      

      if i use a simple druid table without MV 

      
      
      explain SELECT MONTH(`__time`) AS `mn___time_ok`,
      CAST((MONTH(`__time`) - 1) / 3 + 1 AS BIGINT) AS `qr___time_ok`,
      SUM(1) AS `sum_number_of_records_ok`,
      YEAR(`__time`) AS `yr___time_ok`
      FROM `druid_ssb.ssb_druid_100`
      GROUP BY MONTH(`__time`),
      CAST((MONTH(`__time`) - 1) / 3 + 1 AS BIGINT),
      YEAR(`__time`);
      
      
      
      +----------------------------------------------------+
      | Explain |
      +----------------------------------------------------+
      | Plan optimized by CBO. |
      | |
      | Stage-0 |
      | Fetch Operator |
      | limit:-1 |
      | Select Operator [SEL_1] |
      | Output:["_col0","_col1","_col2","_col3"] |
      | TableScan [TS_0] |
      | Output:["extract_month","vc","$f3","extract_year"],properties:\{"druid.fieldNames":"extract_month,vc,extract_year,$f3","druid.fieldTypes":"int,bigint,int,bigint","druid.query.json":"{\"queryType\":\"groupBy\",\"dataSource\":\"druid_ssb.ssb_druid_100\",\"granularity\":\"all\",\"dimensions\":[{\"type\":\"extraction\",\"dimension\":\"__time\",\"outputName\":\"extract_month\",\"extractionFn\":{\"type\":\"timeFormat\",\"format\":\"M\",\"timeZone\":\"America/New_York\",\"locale\":\"en-US\"}},\{\"type\":\"default\",\"dimension\":\"vc\",\"outputName\":\"vc\",\"outputType\":\"LONG\"},\{\"type\":\"extraction\",\"dimension\":\"__time\",\"outputName\":\"extract_year\",\"extractionFn\":{\"type\":\"timeFormat\",\"format\":\"yyyy\",\"timeZone\":\"America/New_York\",\"locale\":\"en-US\"}}],\"virtualColumns\":[\{\"type\":\"expression\",\"name\":\"vc\",\"expression\":\"CAST(((CAST((timestamp_extract(\\\"__time\\\",'MONTH','America/New_York') - 1), 'DOUBLE') / CAST(3, 'DOUBLE')) + CAST(1, 'DOUBLE')), 'LONG')\",\"outputType\":\"LONG\"}],\"limitSpec\":\{\"type\":\"default\"},\"aggregations\":[\{\"type\":\"longSum\",\"name\":\"$f3\",\"expression\":\"1\"}],\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"]}","druid.query.type":"groupBy"} |
      | |
      +----------------------------------------------------+
      
      

      Attachments

        1. HIVE-21934.patch
          46 kB
          Jesus Camacho Rodriguez

        Issue Links

          Activity

            People

              jcamachorodriguez Jesus Camacho Rodriguez
              bslim Slim Bouguerra
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 0.5h
                  0.5h