Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-22180

Same query but different result occasionally

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 2.3.0
    • None
    • Hive
    • None
    • hadoop:2.7.3.2.6.1.0-129

      hive:2.3.0

      tez:0.8.4

      java:1.8.0_212 

    Description

      1.Phenomenon

          Running the same SQL(hive on tez) several times will result in inconsistent results . The following results are generated respectively

          1)

           OK

           3951864 808

          2)

           OK

           1822979 353

          But the first result is true.

      2.SQL

      select
      count(1),count(distinct store_code)
      from
      (
      select
      store_code,
      all_pay_id,
      payable_price,
      row_number()over(partition by store_code order by payable_price desc) as rw,
      count(1)over(partition by store_code) as store_user
      from
      (
      select
      store_code,
      all_pay_id,
      sum(payable_price) as payable_price
      from data_promotion.mdw_user_promotion_shopping_behaivor_four_week_detail a
      where dt='20190904'
      group by store_code,all_pay_id
      ) a
      order by rw
      ) a
      where rw/store_user<=0.8
      ;

       

      3.Troubleshooting

          I lookup task result from tez web ui,give the result as follows:

                 the first:

              the second:

            so , The second RECORDS_OUT_INTERMEDIATE_Reducer_4's result is much lower than the first.

           This is detailed info of the second result from reducer_3 to reducer_4 

       

      This is execution plan of the sql:

      OK
      Vertex dependency in root stage
      Reducer 2 <- Map 1 (SIMPLE_EDGE)
      Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
      Reducer 4 <- Reducer 3 (SIMPLE_EDGE)
      Reducer 5 <- Reducer 4 (SIMPLE_EDGE)
      Reducer 6 <- Reducer 5 (SIMPLE_EDGE)

      Stage-0
      Fetch Operator
      limit:-1
      Stage-1
      Reducer 6
      File Output Operator [FS_24]
      Group By Operator [GBY_22] (rows=1 width=128)
      Output:["_col0","_col1"],aggregations:["count(VALUE._col0)","count(DISTINCT KEY._col0:0._col0)"]
      <-Reducer 5 [SIMPLE_EDGE]
      SHUFFLE [RS_21]
      Group By Operator [GBY_20] (rows=5619870 width=2328)
      Output:["_col0","_col1","_col2"],aggregations:["count(1)","count(DISTINCT _col0)"],keys:_col0
      Select Operator [SEL_17] (rows=5619870 width=2328)
      Output:["_col0"]
      <-Reducer 4 [SIMPLE_EDGE]
      SHUFFLE [RS_16]
      Select Operator [SEL_14] (rows=5619870 width=2328)
      Output:["_col0","_col3"]
      Filter Operator [FIL_25] (rows=5619870 width=2328)
      predicate(_col0 / count_window_1) <= 0.8)
      PTF Operator [PTF_13] (rows=16859610 width=2328)
      Function definitions:[{},\{"name:":"windowingtablefunction","order by:":"_col1 ASC NULLS FIRST","partition by:":"_col1"}]
      Select Operator [SEL_12] (rows=16859610 width=2328)
      Output:["_col0","_col1"]
      <-Reducer 3 [SIMPLE_EDGE]
      SHUFFLE [RS_11]
      PartitionCols:_col0
      Select Operator [SEL_10] (rows=16859610 width=2328)
      Output:["_col0","row_number_window_0"]
      PTF Operator [PTF_9] (rows=16859610 width=2328)
      Function definitions:[{},\{"name:":"windowingtablefunction","order by:":"_col2 DESC NULLS LAST","partition by:":"_col0"}]
      Select Operator [SEL_8] (rows=16859610 width=2328)
      Output:["_col0","_col2"]
      <-Reducer 2 [SIMPLE_EDGE]
      SHUFFLE [RS_7]
      PartitionCols:_col0
      Select Operator [SEL_6] (rows=16859610 width=2328)
      Output:["_col0","_col2"]
      Group By Operator [GBY_5] (rows=16859610 width=2328)
      Output:["_col0","_col1","_col2"],aggregations:["sum(VALUE._col0)"],keys:KEY._col0, KEY._col1
      <-Map 1 [SIMPLE_EDGE]
      SHUFFLE [RS_4]
      PartitionCols:_col0, _col1
      Group By Operator [GBY_3] (rows=33719220 width=2328)
      Output:["_col0","_col1","_col2"],aggregations:["sum(payable_price)"],keys:store_code, all_pay_id
      Select Operator [SEL_2] (rows=33719220 width=2328)
      Output:["store_code","all_pay_id","payable_price"]
      TableScan [TS_0] (rows=33719220 width=2328)
      data_promotion@mdw_user_promotion_shopping_behaivor_four_week_detail,a,Tbl:COMPLETE,Col:NONE,Output:["all_pay_id","store_code","payable_price"]

       

          Reducer 4 show PTF , so I don't know if this result has anything to do with PTF? Can you help me.

       

       

      Attachments

        1. image-2019-09-10-12-26-41-681.png
          280 kB
          qingfa zhou
        2. image-2019-09-10-12-18-08-769.png
          81 kB
          qingfa zhou
        3. image-2019-09-10-12-17-34-779.png
          160 kB
          qingfa zhou
        4. image-2019-09-10-12-16-45-385.png
          77 kB
          qingfa zhou

        Activity

          People

            Unassigned Unassigned
            qingfa zhou qingfa zhou
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated: