Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-23165

Hive On Spark left join and right join generated inconsistent data

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 2.2.0
    • None
    • Hive
    • hive :2.3.0

      spark:2.2.0

      hadoop:2.7.3

    Description

      1)This is my sql.

      with delivery_day as (
      select * from (
      select dt,warehouse_code,b.sku_main_code,b.out_warehouse_code,b.is_pici_order
      from data_smartorder.dm_ordering_information_system_order_detail_parse t
      lateral view json_tuple(t.information_info,'warehouse_code','sku_main_code','调出仓','是否预付商品')b as warehouse_code,sku_main_code,out_warehouse_code,is_pici_order
      where dt=date_format(date_sub(current_date,1),'yyyyMMdd')
      and l1_category_name='策略配置'
      and l2_category_name='pb仓库补货仓品维度新'
      and b.is_pici_order='1'
      )t
      ),

      avg_sale_7 as (
      select *,sku_sale_quantity+first_dilivery_quantity as avg_sale_7
      from (
      select t1.warehouse_code,t1.warehouse_name,t1.sku_main_code,t1.sku_name sku_main_name,
      sum(t1.warehouse_dispatch_quantity) as warehouse_dispatch_quantity,
      sum(t1.sku_sale_quantity) as sku_sale_quantity,
      sum(t1.first_dilivery_quantity) as first_dilivery_quantity
      from data_smartorder.dw_ordering_warehouse_sku_cargo_delivery_data_di t1
      where t1.dt=date_format(date_sub(current_date,1),'yyyyMMdd')
      group by t1.warehouse_code,t1.warehouse_name,t1.sku_main_code,t1.sku_name
      )t
      )

      select t1.warehouse_code,t1.sku_main_code,t1.out_warehouse_code,
      t2.avg_sale_7
      from delivery_day t1
      left join avg_sale_7 t2
      on t1.warehouse_code=t2.warehouse_code
      and t1.sku_main_code=t2.sku_main_code
      where t1.sku_main_code='37010832'
      and t1.out_warehouse_code='1011';

      left join and right join generated inconsistent data.

      2) result in the left join
      7001 37010832 1011 26.8572
      1011 37010832 1011 130.2858
      2002 37010832 1011 40
      1701 37010832 1011 NULL

      3) result in the right join
      1011 37010832 1011 65.1429
      2002 37010832 1011 20
      7001 37010832 1011 13.4286

      Inconsistent results in last column,'right join' 's result is right.But the results of hive on tez and sparksql are consistent and is true.

      Attachments

        Activity

          People

            xuefuz Xuefu Zhang
            qingfa zhou qingfa zhou
            Votes:
            1 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:

              Time Tracking

                Estimated:
                Original Estimate - 168h
                168h
                Remaining:
                Remaining Estimate - 168h
                168h
                Logged:
                Time Spent - Not Specified
                Not Specified