Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-30349

The result is wrong when joining tables with selecting the same columns

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Cannot Reproduce
    • Affects Version/s: 2.4.4
    • Fix Version/s: None
    • Component/s: SQL
    • Labels:

      Description

      // code placeholder
      with tmp as(
      select
      log_date,
      buvid,
      manga_id,
      sum(readtime) readtime
      from
      manga.dwd_app_readtime_xt_dt
      where
      log_date >= 20191220
      group by
      log_date,
      buvid,
      manga_id
      )
      select
      t.log_date,
      GET_JSON_OBJECT(t.extended_fields, '$.type'),
      count(distinct t.buvid),
      count(distinct t0.buvid),
      count(distinct t1.buvid),
      count(distinct t2.buvid),
      count(
      distinct case
      when t1.buvid = t0.buvid then t1.buvid
      end
      ),
      count(
      distinct case
      when t1.buvid = t0.buvid
      and t1.buvid = t2.buvid then t1.buvid
      end
      ),
      count(
      distinct case
      when t0.buvid = t2.buvid then t0.buvid
      end
      ),
      sum(readtime),
      avg(readtime),
      sum(
      case
      when t0.buvid = t3.buvid then readtime
      end
      ),
      avg(
      case
      when t0.buvid = t3.buvid then readtime
      end
      )
      from
      manga.manga_tfc_app_ubt_d t
      join manga.manga_tfc_app_ubt_d t1 on t.buvid = t1.buvid
      and t1.log_date >= 20191220
      and t1.event_id = 'manga.manga-detail.0.0.pv'
      and to_date(t.stime) = TO_DATE(t1.stime)
      and GET_JSON_OBJECT(t1.extended_fields, '$.manga_id') = GET_JSON_OBJECT(t.extended_fields, '$.manga_id')
      left join manga.manga_buvid_minlog t0 on t.buvid = t0.buvid
      and t0.log_date = 20191223
      and t0.minlog >= '2019-12-20'
      and to_date(t.stime) = TO_DATE(t0.minlog)
      left join manga.dwb_tfc_app_launch_df t2 on t.buvid = t2.buvid
      and t2.log_date >= 20191220
      and DATE_ADD(to_date(t.stime), 1) = to_date(t2.stime)
      left join tmp t3 on t1.buvid = t3.buvid
      and t3.log_date >= 20191220
      and t3.manga_id = GET_JSON_OBJECT(t.extended_fields, '$.manga_id')
      where
      t.log_date >= 20191220
      and t.event_id = 'manga.homepage-recommend.detail.0.click'
      group by
      t.log_date,
      GET_JSON_OBJECT(t.extended_fields, '$.type')
      
      

      The result of hive 2.3 is ok

        Attachments

        1. screenshot-1.png
          41 kB
          cen yuhai
        2. screenshot-2.png
          44 kB
          cen yuhai

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              cenyuhai cen yuhai
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: