Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
Impala 2.12.0
-
None
-
Windows 10, Cloudera Data Science Workbench. Base Image v13, docker.repository.cloudera.com/cdsw/engine:13
-
Patch, Important
-
ghx-label-9
Description
In the impala-shell version 2.12, the following script will got wrong result, the value of the rk1 and rk2 are totally different, but the inner join can be applied.
[code]
with
idx_dt_list as (
/* There are 304 unique index date can be found */
select distinct to_date(idx_dt) as idx_dt
from B2848551_step1_unq_pats
order by idx_dt
),
idx_dt_with_rk as (
/* assign the number 1 ~ 304 to those index dates */
select idx_dt, rank() over(order by idx_dt) as rk
from idx_dt_list
),
pat_with_random_num as (
/*
There are 31,808 patients in the table, assign a
random number 1 ~ 304 to each patients
*/
select patient_id, ceil(random(1) * 304) as rk
from B2848551_step6_nonbaom
),
wanted as (
select a.patient_id, b.idx_dt, a.rk as rk1, b.rk as rk2
from pat_with_random_num a, idx_dt_with_rk b
/* The condition is a.rk = b.rk */
where a.rk = b.rk
)
select *
from wanted
limit 5
;
[/code]
[result : the columns rk1 and rk2 are different]
----------------------------+
patient_id | idx_dt | rk1 | rk2 |
----------------------------+
xxxxxxx1 | 2022-03-22 | 127 | 295 |
xxxxxxx2 | 2021-08-29 | 140 | 90 |
xxxxxxx3 | 2021-07-17 | 183 | 47 |
xxxxxxx4 | 2021-06-08 | 87 | 8 |
xxxxxxx5 | 2021-07-17 | 295 | 47 |
----------------------------+
[/result]