Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Not A Problem
Affects Version/s: 3.0.1
None
None
Description
We run the following SQL:
INSERT OVERWRITE TABLE t1
SELECT /*+ repartition(300) */ * FROM t2
Below are the SQL metrics of the repartition ShuffleExchange. The shuffle records written and the records read do not match.
In the result table, some rows are missing and some rows are duplicated.
The output of InsertIntoHadoopFsRelationCommand is the same as the repartition Exchange's records read (reducer side), and the repartition Exchange's shuffle records written (mapper side) are the same as the Filter's output.
So the repartition Exchange is the operator that returns wrong data: the rows are already wrong between the map-side write and the reduce-side read of that shuffle.
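The missing and duplicated rows can also be counted directly in the result table. This is a minimal sketch, not taken from the report: it assumes t2 has a unique key column, here called `id` (a hypothetical name), and that t2 was not modified after the INSERT OVERWRITE ran.

```sql
-- Hypothetical: assumes t2 has a unique key column named `id`
-- and that t2 is unchanged since the INSERT OVERWRITE ran.
-- Lists every source key whose number of copies in t1 is not exactly 1:
-- copies_in_t1 = 0 means the row is missing, > 1 means it is duplicated.
SELECT s.id,
       COALESCE(d.cnt, 0) AS copies_in_t1
FROM t2 s
LEFT JOIN (
  SELECT id, COUNT(*) AS cnt
  FROM t1
  GROUP BY id
) d
  ON s.id = d.id
WHERE COALESCE(d.cnt, 0) <> 1;
```

An empty result means every source row landed in t1 exactly once. Rows present in t1 but absent from t2 are not reported by this query.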
In our environment, AQE (adaptive query execution) and speculative execution are both enabled.
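The report does not say how the two features were turned on. In Spark 3.0.x both are disabled by default. As a sketch, assuming they were enabled through their standard configuration keys in spark-defaults.conf, the relevant settings would look like this:

```properties
# Adaptive query execution (off by default in Spark 3.0.x)
spark.sql.adaptive.enabled  true
# Speculative re-launch of slow tasks (off by default)
spark.speculation           true
```

The same keys can also be passed to spark-submit as `--conf key=value`.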