Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Not A Problem
Description
I am testing with the 100 GB (scale factor 100) data set in Parquet format. The results broadly match between Spark and DataFusion, with the exception of one of the counts, the one for the N/O group: Spark reports 291241911 and DataFusion reports 300058170, a difference of 8816259.
DataFusion query:
"select
l_returnflag,
l_linestatus,
sum(l_quantity),
sum(l_extendedprice),
sum(l_extendedprice * (1 - l_discount)),
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)),
avg(l_quantity),
avg(l_extendedprice),
avg(l_discount),
count(*)
from
lineitem
where
l_shipdate <= '1998-12-01'
group by
l_returnflag,
l_linestatus
order by
l_returnflag,
l_linestatus"
DataFusion output:
+--------------+--------------+-----------------+----------------------+--------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------+--------------------+----------------------+----------------------+-----------------+
| l_returnflag | l_linestatus | sum(l_quantity) | sum(l_extendedprice) | sum(l_extendedprice Multiply CAST(Int64(1) as Float64) Minus l_discount) | sum(l_extendedprice Multiply CAST(Int64(1) as Float64) Minus l_discount Multiply CAST(Int64(1) as Float64) Plus l_tax) | avg(l_quantity) | avg(l_extendedprice) | avg(l_discount) | count(UInt8(1)) |
+--------------+--------------+-----------------+----------------------+--------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------+--------------------+----------------------+----------------------+-----------------+
| A | F | 3775127758 | 5660776097194.464 | 5377736398183.935 | 5592847429515.93 | 25.49937060623502 | 38236.11838745711 | 0.05000224145223291 | 148047881 |
| N | F | 98553062 | 147771098385.98004 | 140384965965.03473 | 145999793032.77594 | 25.501475096542002 | 38237.03209968505 | 0.0499850931498342 | 3864590 |
| N | O | 7651423419 | 11473321691083.244 | 10899667121317.215 | 11335664103186.313 | 25.499799813085986 | 38236.99077003657 | 0.04999757591275955 | 300058170 |
| R | F | 3775724970 | 5661603032745.35 | 5378513563915.415 | 5593662252666.921 | 25.500067651651772 | 38236.70005754084 | 0.050001305269911714 | 148067261 |
+--------------+--------------+-----------------+----------------------+--------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------+--------------------+----------------------+----------------------+-----------------+
Spark query:
| select
| l_returnflag,
| l_linestatus,
| sum(l_quantity) as sum_qty,
| sum(l_extendedprice) as sum_base_price,
| sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
| sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
| avg(l_quantity) as avg_qty,
| avg(l_extendedprice) as avg_price,
| avg(l_discount) as avg_disc,
| count(*) as count_order
| from
| lineitem
| where
| l_shipdate < '1998-09-01'
| group by
| l_returnflag,
| l_linestatus
| order by
| l_returnflag,
| l_linestatus
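Note that the two queries as pasted do not use the same filter: the DataFusion query has l_shipdate <= '1998-12-01', while the Spark query has l_shipdate < '1998-09-01'. The sketch below only illustrates how such a difference changes per-group counts. It uses Python's sqlite3 module with a handful of hypothetical rows (not the real data set) and assumes ship dates are stored as ISO-8601 strings, which may not match the actual Parquet schema.

```python
import sqlite3

# Hypothetical toy rows; only the grouping keys and l_shipdate matter here.
rows = [
    ("N", "O", "1998-08-15"),  # matched by both filters
    ("N", "O", "1998-09-01"),  # matched only by the <= '1998-12-01' filter
    ("N", "O", "1998-10-20"),  # matched only by the <= '1998-12-01' filter
    ("N", "O", "1998-12-01"),  # matched only by the <= '1998-12-01' filter
    ("A", "F", "1995-03-10"),  # matched by both filters
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lineitem (l_returnflag TEXT, l_linestatus TEXT, l_shipdate TEXT)"
)
conn.executemany("INSERT INTO lineitem VALUES (?, ?, ?)", rows)


def count_by_group(predicate):
    """Return {(l_returnflag, l_linestatus): count} for the given WHERE predicate."""
    sql = (
        "SELECT l_returnflag, l_linestatus, count(*) FROM lineitem "
        f"WHERE {predicate} "
        "GROUP BY l_returnflag, l_linestatus "
        "ORDER BY l_returnflag, l_linestatus"
    )
    return {(flag, status): n for flag, status, n in conn.execute(sql)}


# Filter as written in the DataFusion query.
datafusion_counts = count_by_group("l_shipdate <= '1998-12-01'")
# Filter as written in the Spark query.
spark_counts = count_by_group("l_shipdate < '1998-09-01'")

print(datafusion_counts)
print(spark_counts)
```

On the toy rows, only the group that contains ship dates between the two cutoffs ends up with different counts. Whether this fully accounts for the reported difference would need to be confirmed by rerunning both engines with an identical predicate.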
Spark output:
+------------+------------+-------------+--------------------+--------------------+--------------------+------------------+------------------+--------------------+-----------+
|l_returnflag|l_linestatus|      sum_qty|      sum_base_price|      sum_disc_price|          sum_charge|           avg_qty|         avg_price|            avg_disc|count_order|
+------------+------------+-------------+--------------------+--------------------+--------------------+------------------+------------------+--------------------+-----------+
|           A|           F|3.775127758E9|5.660776097194467E12|5.377736398183933E12|5.592847429515929E12|25.499370423275426| 38236.11698430501| 0.05000224353093977|  148047881|
|           N|           F|  9.8553062E7|1.477710983859800...|1.403849659650348E11|1.459997930327758...|25.501556956882876|38237.199388804525|0.049985284338051286|    3864590|
|           N|           O|7.426674812E9|1.113628734444901...|1.057947943676070...|1.100266737949706...| 25.50002088126664|38237.241701277715| 0.04999786229238074|  291241911|
|           R|           F| 3.77572497E9|5.661603032745349E12|5.378513563915412E12|5.593662252666918E12| 25.50006628406532| 38236.69725845302|0.050001304339664904|  148067261|
+------------+------------+-------------+--------------------+--------------------+--------------------+------------------+------------------+--------------------+-----------+
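To see exactly which groups disagree, the counts and quantity sums from the two outputs above can be compared side by side. This is a minimal sketch with the values copied by hand from the two tables; the dictionary names and the helper function are illustrative only, not part of either engine.

```python
# (count, sum of l_quantity) per (l_returnflag, l_linestatus),
# copied from the DataFusion output above.
datafusion = {
    ("A", "F"): (148047881, 3775127758),
    ("N", "F"): (3864590, 98553062),
    ("N", "O"): (300058170, 7651423419),
    ("R", "F"): (148067261, 3775724970),
}

# Same columns copied from the Spark output above
# (sum_qty converted from scientific notation to integers).
spark = {
    ("A", "F"): (148047881, 3775127758),
    ("N", "F"): (3864590, 98553062),
    ("N", "O"): (291241911, 7426674812),
    ("R", "F"): (148067261, 3775724970),
}


def group_differences(left, right):
    """Return {group: (count_diff, sum_qty_diff)} for groups whose values differ."""
    diffs = {}
    for group in sorted(left):
        count_diff = left[group][0] - right[group][0]
        qty_diff = left[group][1] - right[group][1]
        if count_diff != 0 or qty_diff != 0:
            diffs[group] = (count_diff, qty_diff)
    return diffs


differences = group_differences(datafusion, spark)
for group, (count_diff, qty_diff) in differences.items():
    print(group, "extra rows:", count_diff, "extra quantity:", qty_diff)
```

Only the N/O group differs, and it differs in both the count and the quantity sum. The extra quantity divided by the extra rows comes to about 25.5, close to the per-row average quantity in both outputs, which suggests DataFusion aggregated additional rows rather than miscounting the same ones.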