Apache Arrow / ARROW-9741

[Rust] [DataFusion] Incorrect count in TPC-H query 1 result set


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: Rust, Rust - DataFusion
    • Labels: None

    Description

      I am testing with the 100 GB (scale factor 100) data set in Parquet format. The results largely match between Spark and DataFusion. The most visible discrepancy is the count(*) for the l_returnflag = N, l_linestatus = O group: Spark has 291241911 and DataFusion has 300058170, a difference of 8816259 (the sums for that group differ as well).
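      As a quick check of that arithmetic, the per-group count(*) values from the two result tables can be compared directly. This is a minimal std-only Rust sketch; the helper `count_differences` is hypothetical and written only for this illustration, and the counts are copied from the outputs in this issue.

```rust
/// Returns (group, DataFusion count - Spark count) for every group whose
/// count(*) differs between the two engines. Hypothetical helper.
fn count_differences<'a>(
    groups: &[&'a str],
    datafusion: &[i64],
    spark: &[i64],
) -> Vec<(&'a str, i64)> {
    groups
        .iter()
        .zip(datafusion.iter().zip(spark.iter()))
        .filter(|(_, (df, sp))| df != sp) // keep only mismatching groups
        .map(|(g, (df, sp))| (*g, df - sp))
        .collect()
}

fn main() {
    // Group order: (l_returnflag, l_linestatus) as printed in both outputs.
    let groups = ["A/F", "N/F", "N/O", "R/F"];
    let datafusion: [i64; 4] = [148_047_881, 3_864_590, 300_058_170, 148_067_261];
    let spark: [i64; 4] = [148_047_881, 3_864_590, 291_241_911, 148_067_261];
    for (group, diff) in count_differences(&groups, &datafusion, &spark) {
        // Prints: N/O: DataFusion - Spark = 8816259
        println!("{group}: DataFusion - Spark = {diff}");
    }
}
```

      Only the N/O group is reported, and the three other counts are identical.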

       

      DataFusion query:

      select
          l_returnflag,
          l_linestatus,
          sum(l_quantity),
          sum(l_extendedprice),
          sum(l_extendedprice * (1 - l_discount)),
          sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)),
          avg(l_quantity),
          avg(l_extendedprice),
          avg(l_discount),
          count(*)
      from
          lineitem
      where
          l_shipdate <= '1998-12-01'
      group by
          l_returnflag,
          l_linestatus
      order by
          l_returnflag,
          l_linestatus

      DataFusion output:

      +--------------+--------------+-----------------+----------------------+--------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------+--------------------+----------------------+----------------------+-----------------+
      | l_returnflag | l_linestatus | sum(l_quantity) | sum(l_extendedprice) | sum(l_extendedprice Multiply CAST(Int64(1) as Float64) Minus l_discount) | sum(l_extendedprice Multiply CAST(Int64(1) as Float64) Minus l_discount Multiply CAST(Int64(1) as Float64) Plus l_tax) | avg(l_quantity)    | avg(l_extendedprice) | avg(l_discount)      | count(UInt8(1)) |
      +--------------+--------------+-----------------+----------------------+--------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------+--------------------+----------------------+----------------------+-----------------+
      | A            | F            | 3775127758      | 5660776097194.464    | 5377736398183.935                                                        | 5592847429515.93                                                                                                       | 25.49937060623502  | 38236.11838745711    | 0.05000224145223291  | 148047881       |
      | N            | F            | 98553062        | 147771098385.98004   | 140384965965.03473                                                       | 145999793032.77594                                                                                                     | 25.501475096542002 | 38237.03209968505    | 0.0499850931498342   | 3864590         |
      | N            | O            | 7651423419      | 11473321691083.244   | 10899667121317.215                                                       | 11335664103186.313                                                                                                     | 25.499799813085986 | 38236.99077003657    | 0.04999757591275955  | 300058170       |
      | R            | F            | 3775724970      | 5661603032745.35     | 5378513563915.415                                                        | 5593662252666.921                                                                                                      | 25.500067651651772 | 38236.70005754084    | 0.050001305269911714 | 148067261       |
      +--------------+--------------+-----------------+----------------------+--------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------+--------------------+----------------------+----------------------+-----------------+
       

      Spark query:

      select
          l_returnflag,
          l_linestatus,
          sum(l_quantity) as sum_qty,
          sum(l_extendedprice) as sum_base_price,
          sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
          sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
          avg(l_quantity) as avg_qty,
          avg(l_extendedprice) as avg_price,
          avg(l_discount) as avg_disc,
          count(*) as count_order
      from
          lineitem
      where
          l_shipdate < '1998-09-01'
      group by
          l_returnflag,
          l_linestatus
      order by
          l_returnflag,
          l_linestatus
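      Note that the two queries do not use the same filter. The DataFusion query keeps rows with l_shipdate <= '1998-12-01', while the Spark query keeps rows with l_shipdate < '1998-09-01'. The TPC-H specification defines the Q1 predicate as l_shipdate <= date '1998-12-01' - interval '[DELTA]' day, with DELTA = 90 for query validation. The sketch below is std-only Rust and uses Howard Hinnant's public-domain days_from_civil / civil_from_days date algorithms; `q1_cutoff` and `linestatus` are hypothetical helpers written for this illustration. It computes the inclusive cutoff for a given DELTA. It also encodes one assumption taken from the TPC-H data generation rules: l_linestatus is 'O' only when l_shipdate is after 1995-06-17.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date (Hinnant's days_from_civil).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = (if y >= 0 { y } else { y - 399 }) / 400;
    let yoe = y - era * 400; // year of era, [0, 399]
    let mp = (m + 9) % 12; // month index with March = 0
    let doy = (153 * mp + 2) / 5 + d - 1; // day of year, [0, 365]
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146097 + doe - 719468
}

/// Inverse of days_from_civil (Hinnant's civil_from_days).
fn civil_from_days(z: i64) -> (i64, i64, i64) {
    let z = z + 719468;
    let era = (if z >= 0 { z } else { z - 146096 }) / 146097;
    let doe = z - era * 146097;
    let yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    let mp = (5 * doy + 2) / 153;
    let d = doy - (153 * mp + 2) / 5 + 1;
    let m = if mp < 10 { mp + 3 } else { mp - 9 };
    let y = yoe + era * 400;
    (if m <= 2 { y + 1 } else { y }, m, d)
}

/// Hypothetical helper: inclusive Q1 cutoff, date '1998-12-01' - interval 'delta' day.
fn q1_cutoff(delta_days: i64) -> (i64, i64, i64) {
    civil_from_days(days_from_civil(1998, 12, 1) - delta_days)
}

/// Hypothetical helper encoding the assumed dbgen rule:
/// l_linestatus = 'O' if l_shipdate > 1995-06-17, otherwise 'F'.
fn linestatus(shipdate: (i64, i64, i64)) -> char {
    let (y, m, d) = shipdate;
    if days_from_civil(y, m, d) > days_from_civil(1995, 6, 17) { 'O' } else { 'F' }
}

fn main() {
    let fmt = |(y, m, d): (i64, i64, i64)| format!("{y:04}-{m:02}-{d:02}");
    println!("spec, DELTA = 90:        l_shipdate <= {}", fmt(q1_cutoff(90)));
    println!("Spark query, DELTA = 92: l_shipdate <= {}", fmt(q1_cutoff(92)));
    println!("DataFusion, DELTA = 0:   l_shipdate <= {}", fmt(q1_cutoff(0)));
    // Both cutoffs fall after 1995-06-17, so every 'F' row passes both filters.
    // Only rows with 1998-09-01 <= l_shipdate <= 1998-12-01 are treated differently,
    // and under the assumed rule all of them have l_linestatus = 'O'.
    println!("linestatus at the Spark cutoff: {}", linestatus(q1_cutoff(92)));
}
```

      Under these assumptions, the spec cutoff is 1998-09-02. The Spark filter corresponds to DELTA = 92 and the DataFusion filter to DELTA = 0, so the two engines were not given the same query. Only the 'O' groups can differ between them, which matches the tables: the three 'F' groups agree on count(*) and only N/O does not.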

      Spark output:

      +------------+------------+-------------+--------------------+--------------------+--------------------+------------------+------------------+--------------------+-----------+
      |l_returnflag|l_linestatus|      sum_qty|      sum_base_price|      sum_disc_price|          sum_charge|           avg_qty|         avg_price|            avg_disc|count_order|
      +------------+------------+-------------+--------------------+--------------------+--------------------+------------------+------------------+--------------------+-----------+
      |           A|           F|3.775127758E9|5.660776097194467E12|5.377736398183933E12|5.592847429515929E12|25.499370423275426| 38236.11698430501| 0.05000224353093977|  148047881|
      |           N|           F|  9.8553062E7|1.477710983859800...|1.403849659650348E11|1.459997930327758...|25.501556956882876|38237.199388804525|0.049985284338051286|    3864590|
      |           N|           O|7.426674812E9|1.113628734444901...|1.057947943676070...|1.100266737949706...| 25.50002088126664|38237.241701277715| 0.04999786229238074|  291241911|
      |           R|           F| 3.77572497E9|5.661603032745349E12|5.378513563915412E12|5.593662252666918E12| 25.50006628406532| 38236.69725845302|0.050001304339664904|  148067261|
      +------------+------------+-------------+--------------------+--------------------+--------------------+------------------+------------------+--------------------+-----------+

People

    • Assignee: Andy Grove
    • Reporter: Andy Grove
    • Votes: 0
    • Watchers: 2