Apache Drill / DRILL-3271

Hive: TPC-H 01.q fails with a verification issue for the SF100 dataset


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: None
    • Fix Version/s: 1.2.0
    • Component/s: Storage - Hive
    • Labels:
      None

      Description

      git.commit.id.abbrev=5f26b8b

      Query :

      select
        l_returnflag,
        l_linestatus,
        sum(l_quantity) as sum_qty,
        sum(l_extendedprice) as sum_base_price,
        sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
        sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
        avg(l_quantity) as avg_qty,
        avg(l_extendedprice) as avg_price,
        avg(l_discount) as avg_disc,
        count(*) as count_order
      from
        lineitem
      where
        l_shipdate <= date '1998-12-01' - interval '120' day (3)
      group by
        l_returnflag,
        l_linestatus
      order by
        l_returnflag,
        l_linestatus;
      

      The 4th column (sum_base_price) appears to have some differences in the trailing digits, and so do columns 5, 6, 8 and 9. Not sure whether they are within an acceptable range.

      Expected :

      A       F       3.775127758E9   5.660776097194428E12    5.377736398183942E12    5.592847429515948E12    25.499370423275426      38236.11698430475       0.05000224353079674     148047881
      N       O       7.269911583E9   1.0901214476134316E13   1.0356163586785008E13   1.077041889123738E13    25.499873337396807      38236.997134222445      0.04999763132401859     285095988
      R       F       3.77572497E9    5.661603032745362E12    5.378513563915394E12    5.593662252666902E12    25.50006628406532       38236.69725845312       0.05000130433952159     148067261
      N       F       9.8553062E7     1.4777109838597995E11   1.403849659650348E11    1.459997930327757E11    25.501556956882876      38237.19938880449       0.04998528433803118     3864590
      

      Actual :

      A       F       3.775127758E9   5.660776097194352E12    5.37773639818398E12     5.592847429515874E12    25.499370423275426      38236.11698430423       0.0500022435305286      148047881
      N       O       7.269911583E9   1.0901214476134352E13   1.0356163586784926E13   1.0770418891237576E13   25.499873337396807      38236.99713422257       0.04999763132535226     285095988
      R       F       3.77572497E9    5.661603032745394E12    5.378513563915313E12    5.593662252666848E12    25.50006628406532       38236.69725845333       0.05000130433925318     148067261
      N       F       9.8553062E7     1.4777109838598022E11   1.4038496596503506E11   1.45999793032776E11     25.501556956882876      38237.19938880456       0.049985284338093884    3864590
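
To judge whether the differences are within an acceptable range, the two result sets above can be compared with a relative tolerance rather than for exact equality. The following is a minimal sketch, not the test framework's actual verifier: the function names (`parse_rows`, `relative_diffs`, `mismatches`) and the tolerance of 1e-9 are assumptions chosen for illustration.

```python
# Rows copied from the Expected and Actual sections of this report.
EXPECTED = """
A F 3.775127758E9 5.660776097194428E12 5.377736398183942E12 5.592847429515948E12 25.499370423275426 38236.11698430475 0.05000224353079674 148047881
N O 7.269911583E9 1.0901214476134316E13 1.0356163586785008E13 1.077041889123738E13 25.499873337396807 38236.997134222445 0.04999763132401859 285095988
R F 3.77572497E9 5.661603032745362E12 5.378513563915394E12 5.593662252666902E12 25.50006628406532 38236.69725845312 0.05000130433952159 148067261
N F 9.8553062E7 1.4777109838597995E11 1.403849659650348E11 1.459997930327757E11 25.501556956882876 38237.19938880449 0.04998528433803118 3864590
"""

ACTUAL = """
A F 3.775127758E9 5.660776097194352E12 5.37773639818398E12 5.592847429515874E12 25.499370423275426 38236.11698430423 0.0500022435305286 148047881
N O 7.269911583E9 1.0901214476134352E13 1.0356163586784926E13 1.0770418891237576E13 25.499873337396807 38236.99713422257 0.04999763132535226 285095988
R F 3.77572497E9 5.661603032745394E12 5.378513563915313E12 5.593662252666848E12 25.50006628406532 38236.69725845333 0.05000130433925318 148067261
N F 9.8553062E7 1.4777109838598022E11 1.4038496596503506E11 1.45999793032776E11 25.501556956882876 38237.19938880456 0.049985284338093884 3864590
"""


def parse_rows(text):
    """Map (l_returnflag, l_linestatus) to the list of numeric columns."""
    rows = {}
    for line in text.strip().splitlines():
        fields = line.split()
        rows[(fields[0], fields[1])] = [float(f) for f in fields[2:]]
    return rows


def relative_diffs(expected, actual):
    """Yield (group key, 1-based column number, relative difference)."""
    exp_rows, act_rows = parse_rows(expected), parse_rows(actual)
    for key, exp_vals in exp_rows.items():
        # Numeric columns start at column 3; columns 1 and 2 are the group key.
        for col, (e, a) in enumerate(zip(exp_vals, act_rows[key]), start=3):
            scale = max(abs(e), abs(a))
            yield key, col, (0.0 if scale == 0 else abs(e - a) / scale)


def mismatches(expected, actual, rel_tol=1e-9):
    """Return the cells whose relative difference exceeds rel_tol (assumed tolerance)."""
    return [(k, c, d) for k, c, d in relative_diffs(expected, actual) if d > rel_tol]


if __name__ == "__main__":
    worst = max(d for _, _, d in relative_diffs(EXPECTED, ACTUAL))
    print("largest relative difference:", worst)
    print("cells outside tolerance:", mismatches(EXPECTED, ACTUAL))
```

For these rows the largest relative difference is on the order of 1e-11, so a tolerance-based check passes where an exact comparison does not. Differences of this size are what one would typically expect when double-precision sums are accumulated in a different order, although this sketch does not establish the cause.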
      

      The data is 100 GB, so I couldn't attach it here.
      I have attached the Hive DDL. Let me know if you need anything else.

        Attachments

        1. tpch100_hive.ddl
          3 kB
          Rahul Challapalli

            People

            • Assignee:
              vkorukanti Venki Korukanti
              Reporter:
              rkins Rahul Challapalli
            • Votes:
              0
              Watchers:
              4
