  1. Apache Drill
  2. DRILL-3271

Hive: TPC-H 01.q fails with a verification issue for the SF100 dataset


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: None
    • Fix Version/s: 1.2.0
    • Component/s: Storage - Hive
    • Labels: None

    Description

      git.commit.id.abbrev=5f26b8b

      Query:

      select
        l_returnflag,
        l_linestatus,
        sum(l_quantity) as sum_qty,
        sum(l_extendedprice) as sum_base_price,
        sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
        sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
        avg(l_quantity) as avg_qty,
        avg(l_extendedprice) as avg_price,
        avg(l_discount) as avg_disc,
        count(*) as count_order
      from
        lineitem
      where
        l_shipdate <= date '1998-12-01' - interval '120' day (3)
      group by
        l_returnflag,
        l_linestatus
      order by
        l_returnflag,
        l_linestatus;
      

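      The 4th column (sum_base_price) differs between the expected and actual output. Columns 5, 6, 8, and 9 (sum_disc_price, sum_charge, avg_price, avg_disc) also differ, in every case only in the trailing digits. I am not sure whether these differences are within an acceptable range.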

      Expected:

      A       F       3.775127758E9   5.660776097194428E12    5.377736398183942E12    5.592847429515948E12    25.499370423275426      38236.11698430475       0.05000224353079674     148047881
      N       O       7.269911583E9   1.0901214476134316E13   1.0356163586785008E13   1.077041889123738E13    25.499873337396807      38236.997134222445      0.04999763132401859     285095988
      R       F       3.77572497E9    5.661603032745362E12    5.378513563915394E12    5.593662252666902E12    25.50006628406532       38236.69725845312       0.05000130433952159     148067261
      N       F       9.8553062E7     1.4777109838597995E11   1.403849659650348E11    1.459997930327757E11    25.501556956882876      38237.19938880449       0.04998528433803118     3864590
      

      Actual:

      A       F       3.775127758E9   5.660776097194352E12    5.37773639818398E12     5.592847429515874E12    25.499370423275426      38236.11698430423       0.0500022435305286      148047881
      N       O       7.269911583E9   1.0901214476134352E13   1.0356163586784926E13   1.0770418891237576E13   25.499873337396807      38236.99713422257       0.04999763132535226     285095988
      R       F       3.77572497E9    5.661603032745394E12    5.378513563915313E12    5.593662252666848E12    25.50006628406532       38236.69725845333       0.05000130433925318     148067261
      N       F       9.8553062E7     1.4777109838598022E11   1.4038496596503506E11   1.45999793032776E11     25.501556956882876      38237.19938880456       0.049985284338093884    3864590
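
One way to judge whether the differences are within an acceptable range is to compare the two result sets with a relative tolerance instead of exact equality. The sketch below is illustrative only: it is not the verification logic used by the Drill test framework, and the threshold `REL_TOL = 1e-9` and the helper names `parse`, `max_relative_diff`, and `rows_match` are assumptions made for this example. The numbers are copied from the expected and actual output above.

```python
import math

# Rows copied from the "Expected" and "Actual" output in this report.
EXPECTED = """\
A F 3.775127758E9 5.660776097194428E12 5.377736398183942E12 5.592847429515948E12 25.499370423275426 38236.11698430475 0.05000224353079674 148047881
N O 7.269911583E9 1.0901214476134316E13 1.0356163586785008E13 1.077041889123738E13 25.499873337396807 38236.997134222445 0.04999763132401859 285095988
R F 3.77572497E9 5.661603032745362E12 5.378513563915394E12 5.593662252666902E12 25.50006628406532 38236.69725845312 0.05000130433952159 148067261
N F 9.8553062E7 1.4777109838597995E11 1.403849659650348E11 1.459997930327757E11 25.501556956882876 38237.19938880449 0.04998528433803118 3864590
"""

ACTUAL = """\
A F 3.775127758E9 5.660776097194352E12 5.37773639818398E12 5.592847429515874E12 25.499370423275426 38236.11698430423 0.0500022435305286 148047881
N O 7.269911583E9 1.0901214476134352E13 1.0356163586784926E13 1.0770418891237576E13 25.499873337396807 38236.99713422257 0.04999763132535226 285095988
R F 3.77572497E9 5.661603032745394E12 5.378513563915313E12 5.593662252666848E12 25.50006628406532 38236.69725845333 0.05000130433925318 148067261
N F 9.8553062E7 1.4777109838598022E11 1.4038496596503506E11 1.45999793032776E11 25.501556956882876 38237.19938880456 0.049985284338093884 3864590
"""

# Assumed tolerance for this sketch; not taken from the report or from Drill.
REL_TOL = 1e-9


def parse(block):
    """Map (l_returnflag, l_linestatus) to the list of numeric columns."""
    rows = {}
    for line in block.splitlines():
        fields = line.split()
        rows[(fields[0], fields[1])] = [float(x) for x in fields[2:]]
    return rows


def max_relative_diff(expected, actual):
    """Largest relative difference over all numeric cells."""
    worst = 0.0
    for key, exp_vals in expected.items():
        for e, a in zip(exp_vals, actual[key]):
            worst = max(worst, abs(e - a) / max(abs(e), abs(a)))
    return worst


def rows_match(expected, actual, rel_tol=REL_TOL):
    """True if both sets have the same groups and every cell is close."""
    return expected.keys() == actual.keys() and all(
        math.isclose(e, a, rel_tol=rel_tol)
        for key in expected
        for e, a in zip(expected[key], actual[key])
    )


exp_rows = parse(EXPECTED)
act_rows = parse(ACTUAL)
worst = max_relative_diff(exp_rows, act_rows)
match = rows_match(exp_rows, act_rows)
print(f"max relative difference: {worst:.3e}, match at {REL_TOL}: {match}")
```

On the values quoted in this report, the largest relative difference is on the order of 1e-11 and occurs in the avg_disc column, so the rows compare equal under the assumed tolerance but would not under a much stricter one such as 1e-12.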
      

      The data set is 100 GB, so I could not attach it here.
      I have attached the Hive DDL. Let me know if you need anything else.

      Attachments

        1. tpch100_hive.ddl
          3 kB
          Rahul Kumar Challapalli


          People

            Assignee: Venki Korukanti (vkorukanti)
            Reporter: Rahul Kumar Challapalli (rkins)
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved:
