Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-3208

Hive : Tpch (SF 0.01) query 10 fails with a system error when the data is backed by hive tables

Attach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Invalid
    • None
    • 1.1.0
    • Storage - Hive
    • None

    Description

      git.commit.id.abbrev=6f54223

      I created hive tables on top of tpch parquet data. (Attached the hive ddl script). Since hive does not support date in parquet serde, I regenerated the parquet files for orders and lineitem to use string for the date fields. Remaining files do not have a date column.

      When I executed query 10 in the tpch suite, it failed with a system error.

      0: jdbc:drill:schema=dfs_eea> use hive.tpch01_parquet_nodate;
      +-------+---------------------------------------------------------+
      |  ok   |                         summary                         |
      +-------+---------------------------------------------------------+
      | true  | Default schema changed to [hive.tpch01_parquet_nodate]  |
      +-------+---------------------------------------------------------+
      1 row selected (0.091 seconds)
      0: jdbc:drill:schema=dfs_eea>
      select
        c.c_custkey,
        c.c_name,
        sum(l.l_extendedprice * (1 - l.l_discount)) as revenue,
        c.c_acctbal,
        n.n_name,
        c.c_address,
        c.c_phone,
        c.c_comment
      from
        customer c,
        orders o,
        lineitem l,
        nation n
      where
        c.c_custkey = o.o_custkey
        and l.l_orderkey = o.o_orderkey
        and cast(o.o_orderdate as date) >= date '1994-03-01'
        and cast(o.o_orderdate as date) < date '1994-03-01' + interval '3' month
        and l.l_returnflag = 'R'
        and c.c_nationkey = n.n_nationkey
      group by
        c.c_custkey,
        c.c_name,
        c.c_acctbal,
        c.c_phone,
        n.n_name,
        c.c_address,
        c.c_comment
      order by
        revenue desc
      limit 20;
      
      Error: SYSTEM ERROR: 
      
      Fragment 0:0
      
      [Error Id: 1d327ae0-1cf2-4776-acd3-8eef6cca4b6a on qa-node191.qa.lab:31010] (state=,code=0)
      

      I tried running the above query using dfs instead of hive and it worked as expected.

      I attached the newly generated parquet files and the hive ddl for creating hive tables. Let me know if you need anything

      Attachments

        1. customer.parquet
          130 kB
          Rahul Kumar Challapalli
        2. nation.parquet
          2 kB
          Rahul Kumar Challapalli
        3. error.log
          8 kB
          Rahul Kumar Challapalli
        4. lineitem_nodate.parquet
          2.92 MB
          Rahul Kumar Challapalli
        5. orders_nodate.parquet
          663 kB
          Rahul Kumar Challapalli
        6. tpch.ddl
          3 kB
          Rahul Kumar Challapalli

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            vkorukanti Venki Korukanti
            rkins Rahul Kumar Challapalli
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment