Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-3208

Hive : Tpch (SF 0.01) query 10 fails with a system error when the data is backed by hive tables

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: None
    • Fix Version/s: 1.1.0
    • Component/s: Storage - Hive
    • Labels:
      None

      Description

      git.commit.id.abbrev=6f54223

      I created hive tables on top of tpch parquet data. (Attached the hive ddl script). Since hive does not support date in parquet serde, I regenerated the parquet files for orders and lineitem to use string for the date fields. Remaining files do not have a date column.

      When I executed query 10 in the tpch suite, it failed with a system error.

      0: jdbc:drill:schema=dfs_eea> use hive.tpch01_parquet_nodate;
      +-------+---------------------------------------------------------+
      |  ok   |                         summary                         |
      +-------+---------------------------------------------------------+
      | true  | Default schema changed to [hive.tpch01_parquet_nodate]  |
      +-------+---------------------------------------------------------+
      1 row selected (0.091 seconds)
      0: jdbc:drill:schema=dfs_eea>
      select
        c.c_custkey,
        c.c_name,
        sum(l.l_extendedprice * (1 - l.l_discount)) as revenue,
        c.c_acctbal,
        n.n_name,
        c.c_address,
        c.c_phone,
        c.c_comment
      from
        customer c,
        orders o,
        lineitem l,
        nation n
      where
        c.c_custkey = o.o_custkey
        and l.l_orderkey = o.o_orderkey
        and cast(o.o_orderdate as date) >= date '1994-03-01'
        and cast(o.o_orderdate as date) < date '1994-03-01' + interval '3' month
        and l.l_returnflag = 'R'
        and c.c_nationkey = n.n_nationkey
      group by
        c.c_custkey,
        c.c_name,
        c.c_acctbal,
        c.c_phone,
        n.n_name,
        c.c_address,
        c.c_comment
      order by
        revenue desc
      limit 20;
      
      Error: SYSTEM ERROR: 
      
      Fragment 0:0
      
      [Error Id: 1d327ae0-1cf2-4776-acd3-8eef6cca4b6a on qa-node191.qa.lab:31010] (state=,code=0)
      

      I tried running the above query using dfs instead of hive and it worked as expected.

      I attached the newly generated parquet files and the hive ddl for creating hive tables. Let me know if you need anything

        Attachments

        1. tpch.ddl
          3 kB
          Rahul Challapalli
        2. orders_nodate.parquet
          663 kB
          Rahul Challapalli
        3. nation.parquet
          2 kB
          Rahul Challapalli
        4. lineitem_nodate.parquet
          2.92 MB
          Rahul Challapalli
        5. error.log
          8 kB
          Rahul Challapalli
        6. customer.parquet
          130 kB
          Rahul Challapalli

          Issue Links

            Activity

              People

              • Assignee:
                vkorukanti Venki Korukanti
                Reporter:
                rkins Rahul Challapalli
              • Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: