Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-3208

Hive : Tpch (SF 0.01) query 10 fails with a system error when the data is backed by hive tables

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Invalid
    • None
    • 1.1.0
    • Storage - Hive
    • None

    Description

      git.commit.id.abbrev=6f54223

      I created hive tables on top of tpch parquet data. (Attached the hive ddl script). Since hive does not support date in parquet serde, I regenerated the parquet files for orders and lineitem to use string for the date fields. Remaining files do not have a date column.

      When I executed query 10 in the tpch suite, it failed with a system error.

      0: jdbc:drill:schema=dfs_eea> use hive.tpch01_parquet_nodate;
      +-------+---------------------------------------------------------+
      |  ok   |                         summary                         |
      +-------+---------------------------------------------------------+
      | true  | Default schema changed to [hive.tpch01_parquet_nodate]  |
      +-------+---------------------------------------------------------+
      1 row selected (0.091 seconds)
      0: jdbc:drill:schema=dfs_eea>
      select
        c.c_custkey,
        c.c_name,
        sum(l.l_extendedprice * (1 - l.l_discount)) as revenue,
        c.c_acctbal,
        n.n_name,
        c.c_address,
        c.c_phone,
        c.c_comment
      from
        customer c,
        orders o,
        lineitem l,
        nation n
      where
        c.c_custkey = o.o_custkey
        and l.l_orderkey = o.o_orderkey
        and cast(o.o_orderdate as date) >= date '1994-03-01'
        and cast(o.o_orderdate as date) < date '1994-03-01' + interval '3' month
        and l.l_returnflag = 'R'
        and c.c_nationkey = n.n_nationkey
      group by
        c.c_custkey,
        c.c_name,
        c.c_acctbal,
        c.c_phone,
        n.n_name,
        c.c_address,
        c.c_comment
      order by
        revenue desc
      limit 20;
      
      Error: SYSTEM ERROR: 
      
      Fragment 0:0
      
      [Error Id: 1d327ae0-1cf2-4776-acd3-8eef6cca4b6a on qa-node191.qa.lab:31010] (state=,code=0)
      

      I tried running the above query using dfs instead of hive and it worked as expected.

      I attached the newly generated parquet files and the hive ddl for creating hive tables. Let me know if you need anything

      Attachments

        1. customer.parquet
          130 kB
          Rahul Kumar Challapalli
        2. nation.parquet
          2 kB
          Rahul Kumar Challapalli
        3. error.log
          8 kB
          Rahul Kumar Challapalli
        4. lineitem_nodate.parquet
          2.92 MB
          Rahul Kumar Challapalli
        5. orders_nodate.parquet
          663 kB
          Rahul Kumar Challapalli
        6. tpch.ddl
          3 kB
          Rahul Kumar Challapalli

        Issue Links

          Activity

            People

              vkorukanti Venki Korukanti
              rkins Rahul Kumar Challapalli
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: