Apache Drill / DRILL-1507

Potential hash insert issue

Details

    Description

      #Thu Oct 02 17:49:48 PDT 2014
      git.commit.id.abbrev=29dde76

      Running the following query (CASE, GROUP BY, and ORDER BY) against JSON files, I repeatedly saw the hash insert messages shown below. The query eventually finishes after a little over 30 minutes, and the data returned is correct. The same query against Parquet files finishes in about a minute. Here is the query:

      /root/drillATS/incubator-drill/testing/framework/resources/aggregate1/json/testcases/aggregate26.q :
      select
        cast(case when ss_sold_date_sk is null then 0 else ss_sold_date_sk end as int) as soldd,
        cast(case when ss_sold_time_sk is null then 0 else ss_sold_time_sk end as bigint) as soldt,
        cast(case when ss_item_sk is null then 0.0 else ss_item_sk end as float) as itemsk,
        cast(case when ss_customer_sk is null then 0.0 else ss_customer_sk end as decimal(18,9)) as custsk,
        cast(case when ss_cdemo_sk is null then 0 else ss_cdemo_sk end as varchar(20)) as cdemo,
        ss_hdemo_sk as hdemo,
        ss_addr_sk as addrsk,
        ss_store_sk as storesk,
        ss_promo_sk as promo,
        ss_ticket_number as tickn,
        sum(ss_quantity) as quantities
      from store_sales
      group by
        cast(case when ss_sold_date_sk is null then 0 else ss_sold_date_sk end as int),
        cast(case when ss_sold_time_sk is null then 0 else ss_sold_time_sk end as bigint),
        cast(case when ss_item_sk is null then 0.0 else ss_item_sk end as float),
        cast(case when ss_customer_sk is null then 0.0 else ss_customer_sk end as decimal(18,9)),
        cast(case when ss_cdemo_sk is null then 0 else ss_cdemo_sk end as varchar(20)),
        ss_hdemo_sk,
        ss_addr_sk,
        ss_store_sk,
        ss_promo_sk,
        ss_ticket_number
      order by
        cast(case when ss_sold_date_sk is null then 0 else ss_sold_date_sk end as int),
        cast(case when ss_sold_time_sk is null then 0 else ss_sold_time_sk end as bigint),
        cast(case when ss_item_sk is null then 0.0 else ss_item_sk end as float),
        cast(case when ss_customer_sk is null then 0.0 else ss_customer_sk end as decimal(18,9)),
        cast(case when ss_cdemo_sk is null then 0 else ss_cdemo_sk end as varchar(20)),
        ss_hdemo_sk,
        ss_addr_sk,
        ss_store_sk,
        ss_promo_sk,
        ss_ticket_number
      limit 100

      Here is the error I saw:

      11:46:46.836 [e88d1c5f-01f4-4e9a-a24f-a5601be809cf:frag:0:0 - Producer Thread] DEBUG o.apache.drill.exec.memory.Accountor - Fragment:0:0 Reserved 32768 bytes. Total Allocated: 778240
      11:46:46.848 [e88d1c5f-01f4-4e9a-a24f-a5601be809cf:frag:0:0] DEBUG o.a.d.e.p.impl.common.HashTable - Put into hash table failed .. Retrying with new batch holder...
      .....

      11:48:49.936 [e88d1c5f-01f4-4e9a-a24f-a5601be809cf:frag:0:0 - Producer Thread] DEBUG o.apache.drill.exec.memory.Accountor - Fragment:0:0 Reserved 32768 bytes. Total Allocated: 778240
      11:48:49.947 [e88d1c5f-01f4-4e9a-a24f-a5601be809cf:frag:0:0] DEBUG o.a.d.e.p.impl.common.HashTable - Put into hash table failed .. Retrying with new batch holder...
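
To quantify how often the retry occurs, the log lines above could be tallied with a small script. This is a minimal sketch, not part of Drill: the function name `summarize_retries` and the regular expression are assumptions based only on the line layout visible in this excerpt (`time [thread] LEVEL logger - message`), and the timestamps carry no date, so a run that crosses midnight would be mismeasured.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed line layout, inferred from the excerpt in this report:
#   HH:MM:SS.mmm [thread] LEVEL logger - message
LINE_RE = re.compile(
    r"^\s*(?P<ts>\d{2}:\d{2}:\d{2}\.\d{3}) \[(?P<thread>[^\]]+)\] "
    r"(?P<level>\w+)\s+(?P<logger>\S+) - (?P<msg>.*)$"
)
RETRY_TEXT = "Put into hash table failed"


def summarize_retries(lines):
    """Count retry messages per thread and return the seconds between first and last."""
    counts = Counter()
    first = last = None
    for line in lines:
        m = LINE_RE.match(line)
        if not m or RETRY_TEXT not in m.group("msg"):
            continue  # skip unrelated lines such as the Accountor messages
        counts[m.group("thread")] += 1
        # No date in the log line, so this only works within a single day.
        ts = datetime.strptime(m.group("ts"), "%H:%M:%S.%f")
        first = ts if first is None else min(first, ts)
        last = ts if last is None else max(last, ts)
    span = (last - first).total_seconds() if first is not None else 0.0
    return counts, span


# Sample input copied from the log excerpt in this report.
sample = [
    "11:46:46.836 [e88d1c5f-01f4-4e9a-a24f-a5601be809cf:frag:0:0 - Producer Thread] DEBUG o.apache.drill.exec.memory.Accountor - Fragment:0:0 Reserved 32768 bytes. Total Allocated: 778240",
    "11:46:46.848 [e88d1c5f-01f4-4e9a-a24f-a5601be809cf:frag:0:0] DEBUG o.a.d.e.p.impl.common.HashTable - Put into hash table failed .. Retrying with new batch holder...",
    "11:48:49.947 [e88d1c5f-01f4-4e9a-a24f-a5601be809cf:frag:0:0] DEBUG o.a.d.e.p.impl.common.HashTable - Put into hash table failed .. Retrying with new batch holder...",
]
counts, span = summarize_retries(sample)
print(dict(counts), span)
```

Running the same function over the full drillbit log for both the JSON and the Parquet run would show whether the retries occur only on the JSON path.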

      The data is TPC-DS, converted to JSON using Drill's JSON writer. Since the query eventually completes and passes data verification, the JSON writer is probably converting Parquet to JSON correctly.

          People

            Assignee: Unassigned
            Reporter: Chun Chang (cchang@maprtech.com)