Apache Drill / DRILL-1562

Parquet Writer hangs when converting TPCH text data (SF100)

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.7.0
    • Component/s: Storage - Parquet
    • Labels: None

    Description

      Converting TPCH text data into Parquet hangs.

      Table name: lineitem
      Table size: ~80GB
      Input format: psv ('|' separated)

      Number of drillbits: 4
      DRILL_MAX_DIRECT_MEMORY="64G"
      DRILL_MAX_HEAP="32G"

      Query:
      > create table lineitem as select
      . . . . . . . . . . . . . . . . . > cast(columns[0] as int) l_orderkey,
      . . . . . . . . . . . . . . . . . > cast(columns[1] as int) l_partkey,
      . . . . . . . . . . . . . . . . . > cast(columns[2] as int) l_suppkey,
      . . . . . . . . . . . . . . . . . > cast(columns[3] as int) l_linenumber,
      . . . . . . . . . . . . . . . . . > cast(columns[4] as double) l_quantity,
      . . . . . . . . . . . . . . . . . > cast(columns[5] as double) l_extendedprice,
      . . . . . . . . . . . . . . . . . > cast(columns[6] as double) l_discount,
      . . . . . . . . . . . . . . . . . > cast(columns[7] as double) l_tax,
      . . . . . . . . . . . . . . . . . > cast(columns[8] as char(1)) l_returnflag,
      . . . . . . . . . . . . . . . . . > cast(columns[9] as char(1)) l_linestatus,
      . . . . . . . . . . . . . . . . . > cast(columns[10] as date) l_shipdate,
      . . . . . . . . . . . . . . . . . > cast(columns[11] as date) l_commitdate,
      . . . . . . . . . . . . . . . . . > cast(columns[12] as date) l_receiptdate,
      . . . . . . . . . . . . . . . . . > cast(columns[13] as char(25)) l_shipinstruct,
      . . . . . . . . . . . . . . . . . > cast(columns[14] as char(10)) l_shipmode,
      . . . . . . . . . . . . . . . . . > cast(columns[15] as varchar(200)) l_comment
      . . . . . . . . . . . . . . . . . > from dfs.`/tpch-text/scale100/lineitem` lineitem;
      +-----------+----------------------------+
      | Fragment  | Number of records written  |
      +-----------+----------------------------+
      | 1_58      | 4072947                    |
      | 1_90      | 4088667                    |
      | 1_38      | 4072639                    |
      ...
      ...
      | 1_14      | 6109440                    |

      <hangs>
      ...

      The Drillbit endpoint gets set to null, and the point at which the query hangs varies from run to run.
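
      One way to rule out malformed input rows as a factor is to mirror the query's casts on a sample of the text data outside Drill. The following is a minimal Python sketch, not part of the original report. It assumes ISO-formatted dates (YYYY-MM-DD) and that rows may end with a trailing '|'; the names parse_date, LINEITEM_COLUMNS and parse_lineitem, and the sample row, are hypothetical.

```python
from datetime import datetime


def parse_date(text):
    # Assumption: dates are ISO formatted (YYYY-MM-DD).
    return datetime.strptime(text, "%Y-%m-%d").date()


# (column name, converter) pairs in the order used by the CTAS query.
# char(n)/varchar(n) columns are kept as plain strings; lengths are not enforced.
LINEITEM_COLUMNS = [
    ("l_orderkey", int),
    ("l_partkey", int),
    ("l_suppkey", int),
    ("l_linenumber", int),
    ("l_quantity", float),
    ("l_extendedprice", float),
    ("l_discount", float),
    ("l_tax", float),
    ("l_returnflag", str),
    ("l_linestatus", str),
    ("l_shipdate", parse_date),
    ("l_commitdate", parse_date),
    ("l_receiptdate", parse_date),
    ("l_shipinstruct", str),
    ("l_shipmode", str),
    ("l_comment", str),
]


def parse_lineitem(line):
    """Convert one '|'-separated line into a dict of typed values."""
    fields = line.rstrip("\n").split("|")
    # Assumption: a trailing '|' may be present, yielding one empty extra field.
    if len(fields) == len(LINEITEM_COLUMNS) + 1 and fields[-1] == "":
        fields.pop()
    if len(fields) != len(LINEITEM_COLUMNS):
        raise ValueError(
            f"expected {len(LINEITEM_COLUMNS)} fields, got {len(fields)}"
        )
    return {
        name: convert(value)
        for (name, convert), value in zip(LINEITEM_COLUMNS, fields)
    }


# Hypothetical sample row, not taken from the report's data.
sample = (
    "1|155190|7706|1|17|21168.23|0.04|0.02|N|O|"
    "1996-03-13|1996-02-12|1996-03-22|DELIVER IN PERSON|TRUCK|sample comment|"
)
row = parse_lineitem(sample)
print(row["l_orderkey"], row["l_quantity"], row["l_shipdate"])
```

      A row that fails a conversion or has the wrong field count raises ValueError, which would point to bad input rather than to the Parquet writer itself.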

      Attachments

        1. hang.log
          22 kB
          Abhishek Girish

          People

            Assignee: Parth Chandra (parthc)
            Reporter: Abhishek Girish (agirish)