Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Description
Converting TPC-H text data into Parquet hangs.
Table name: lineitem
Table size: ~80GB
Input format: psv ('|' separated)
Number of drillbits: 4
DRILL_MAX_DIRECT_MEMORY="64G"
DRILL_MAX_HEAP="32G"
Query:
create table lineitem as select
  cast(columns[0] as int) l_orderkey,
  cast(columns[1] as int) l_partkey,
  cast(columns[2] as int) l_suppkey,
  cast(columns[3] as int) l_linenumber,
  cast(columns[4] as double) l_quantity,
  cast(columns[5] as double) l_extendedprice,
  cast(columns[6] as double) l_discount,
  cast(columns[7] as double) l_tax,
  cast(columns[8] as char(1)) l_returnflag,
  cast(columns[9] as char(1)) l_linestatus,
  cast(columns[10] as date) l_shipdate,
  cast(columns[11] as date) l_commitdate,
  cast(columns[12] as date) l_receiptdate,
  cast(columns[13] as char(25)) l_shipinstruct,
  cast(columns[14] as char(10)) l_shipmode,
  cast(columns[15] as varchar(200)) l_comment
from dfs.`/tpch-text/scale100/lineitem` lineitem;
+-----------+----------------------------+
| Fragment  | Number of records written  |
+-----------+----------------------------+
| 1_58      | 4072947                    |
| 1_90      | 4088667                    |
| 1_38      | 4072639                    |
...
...
| 1_14      | 6109440                    |
<hangs>
...
The Drillbit endpoint gets set to null, and the point at which the query hangs varies from run to run.
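Because the hang point differs between runs, it can help to record how far each attempt got. The following is a minimal Python sketch, not part of Drill, that parses pasted sqlline output like the table above and reports how many fragments finished and how many records they wrote. The function names (`parse_ctas_output`, `summarize`) are hypothetical, and the row pattern assumes fragment IDs of the form `1_58`. For comparison, the lineitem table at TPC-H scale factor 100 is expected to hold roughly 600 million rows.

```python
import re

# Matches data rows such as "| 1_58 | 4072947 |" or "1_58 | 4072947 |".
# The leading pipe is optional because pasted output often loses it.
ROW = re.compile(r"^\s*\|?\s*(\d+_\d+)\s*\|\s*(\d+)\s*\|?\s*$")


def parse_ctas_output(text):
    """Return {fragment_id: records_written} for every data row in text."""
    counts = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def summarize(counts):
    """Return (fragments_reported, total_records_written)."""
    return len(counts), sum(counts.values())


if __name__ == "__main__":
    # Only the rows visible in this report; the elided rows are not guessed.
    sample = """
    1_58 | 4072947 |
    1_90 | 4088667 |
    1_38 | 4072639 |
    1_14 | 6109440 |
    <hangs>
    """
    fragments, total = summarize(parse_ctas_output(sample))
    print(f"{fragments} fragments reported, {total} records written so far")
```

Comparing these totals across runs shows whether the hang tends to occur after a similar amount of data has been written or at an arbitrary point.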