Details
Improvement
Status: Open
Major
Resolution: Unresolved
1.17.0
None
None
Description
I am facing an issue while creating a Parquet file in Drill from a database.
Summary-
I am creating a Parquet file from a database using CTAS. The statement fails with the error "External Sort encountered an error while spilling to disk" when I add PARTITION BY. The details of the error are given below.
Version of Apache Drill -
1.17
Memory config-
DRILL_HEAP=16G
DRILL_MAX_DIRECT_MEMORY=32G
A few configs are listed here for information-
store.parquet.reader.pagereader.async=true;
store.parquet.reader.pagereader.bufferedread=false;
planner.memory.max_query_memory_per_node=31147483648
drill.exec.memory.operator.output_batch_size=4194304
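For easier comparison with the heap and direct memory settings above, the two byte values can be converted to binary units. This is only a small sketch: the helper names `to_gib` and `to_mib` are made up for illustration, and it assumes the `G` suffix in the memory settings means GiB, as in JVM size flags.

```python
GIB = 1024 ** 3  # bytes per GiB
MIB = 1024 ** 2  # bytes per MiB


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return n_bytes / GIB


def to_mib(n_bytes: int) -> float:
    """Convert a byte count to MiB."""
    return n_bytes / MIB


# Values copied from the configs listed above.
max_query_memory_per_node = 31147483648
output_batch_size = 4194304

print(f"max_query_memory_per_node: {to_gib(max_query_memory_per_node):.2f} GiB")  # 29.01 GiB
print(f"output_batch_size: {to_mib(output_batch_size):.0f} MiB")  # 4 MiB
```

Under that assumption, the per-node query memory limit is about 29 GiB out of 32 GiB of direct memory, and the operator output batch size is 4 MiB.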
Details of volume-
Number of rows in the CTAS with PARTITION BY: 14424482. Number of columns: 145.
There are 3 columns in Partition By clause.
Python produced a Parquet file of less than 1 GB from the same dataset with SNAPPY compression.
I am able to create the Parquet file using CTAS without PARTITION BY.
CTAS script-
CREATE TABLE dfs.root.<Table_name>
PARTITION BY (<Column1>,<Column2>,<Column3>)
AS SELECT *
FROM db.<Table>;
Error Log-
org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR: External Sort encountered an error while spilling to disk
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: No space left on device
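The root cause in the trace is "No space left on device", so the free space on the volume that holds the sort spill files may be worth checking before re-running the CTAS. Below is a minimal sketch using only the Python standard library. The path `/tmp/drill/spill` is an assumption (it should be replaced with whatever spill directory the Drillbit is actually configured to use), and `free_gib` is a hypothetical helper name.

```python
import os
import shutil


def free_gib(path: str) -> float:
    """Return free space, in GiB, on the volume that holds `path`.

    If `path` does not exist yet, the nearest existing ancestor
    directory is probed instead.
    """
    probe = os.path.abspath(path)
    while not os.path.exists(probe):
        parent = os.path.dirname(probe)
        if parent == probe:  # reached the filesystem root
            break
        probe = parent
    return shutil.disk_usage(probe).free / 1024 ** 3


if __name__ == "__main__":
    spill_dir = "/tmp/drill/spill"  # assumed spill location, adjust as needed
    print(f"Free space for {spill_dir}: {free_gib(spill_dir):.1f} GiB")
```

If the reported free space is small compared with the data being sorted for PARTITION BY, that would be consistent with the spill failure above.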