Apache Drill / DRILL-7737

Error while creating Parquet from database: External Sort encountered an error while spilling to disk

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.17.0
    • Fix Version/s: None
    • Component/s: Functions - Drill
    • Labels: None

    Description

      I am facing an issue while creating a Parquet file in Drill from a database.

      Summary-

      I am creating a Parquet file from a database table using CTAS. When I add a PARTITION BY clause, the statement fails with the error "External Sort encountered an error while spilling to disk". The details of the error are given below.

      Version of Apache Drill -

      1.17

      Memory config-

      DRILL_HEAP=16G
      DRILL_MAX_DIRECT_MEMORY=32G

      A few relevant options are listed here for information-

      store.parquet.reader.pagereader.async=true;

      store.parquet.reader.pagereader.bufferedread=false;

      planner.memory.max_query_memory_per_node=31147483648

      drill.exec.memory.operator.output_batch_size=4194304
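
The byte-valued options above are easier to compare with the heap and direct-memory limits once converted to binary units. The following is a minimal sketch of that arithmetic only; it assumes the "G" suffix in DRILL_MAX_DIRECT_MEMORY means GiB (as JVM memory flags do), and the variable names are illustrative, not Drill identifiers.

```python
# Values copied from the report; only the unit conversion is added here.
GIB = 1024 ** 3
MIB = 1024 ** 2

max_query_memory_per_node = 31147483648  # planner.memory.max_query_memory_per_node (bytes)
output_batch_size = 4194304              # drill.exec.memory.operator.output_batch_size (bytes)
max_direct_memory_gib = 32               # DRILL_MAX_DIRECT_MEMORY=32G (assumed to mean GiB)

query_memory_gib = max_query_memory_per_node / GIB
batch_size_mib = output_batch_size / MIB
share_of_direct = query_memory_gib / max_direct_memory_gib

print(f"max_query_memory_per_node: {query_memory_gib:.2f} GiB")
print(f"output_batch_size:         {batch_size_mib:.0f} MiB")
print(f"share of direct memory:    {share_of_direct:.1%}")
```

Under that assumption, the per-node query memory limit works out to roughly 29 GiB, which is most of the 32G direct memory, and the output batch size is exactly 4 MiB.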

      Details of volume-

      The CTAS with PARTITION BY runs over 14424482 rows and 145 columns.

      There are 3 columns in the PARTITION BY clause.

      For comparison, Python produced a Parquet file of less than 1 GB from the same dataset with SNAPPY compression.

      I am able to create the Parquet file using CTAS without PARTITION BY.
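
For a sense of scale, the row and column counts above can be turned into a rough estimate of how much uncompressed data a sort might have to buffer or spill. This is a back-of-the-envelope sketch, not a measurement: the average bytes per value are hypothetical placeholders, since the report does not give column types, and the compressed Parquet size may understate the row data a sort handles.

```python
# Row and column counts are taken from the report.
rows = 14424482
columns = 145
cells = rows * columns  # total number of individual values in the dataset

# ASSUMPTION: average width per value is unknown; these figures are hypothetical.
assumed_avg_bytes_per_value = (4, 8, 16)

estimates_gib = {
    width: cells * width / 1024 ** 3
    for width in assumed_avg_bytes_per_value
}

print(f"total values: {cells:,}")
for width, size_gib in estimates_gib.items():
    print(f"  at {width:>2} bytes/value: about {size_gib:.1f} GiB uncompressed")
```

If the real average width is near the upper end of these guesses, the data to be sorted could approach or exceed the memory available to the query, which would be consistent with the sort spilling to disk.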

      CTAS script-

      CREATE TABLE dfs.root.<Table_name>
      PARTITION BY (<Column1>,<Column2>,<Column3>)
      AS SELECT *
      FROM db.<Table>;

      Error Log-

      org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR: External Sort encountered an error while spilling to disk

      org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
       at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
       at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
       at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
       at java.lang.Thread.run(Thread.java:745)
      Caused by: java.io.IOException: No space left on device
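
The root cause in the trace is "No space left on device", so a first check is how much free space the filesystem holding the spill directory has. The sketch below uses only the Python standard library; the path `/tmp/drill/spill` is an assumption for illustration, and the real location is whatever spill directory the drillbit is configured with.

```python
import shutil
from pathlib import Path


def free_space_gib(path):
    """Return free space, in GiB, on the filesystem that holds `path`."""
    p = Path(path)
    # Walk up to the nearest existing ancestor so the check also works
    # when the spill directory has not been created yet.
    while not p.exists():
        p = p.parent
    return shutil.disk_usage(p).free / 1024 ** 3


# ASSUMPTION: hypothetical spill location; replace with the configured one.
SPILL_DIR = "/tmp/drill/spill"

print(f"free space for {SPILL_DIR}: {free_space_gib(SPILL_DIR):.1f} GiB")
```

Comparing this figure with the expected size of the data being sorted gives a quick indication of whether the spill location is simply too small for the partitioned CTAS.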

People

    • Assignee: Unassigned
    • Reporter: Sreeparna Bhabani (bhabani.sreeparna@gmail.com)