Apache Drill / DRILL-7737

Error while creating Parquet from database: External Sort encountered an error while spilling to disk


    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.17.0
    • Fix Version/s: None
    • Component/s: Functions - Drill
    • Labels:
      None

      Description

      I am facing an issue while creating a Parquet file in Drill from a database.

      Summary-

      I am creating a Parquet file from a database table using CTAS. When the statement includes PARTITION BY, it fails with the error "External Sort encountered an error while spilling to disk". The details of the error are given below.

      Version of Apache Drill -

      1.17

      Memory config-

      DRILL_HEAP=16G
      DRILL_MAX_DIRECT_MEMORY=32G

      A few relevant config settings, for information-

      store.parquet.reader.pagereader.async=true;

      store.parquet.reader.pagereader.bufferedread=false;

      planner.memory.max_query_memory_per_node=31147483648

      drill.exec.memory.operator.output_batch_size=4194304
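
      For readability, the two byte-valued settings above can be converted to binary units. This is only a minimal arithmetic sketch in Python; the constant names are illustrative and not Drill identifiers, and it assumes both values are plain byte counts.

```python
# Values copied from the settings listed above (assumed to be byte counts).
MAX_QUERY_MEMORY_PER_NODE = 31147483648  # planner.memory.max_query_memory_per_node
OUTPUT_BATCH_SIZE = 4194304              # drill.exec.memory.operator.output_batch_size

GIB = 1024 ** 3
MIB = 1024 ** 2


def to_gib(num_bytes):
    """Convert a byte count to gibibytes."""
    return num_bytes / GIB


def to_mib(num_bytes):
    """Convert a byte count to mebibytes."""
    return num_bytes / MIB


print(f"max_query_memory_per_node = {to_gib(MAX_QUERY_MEMORY_PER_NODE):.2f} GiB")
print(f"output_batch_size         = {to_mib(OUTPUT_BATCH_SIZE):.2f} MiB")
```

      So the per-node query memory limit is roughly 29 GiB, below the 32G configured for DRILL_MAX_DIRECT_MEMORY, and the output batch size is exactly 4 MiB.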

      Details of volume-

      Number of rows in the CTAS with PARTITION BY: 14424482. Number of columns: 145.

      There are 3 columns in the PARTITION BY clause.

      Python produced a Parquet file of less than 1 GB from the same dataset with SNAPPY compression.

      I am able to create the Parquet file using CTAS without PARTITION BY.

      CTAS script-

      CREATE TABLE dfs.root.<Table_name>
      PARTITION BY (<Column1>,<Column2>,<Column3>)
      AS SELECT *
      FROM db.<Table>;

      Error Log-

      org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR: External Sort encountered an error while spilling to disk

      org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
       at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
       at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
       at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
       at java.lang.Thread.run(Thread.java:745)
      Caused by: java.io.IOException: No space left on device
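
      The last line of the log, "No space left on device", suggests that the volume receiving the sort spill files filled up. Below is a minimal Python sketch for checking free space on that volume. The path /tmp/drill/spill is an assumption and is not taken from this report; substitute the spill directory configured for the Drillbit. The helper names are hypothetical.

```python
import os
import shutil

# Assumption: spill location on the Drillbit host; replace with the configured value.
SPILL_DIR = "/tmp/drill/spill"


def nearest_existing(path):
    """Walk up from path until an existing directory is found."""
    path = os.path.abspath(path)
    while not os.path.isdir(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    return path


def free_gib(path):
    """Return free space, in GiB, on the volume that holds path."""
    usage = shutil.disk_usage(nearest_existing(path))
    return usage.free / 1024 ** 3


print(f"Free space for {SPILL_DIR}: {free_gib(SPILL_DIR):.1f} GiB")
```

      Running this on the Drillbit host before and during the CTAS would show whether the spill volume is the one running out of space.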

            People

            • Assignee:
              Unassigned
              Reporter:
              bhabani.sreeparna@gmail.com Sreeparna Bhabani
