Apache Drill / DRILL-7736

Error while reading from Parquet: DATA_READ ERROR: Exception occurred while reading from disk


Details

    • Type: Improvement
    • Status: Open
    • Priority: Blocker
    • Resolution: Unresolved
    • Affects Version/s: 1.17.0
    • Fix Version/s: None
    • Component/s: Functions - Drill
    • Labels: None

    Description

      I am facing an issue while creating a Parquet file in Drill from another Parquet file.

      Summary:

      I am rewriting one Parquet file into another using CTAS with a PARTITION BY clause. The source Parquet file was generated from Python. When I try to rewrite it, the query fails with the error shown below.

      Apache Drill version:

      1.17

      Memory configuration:

      DRILL_HEAP=16G
      DRILL_MAX_DIRECT_MEMORY=32G

      Other relevant options, for reference:

      exec.sort.disable_managed=true

      store.parquet.reader.pagereader.async=true

      store.parquet.reader.pagereader.bufferedread=false

      planner.memory.max_query_memory_per_node=31147483648

      drill.exec.memory.operator.output_batch_size=4194304
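For scale, the memory settings above can be put into the same units. The Python sketch below is illustrative only: parse_size is a hypothetical helper, and it assumes the JVM convention that the G suffix in DRILL_HEAP and DRILL_MAX_DIRECT_MEMORY means 1024^3 bytes. Under that assumption, planner.memory.max_query_memory_per_node works out to roughly 29 GiB, about 91% of the 32 GiB direct-memory limit.

```python
# Hypothetical helper for illustration; not part of Drill.
# Assumes JVM-style binary suffixes: K = 1024, M = 1024**2, G = 1024**3 bytes.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
GIB = 1024 ** 3


def parse_size(text: str) -> int:
    """Convert '16G', '16 G', or a plain byte count such as '4194304' to bytes."""
    s = text.strip().upper().replace(" ", "")
    if s and s[-1] in UNITS:
        return int(s[:-1]) * UNITS[s[-1]]
    return int(s)


heap = parse_size("16G")                    # DRILL_HEAP
direct = parse_size("32G")                  # DRILL_MAX_DIRECT_MEMORY
query_per_node = parse_size("31147483648")  # planner.memory.max_query_memory_per_node (bytes)
batch = parse_size("4194304")               # drill.exec.memory.operator.output_batch_size (bytes)

print(f"heap:           {heap / GIB:.2f} GiB")
print(f"direct memory:  {direct / GIB:.2f} GiB")
print(f"query per node: {query_per_node / GIB:.2f} GiB "
      f"({query_per_node / direct:.1%} of direct memory)")
print(f"output batch:   {batch / 1024 ** 2:.2f} MiB")
```

The helper also accepts "16 G" with a space, so either spelling of the heap setting converts to the same byte count.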

      Data volume:

      The CTAS covers 25,245,241 rows and 145 columns.

      FYI: the same CTAS to Parquet succeeds when fewer rows are involved.

      CTAS statement:

      CREATE TABLE dfs.root.<Table_name>
      PARTITION BY (<Column1>,<Column2>,<Column3>)
      AS SELECT *
      FROM dfs.root.<source_parquet>;
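Conceptually, PARTITION BY (<Column1>,<Column2>,<Column3>) groups the output by each distinct combination of the three partition columns. The Python sketch below is a toy, in-memory model of a generic sort-then-group approach; it is not Drill's writer implementation, and the column names c1, c2, c3 and the row values are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Toy rows; "c1", "c2", "c3" are hypothetical stand-ins for <Column1>, <Column2>, <Column3>.
rows = [
    {"c1": "a", "c2": 1, "c3": "x", "val": 10},
    {"c1": "b", "c2": 2, "c3": "y", "val": 20},
    {"c1": "a", "c2": 1, "c3": "x", "val": 30},
    {"c1": "a", "c2": 2, "c3": "x", "val": 40},
]

# The partition key is the tuple of all three partition columns.
key = itemgetter("c1", "c2", "c3")


def partition(rows):
    """Sort by the partition key so equal keys are adjacent, then emit one group per key."""
    ordered = sorted(rows, key=key)
    return {k: [r["val"] for r in grp] for k, grp in groupby(ordered, key=key)}


groups = partition(rows)
for k, vals in groups.items():
    print(k, vals)
```

The number of groups equals the number of distinct (c1, c2, c3) tuples in the input, so the partition columns chosen directly determine how many separate outputs a partitioned write has to produce.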

      Error log:

      2020-05-07 xx:xx:xx,504 [scan-4] INFO  o.a.d.e.s.p.c.AsyncPageReader - User Error Occurred: Exception occurred while reading from disk. (can not read class org.apache.parquet.format.PageHeader: java.io.InterruptedIOException: Interrupted while choosing DataNode for read.)
      org.apache.drill.common.exceptions.UserException: DATA_READ ERROR: Exception occurred while reading from disk.

      File:  <xxx>.parquet
      Column:  <xxx>
      Row Group Start:  25545832

      [Error Id: 4157803d-a37e-4693-bc1a-b654807222ed ]
       at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:637)
       at org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader.handleAndThrowException(AsyncPageReader.java:190)
       at org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader.access$700(AsyncPageReader.java:84)
       at org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader$AsyncPageReaderTask.call(AsyncPageReader.java:480)
       at org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader$AsyncPageReaderTask.call(AsyncPageReader.java:394)
       at org.apache.drill.exec.util.concurrent.ExecutorServiceUtil$CallableTaskWrapper.call(ExecutorServiceUtil.java:85)
       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
       at java.lang.Thread.run(Thread.java:745)
      Caused by: java.io.IOException: can not read class org.apache.parquet.format.PageHeader: java.io.InterruptedIOException: Interrupted while choosing DataNode for read.
       at org.apache.parquet.format.Util.read(Util.java:232)
       at org.apache.parquet.format.Util.readPageHeader(Util.java:81)
       at org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader$AsyncPageReaderTask.call(AsyncPageReader.java:437)
       ... 6 common frames omitted
      Caused by: shaded.parquet.org.apache.thrift.transport.TTransportException: java.io.InterruptedIOException: Interrupted while choosing DataNode for read.
       at shaded.parquet.org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
       at shaded.parquet.org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
       at shaded.parquet.org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:634)
       at shaded.parquet.org.apache.thrift.protocol.TCompactProtocol.readFieldBegin(TCompactProtocol.java:539)
       at org.apache.parquet.format.InterningProtocol.readFieldBegin(InterningProtocol.java:158)
       at org.apache.parquet.format.PageHeader$PageHeaderStandardScheme.read(PageHeader.java:973)
       at org.apache.parquet.format.PageHeader$PageHeaderStandardScheme.read(PageHeader.java:966)
       at org.apache.parquet.format.PageHeader.read(PageHeader.java:843)
       at org.apache.parquet.format.Util.read(Util.java:229)
       ... 8 common frames omitted
      Caused by: java.io.InterruptedIOException: Interrupted while choosing DataNode for read.
       at org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:910)
       at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:862)
       at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:841)
       at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:567)
       at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:757)
       at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:829)
       at java.io.DataInputStream.read(DataInputStream.java:149)
       at java.io.FilterInputStream.read(FilterInputStream.java:133)
       at shaded.parquet.org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
       ... 16 common frames omitted
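The most specific information in a trace like this sits at the bottom of the nested "Caused by:" chain. The Python sketch below uses a hypothetical helper (not part of Drill) that collects each "Caused by:" line, outermost first, from an excerpt of the log above. Here the innermost cause is the HDFS client's java.io.InterruptedIOException raised while choosing a DataNode. This suggests that the read was interrupted while locating a DataNode, rather than that the Parquet page header itself was malformed.

```python
import re

# Matches "Caused by: <exception class>: <message>" lines in a Java stack trace.
CAUSED_BY = re.compile(r"^\s*Caused by:\s+([\w.$]+)(?::\s*(.*))?$")


def cause_chain(log: str) -> list[tuple[str, str]]:
    """Return (exception class, message) for each 'Caused by:' line, outermost first."""
    chain = []
    for line in log.splitlines():
        m = CAUSED_BY.match(line)
        if m:
            chain.append((m.group(1), m.group(2) or ""))
    return chain


# Excerpt of the log above; most "at ..." frames are omitted.
LOG = """\
org.apache.drill.common.exceptions.UserException: DATA_READ ERROR: Exception occurred while reading from disk.
 at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:637)
Caused by: java.io.IOException: can not read class org.apache.parquet.format.PageHeader: java.io.InterruptedIOException: Interrupted while choosing DataNode for read.
Caused by: shaded.parquet.org.apache.thrift.transport.TTransportException: java.io.InterruptedIOException: Interrupted while choosing DataNode for read.
Caused by: java.io.InterruptedIOException: Interrupted while choosing DataNode for read.
 at org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:910)
"""

chain = cause_chain(LOG)
root_class, root_msg = chain[-1]
print(f"root cause: {root_class}: {root_msg}")
# prints: root cause: java.io.InterruptedIOException: Interrupted while choosing DataNode for read.
```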

      People

        Assignee: Unassigned
        Reporter: Sreeparna Bhabani (bhabani.sreeparna@gmail.com)