Apache Drill / DRILL-1948

Reading large parquet files via HDFS fails

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.7.0
    • Fix Version/s: 0.8.0
    • Component/s: Storage - Parquet
    • Labels: None
    • Environment: Hadoop 2.4.0 on Amazon EMR

    Description

      There appears to be an issue with reading medium to large Parquet files via HDFS. We have created a basic Parquet file with a single-column schema like so:

      sellprice DOUBLE

      When filled with 10,000 double values, the following query in Drill works fine:

      select sum(sellprice) from hdfs.`/saleparquet`;

      When filled with 50,000 double values, the following error occurs:

      Query failed: Query stopped.[ 9aece851-48bc-4664-831e-d35bbfbcd1d5 on ip-10-8-1-70.ap-southeast-2.compute.internal:31010 ]

      java.lang.RuntimeException: java.sql.SQLException: Failure while executing query.

      The full stack trace is:

      2015-01-07 05:48:57,809 [2b533736-1ef8-c038-7d3b-f718829e7b74:frag:0:0] ERROR o.a.drill.exec.ops.FragmentContext - Fragment Context received failure.
      java.lang.ArrayIndexOutOfBoundsException: null
      2015-01-07 05:48:57,809 [2b533736-1ef8-c038-7d3b-f718829e7b74:frag:0:0] ERROR o.a.d.e.p.i.ScreenCreator$ScreenRoot - Error 88fe95c3-b088-4674-8b65-967a7f4c3cdf: Query stopped.
      java.lang.ArrayIndexOutOfBoundsException: null
      2015-01-07 05:48:57,809 [2b533736-1ef8-c038-7d3b-f718829e7b74:frag:0:0] ERROR o.a.d.e.w.f.AbstractStatusReporter - Error cd4123e4-7b9d-451d-90f0-3cc1ecf461e4: Failure while running fragment.
      java.lang.ArrayIndexOutOfBoundsException: null
      2015-01-07 05:48:57,813 [2b533736-1ef8-c038-7d3b-f718829e7b74:frag:0:0] ERROR o.a.drill.exec.work.foreman.Foreman - Error 5db2c65b-cd10-4970-ba2b-f29b51fda923: Query failed: Failure while running fragment.[ cd4123e4-7b9d-451d-90f0-3cc1ecf461e4 on ip-10-8-1-70.ap-southeast-2.compute.internal:31010 ]
      [ cd4123e4-7b9d-451d-90f0-3cc1ecf461e4 on ip-10-8-1-70.ap-southeast-2.compute.internal:31010 ]

      org.apache.drill.exec.rpc.RemoteRpcException: Failure while running fragment.[ cd4123e4-7b9d-451d-90f0-3cc1ecf461e4 on ip-10-8-1-70.ap-southeast-2.compute.internal:31010 ]
      [ cd4123e4-7b9d-451d-90f0-3cc1ecf461e4 on ip-10-8-1-70.ap-southeast-2.compute.internal:31010 ]

      at org.apache.drill.exec.work.foreman.QueryManager.statusUpdate(QueryManager.java:93) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
      at org.apache.drill.exec.work.foreman.QueryManager$RootStatusReporter.statusChange(QueryManager.java:151) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
      at org.apache.drill.exec.work.fragment.AbstractStatusReporter.fail(AbstractStatusReporter.java:113) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
      at org.apache.drill.exec.work.fragment.AbstractStatusReporter.fail(AbstractStatusReporter.java:109) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
      at org.apache.drill.exec.work.fragment.FragmentExecutor.internalFail(FragmentExecutor.java:166) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
      at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:116) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
      at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:254) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_71]
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
      at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
      2015-01-07 05:48:57,814 [2b533736-1ef8-c038-7d3b-f718829e7b74:frag:0:0] WARN o.a.d.e.p.impl.SendingAccountor - Failure while waiting for send complete.
      java.lang.InterruptedException: null
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1301) ~[na:1.7.0_71]
      at java.util.concurrent.Semaphore.acquire(Semaphore.java:472) ~[na:1.7.0_71]
      at org.apache.drill.exec.physical.impl.SendingAccountor.waitForSendComplete(SendingAccountor.java:44) ~[drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
      at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.stop(ScreenCreator.java:186) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
      at org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:144) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
      at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:117) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
      at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:254) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_71]
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
      at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
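When triaging a log like the one above, it can help to tally which exception classes appear, ignoring the `at ...` frames and the timestamped logger lines. The following is a minimal sketch in Python using only the standard library; the function name `count_exceptions` and the regular expression are illustrative and are not part of Drill.

```python
import re
from collections import Counter

# Matches a line that starts with a fully qualified class name ending in
# "Exception" or "Error", e.g. "java.lang.ArrayIndexOutOfBoundsException: null".
# Stack frames ("at ...") and timestamped logger lines do not match.
EXCEPTION_LINE = re.compile(r"^\s*((?:[\w$]+\.)+[\w$]*(?:Exception|Error))\b")


def count_exceptions(log_text):
    """Return a Counter mapping exception class name to number of occurrences."""
    counts = Counter()
    for line in log_text.splitlines():
        match = EXCEPTION_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_exceptions.py < drillbit.log  (file name is hypothetical)
    for name, count in count_exceptions(sys.stdin.read()).most_common():
        print(count, name)
```

Only the first class name on a line is counted, so a line that chains two exceptions after a colon is attributed to the outer one.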

      If I fill the file with even more values (e.g. 100,000 or 1,000,000), I get a variety of other errors, such as:

      "Query failed: Query stopped., don't know what type: 14"

      coming from the Parquet engine.

      I am able to consistently replicate this in my environment with a basic Parquet file. I can attach that file if necessary.
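For anyone trying to reproduce the row-count dependence described above, the input data can be generated deterministically. The sketch below is an assumption about how comparable data might be produced, not the method used for the original file: it writes a single `sellprice` column of doubles to CSV and returns the expected sum, so a successful `sum(sellprice)` result can be checked. Converting the CSV to Parquet is left to whatever tool is available and is not shown. The function name, value range and seed are illustrative.

```python
import csv
import random


def write_sellprice_csv(path, n_rows, seed=0):
    """Write n_rows pseudo-random doubles to a one-column CSV and return their sum."""
    rng = random.Random(seed)  # fixed seed so every run produces the same data
    total = 0.0
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["sellprice"])  # column name taken from the schema in the report
        for _ in range(n_rows):
            value = round(rng.uniform(1.0, 1000.0), 2)  # value range is an arbitrary choice
            writer.writerow([repr(value)])  # repr keeps the float round-trippable
            total += value
    return total


if __name__ == "__main__":
    # Row counts from the report: 10,000 worked, 50,000 failed.
    for n_rows in (10_000, 50_000):
        expected = write_sellprice_csv(f"sellprice_{n_rows}.csv", n_rows)
        print(n_rows, expected)
```

Keeping the seed fixed means the 10,000-row file is a prefix of the 50,000-row file, which makes it easier to see whether a failure depends on the row count or on particular values.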

      Attachments

        1. DRILL-1948.2.patch.txt
          1 kB
          Adam Gilmore
        2. DRILL-1948.1.patch.txt
          1 kB
          Adam Gilmore

          People

            Assignee: Parth Chandra (parthc)
            Reporter: Adam Gilmore (dragoncurve)
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved: