Apache Drill / DRILL-4317

Exceptions on SELECT and CTAS with large CSV files


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.0, 1.5.0, 1.6.0
    • Fix Version/s: 1.7.0
    • Component/s: Storage - Text & CSV
    • Labels:
      None
    • Environment:

      4-node cluster, Hadoop 2.7.0, Ubuntu 14.04.1

      Description

      Selecting from a CSV file or running a CTAS into Parquet generates exceptions.

      The source file is ~650 MB: a table of 4 key columns followed by 39 numeric data columns, in an otherwise fairly simple format. Example rows:

      2015-10-17 00:00,f5e9v8u2,err,fr7,226020793,76.094,26307,226020793,76.094,26307,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
      2015-10-17 00:00,c3f9x5z2,err,mi1,1339159295,216.004,177690,1339159295,216.004,177690,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
      2015-10-17 00:00,r5z2f2i9,err,mi1,7159994629,39718.011,65793,6142021303,30687.811,64630,143777403,40.521,146,75503742,41.905,89,170771174,168.165,198,192565529,370.475,222,97577280,318.068,120,62631452,288.253,68,32371173,189.527,39,41712265,299.184,46,39046408,363.418,47,34182318,465.343,43,127834582,6485.341,145
      2015-10-17 00:00,j9s6i8t2,err,fr7,20580443899,277445.055,67826,2814893469,85447.816,54275,2584757097,608.001,2044,1395571268,769.113,1051,3070616988,3000.005,2284,3413811671,6489.060,2569,1772235156,5806.214,1339,1097879284,5064.120,858,691884865,4035.397,511,672967845,4815.875,518,789163614,7306.684,599,813910495,10632.464,627,1462752147,143470.306,1151
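
To rule out malformed rows before blaming the reader, the stated layout (4 key columns plus 39 numeric columns, so 43 fields per row) can be checked directly. The following is a minimal sketch, not part of the original report: `profile_csv` and `EXPECTED_FIELDS` are hypothetical names, and the in-memory sample stands in for the real file.

```python
import csv
import io
from collections import Counter

# 4 key columns + 39 numeric columns, as stated in the description.
EXPECTED_FIELDS = 4 + 39


def profile_csv(lines, expected=EXPECTED_FIELDS):
    """Scan CSV lines and return (field-count histogram,
    longest field length, line numbers whose width differs from `expected`)."""
    counts = Counter()
    longest = 0
    bad_lines = []
    reader = csv.reader(lines)
    for row in reader:
        counts[len(row)] += 1
        longest = max(longest, max((len(field) for field in row), default=0))
        if len(row) != expected:
            # reader.line_num is the physical line the row ended on.
            bad_lines.append(reader.line_num)
    return counts, longest, bad_lines


# Illustrative data only: one well-formed row and one truncated row.
sample = io.StringIO(
    ",".join(["key"] * 4 + ["1.5"] * 39) + "\n"
    + "key,key,key\n"
)
counts, longest, bad_lines = profile_csv(sample)
print("field counts:", dict(counts))
print("longest field:", longest)
print("unexpected-width lines:", bad_lines)
```

For the real file, an open file handle (opened with `newline=""`) can be passed in place of the `StringIO` object. A clean result would suggest the input is well formed and the failure lies in how the data is read or buffered.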
      

      A "SELECT from `/path/to/file.csv`" runs for tens of minutes and eventually fails with:

      java.lang.IndexOutOfBoundsException: index: 547681, length: 1 (expected: range(0, 547681))
              at io.netty.buffer.AbstractByteBuf.checkIndex(AbstractByteBuf.java:1134)
              at io.netty.buffer.PooledUnsafeDirectByteBuf.getBytes(PooledUnsafeDirectByteBuf.java:136)
              at io.netty.buffer.WrappedByteBuf.getBytes(WrappedByteBuf.java:289)
              at io.netty.buffer.UnsafeDirectLittleEndian.getBytes(UnsafeDirectLittleEndian.java:26)
              at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
              at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
              at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
              at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
              at org.apache.drill.exec.vector.VarCharVector$Accessor.get(VarCharVector.java:443)
              at org.apache.drill.exec.vector.accessor.VarCharAccessor.getBytes(VarCharAccessor.java:125)
              at org.apache.drill.exec.vector.accessor.VarCharAccessor.getString(VarCharAccessor.java:146)
              at org.apache.drill.exec.vector.accessor.VarCharAccessor.getObject(VarCharAccessor.java:136)
              at org.apache.drill.exec.vector.accessor.VarCharAccessor.getObject(VarCharAccessor.java:94)
              at org.apache.drill.exec.vector.accessor.BoundCheckingAccessor.getObject(BoundCheckingAccessor.java:148)
              at org.apache.drill.jdbc.impl.TypeConvertingSqlAccessor.getObject(TypeConvertingSqlAccessor.java:795)
              at org.apache.drill.jdbc.impl.AvaticaDrillSqlAccessor.getObject(AvaticaDrillSqlAccessor.java:179)
              at net.hydromatic.avatica.AvaticaResultSet.getObject(AvaticaResultSet.java:351)
              at org.apache.drill.jdbc.impl.DrillResultSetImpl.getObject(DrillResultSetImpl.java:420)
              at sqlline.Rows$Row.<init>(Rows.java:157)
              at sqlline.IncrementalRows.hasNext(IncrementalRows.java:63)
              at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
              at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
              at sqlline.SqlLine.print(SqlLine.java:1593)
              at sqlline.Commands.execute(Commands.java:852)
              at sqlline.Commands.sql(Commands.java:751)
              at sqlline.SqlLine.dispatch(SqlLine.java:746)
              at sqlline.SqlLine.begin(SqlLine.java:621)
              at sqlline.SqlLine.start(SqlLine.java:375)
              at sqlline.SqlLine.main(SqlLine.java:268)
      

      A CTAS on the same file with storage as Parquet results in:

      Error: SYSTEM ERROR: IllegalArgumentException: length: -260 (expected: >= 0)
      
      Fragment 1:2
      
      [Error Id: 1807615e-4385-4f85-8402-5900aaa568e9 on es07:31010]
      
        (java.lang.IllegalArgumentException) length: -260 (expected: >= 0)
          io.netty.buffer.AbstractByteBuf.checkIndex():1131
          io.netty.buffer.PooledUnsafeDirectByteBuf.nioBuffer():344
          io.netty.buffer.WrappedByteBuf.nioBuffer():727
          io.netty.buffer.UnsafeDirectLittleEndian.nioBuffer():26
          io.netty.buffer.DrillBuf.nioBuffer():356
          org.apache.drill.exec.store.ParquetOutputRecordWriter$VarCharParquetConverter.writeField():1842
          org.apache.drill.exec.store.EventBasedRecordWriter.write():62
          org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext():106
          org.apache.drill.exec.record.AbstractRecordBatch.next():162
          org.apache.drill.exec.physical.impl.BaseRootExec.next():104
          org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():93
          org.apache.drill.exec.physical.impl.BaseRootExec.next():94
          org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():256
          org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():250
          java.security.AccessController.doPrivileged():-2
          javax.security.auth.Subject.doAs():415
          org.apache.hadoop.security.UserGroupInformation.doAs():1657
          org.apache.drill.exec.work.fragment.FragmentExecutor.run():250
          org.apache.drill.common.SelfCleaningRunnable.run():38
          java.util.concurrent.ThreadPoolExecutor.runWorker():1145
          java.util.concurrent.ThreadPoolExecutor$Worker.run():615
          java.lang.Thread.run():745 (state=,code=0)
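
Both messages above have the shape of a single (index, length) bounds check against a buffer's capacity. The sketch below is a Python model of such a check, written to make the two failures easier to read; it is not Netty or Drill code. The `varchar_slice` helper assumes that a variable-width value is located by a pair of adjacent entries in an offset array, and the offsets used are hypothetical values chosen only to reproduce the reported length.

```python
def check_index(index, length, capacity):
    """Model of a bounds check for reading `length` bytes at `index`
    from a buffer holding `capacity` bytes."""
    if length < 0:
        raise ValueError(f"length: {length} (expected: >= 0)")
    if index < 0 or index > capacity - length:
        raise IndexError(
            f"index: {index}, length: {length} (expected: range(0, {capacity}))"
        )


def varchar_slice(offsets, i):
    """Return (start, length) of value i, assuming value i occupies
    bytes offsets[i] .. offsets[i + 1] of the data buffer."""
    start, end = offsets[i], offsets[i + 1]
    return start, end - start


CAPACITY = 547681  # buffer size taken from the SELECT error message

# Case 1: reading one byte at index == capacity is one past the end.
try:
    check_index(547681, 1, CAPACITY)
except IndexError as exc:
    print("SELECT-style failure:", exc)

# Case 2: an end offset smaller than the start offset gives a negative length.
# The offsets below are hypothetical.
start, length = varchar_slice([0, 300, 40], 1)
try:
    check_index(start, length, CAPACITY)
except ValueError as exc:
    print("CTAS-style failure:", exc)
```

Under these assumptions, the SELECT error is a read starting exactly at the end of the buffer, and the CTAS error is consistent with an end offset that is 260 bytes below its start offset. Both would point at inconsistent offsets for a VarChar value rather than at the Parquet writer or the JDBC client themselves.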
      

            People

            • Assignee: Deneche A. Hakim (adeneche)
            • Reporter: Matt Keranen (mattk)
            • Reviewer: Krystal