DRILL-4284: Complex Data Causing Index Out of Bounds with UNION Type


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.4.0
    • Fix Version/s: None
    • Component/s: Storage - JSON
    • Labels: None

      Description

      Working with complex JSON data and the UNION type has produced an Index Out of Bounds error when trying to read data. Here are the posts from the Drill User Group:

      After getting some pointers on the new experimental UNION type with JSON, I started getting a different error related to index out of bounds. I thought I'd post here to determine what it could be; if it is a bug, I can then open a JIRA.

      So first, I did:

      ALTER SESSION SET `exec.errors.verbose` = true;     -- so I could get full errors
      ALTER SESSION SET `exec.enable_union_type` = true;  -- so I could use the experimental UNION type

      Now, my first query, select * from `/data/prod/src/`, gave me the errors below. The files change, and ironically, if I select directly from any specific file (even one named in the error), the query often works fine. It's scanning the whole directory of files that causes the error. Sometimes I can query multiple files, but eventually I hit one file that seems to break it. The file that breaks things doesn't look different from the others, and yet I can select directly from that file and it works... weird. Let me know if I can do anything to help troubleshoot further.
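
      For reference, here are the statements above consolidated into one runnable sequence (the directory path is the one from this report; adjust it to your own storage configuration):

      -- Enable verbose errors and the experimental UNION type, then scan the
      -- whole directory. Selecting an individual file usually succeeds; the
      -- directory scan is what fails.
      ALTER SESSION SET `exec.errors.verbose` = true;
      ALTER SESSION SET `exec.enable_union_type` = true;
      SELECT * FROM `/data/prod/src/`;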

      Data Notes (see example below):

      • The ... represents LOTS of other fields, some simple, some complex/nested. This data is NOT pretty.
      • The files are goofy in that each file has one top-level field, "count", followed by a huge array of events.
      • The field that is ALWAYS (as far as I've seen) involved in the error is the "features" field.
      • This field will sometimes be an array and sometimes an empty object, {}.
      • The size of the array for the features field (when it is not an empty object) changes from event to event. (My hunch is that the issue is there; see the type-inspection sketch after this list.)
      • This occurs even if I don't reference the features field, say when I am trying to flatten a different field at the same level as features.
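
      If it helps narrow things down, a query along these lines could show how Drill classifies "features" for each event, i.e. whether it really does flip between a list and a map as described above. This is only a sketch: it assumes the typeof() function is available in your build, and the file path and aliases are illustrative.

      -- Flatten the events array, then report the type Drill assigns to
      -- "features" in each event (sketch; typeof() availability assumed).
      SELECT typeof(sub.ev.features) AS features_type
      FROM (
        SELECT FLATTEN(t.`events`) AS ev
        FROM `/data/prod/src/file1.json` t
      ) sub
      LIMIT 20;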
        Error:

      Error: DATA_READ ERROR: index: 0, length: 4 (expected: range(0, 0))

      File /data/prod/src/file1.json
      Record 1
      Line 193
      Column 34
      Field feature
      Fragment 0:0

      [Error Id: 25a2c963-86db-40e9-b5cc-2674887de2fe on node7:31010]

      (java.lang.IndexOutOfBoundsException) index: 0, length: 4 (expected: range(0, 0))
      io.netty.buffer.DrillBuf.checkIndexD():175
      io.netty.buffer.DrillBuf.chk():197
      io.netty.buffer.DrillBuf.getInt():477
      org.apache.drill.exec.vector.UInt4Vector$Accessor.get():356
      org.apache.drill.exec.vector.complex.ListVector$Mutator.startNewValue():305
      org.apache.drill.exec.vector.complex.impl.UnionListWriter.startList():563
      org.apache.drill.exec.vector.complex.impl.AbstractPromotableFieldWriter.startList():126
      org.apache.drill.exec.vector.complex.impl.PromotableWriter.startList():42
      org.apache.drill.exec.vector.complex.fn.JsonReader.writeData():461
      org.apache.drill.exec.vector.complex.fn.JsonReader.writeData():305
      org.apache.drill.exec.vector.complex.fn.JsonReader.writeData():470
      org.apache.drill.exec.vector.complex.fn.JsonReader.writeData():305
      org.apache.drill.exec.vector.complex.fn.JsonReader.writeDataSwitch():240
      org.apache.drill.exec.vector.complex.fn.JsonReader.writeToVector():178
      org.apache.drill.exec.vector.complex.fn.JsonReader.write():144
      org.apache.drill.exec.store.easy.json.JSONRecordReader.next():191
      org.apache.drill.exec.physical.impl.ScanBatch.next():191
      org.apache.drill.exec.record.AbstractRecordBatch.next():119
      org.apache.drill.exec.record.AbstractRecordBatch.next():109
      org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
      org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():132
      org.apache.drill.exec.record.AbstractRecordBatch.next():162
      org.apache.drill.exec.physical.impl.BaseRootExec.next():104
      org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
      org.apache.drill.exec.physical.impl.BaseRootExec.next():94
      org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():256
      org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():250
      java.security.AccessController.doPrivileged():-2
      javax.security.auth.Subject.doAs():422
      org.apache.hadoop.security.UserGroupInformation.doAs():1595
      org.apache.drill.exec.work.fragment.FragmentExecutor.run():250
      org.apache.drill.common.SelfCleaningRunnable.run():38
      java.util.concurrent.ThreadPoolExecutor.runWorker():1142
      java.util.concurrent.ThreadPoolExecutor$Worker.run():617
      java.lang.Thread.run():745 (state=,code=0)

      Example Data:

      {
        "count": 241,
        "events": [
          {
            ...
            "features": [
              { "count": 3, "name": "feature1" },
              { "count": 30, "name": "feature2" },
              { "count": 2, "name": "feature3" },
              { "count": 3, "name": "feature4" }
            ],
            ...
          },
          {
            ...
            "features": {},
            ...
          },
          {
            ...
            "features": [
              { "count": 3, "name": "feature1" },
              { "count": 30, "name": "feature2" },
              { "count": 2, "name": "feature3" }
            ],
            ...
          }
        ]
      }
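
      A minimal file along the lines of the following (hypothetical contents, boiled down from the example above to just the mixed "features" shapes) may be enough to reproduce the mismatch, since it keeps the array-in-one-event, empty-object-in-the-next pattern:

      {
        "count": 2,
        "events": [
          { "features": [ { "count": 3, "name": "feature1" } ] },
          { "features": {} }
        ]
      }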

              People

              • Assignee: Unassigned
              • Reporter: mandoskippy (John Omernik)
              • Votes: 0
              • Watchers: 3
