  1. Apache Drill
  2. DRILL-4001

Empty vectors from previous batch left by MapVector.load(...)/RecordBatchLoader.load(...)

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Execution - Data Types
    • Labels: None

    Description

      In certain cases, MapVector.load(...) (called by RecordBatchLoader.load(...)) returns with some map child vectors at length zero, rather than at a length matching their sibling vectors and the number of records in the batch. This causes MapVector.getObject(int) to fail with "java.lang.IndexOutOfBoundsException: index: 0, length: 1 (expected: range(0, 0))" (one of the errors seen while fixing DRILL-2288).

      The condition seems to be that a child field (e.g., an HBase column in an HBase column family) appears in an earlier batch but does not appear in a later batch.

      The HBase column's child vector is created (in the MapVector for the HBase column family) while the earlier batch is being loaded. While the later batch is being loaded, all vectors are reset to zero length, and then only the vectors for fields that appear in the batch message being loaded are loaded and set to the length of the batch. Other vectors, created from earlier messages/load calls, are left with a length of zero (instead of, say, being filled with nulls to the length of their siblings and the current record batch).
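The sequence described above can be reproduced with a small toy model. This is only a sketch under stated assumptions: ToyMapVector, padMissingChildren, and StaleChildDemo are hypothetical names invented for illustration and are not Drill classes or methods, and plain Java lists stand in for value vectors. The padding step corresponds to the "filled with nulls" alternative mentioned in the description, not to any fix known to exist in Drill.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for a map vector; NOT Drill's MapVector. */
class ToyMapVector {
    // Child name -> column values; children persist across load() calls.
    final Map<String, List<String>> children = new LinkedHashMap<>();
    int recordCount = 0;

    /** Mimics the described load: reset every child, then fill only the fields present. */
    void load(int count, Map<String, List<String>> batch) {
        for (List<String> child : children.values()) {
            child.clear();                       // all vectors reset to zero length
        }
        for (Map.Entry<String, List<String>> e : batch.entrySet()) {
            children.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                    .addAll(e.getValue());       // only fields in this batch get loaded
        }
        recordCount = count;
    }

    /** Hypothetical remedy: pad children that are too short with nulls. */
    void padMissingChildren() {
        for (List<String> child : children.values()) {
            while (child.size() < recordCount) {
                child.add(null);
            }
        }
    }

    /** Reads one record as child name -> value; throws if a child is too short. */
    Map<String, String> getObject(int index) {
        Map<String, String> row = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : children.entrySet()) {
            row.put(e.getKey(), e.getValue().get(index));
        }
        return row;
    }
}

class StaleChildDemo {
    public static void main(String[] args) {
        ToyMapVector family = new ToyMapVector();
        family.load(1, Map.of("colA", List.of("a1"), "colB", List.of("b1")));
        family.load(1, Map.of("colA", List.of("a2")));   // colB absent from the later batch
        try {
            family.getObject(0);                         // colB child has length zero
        } catch (IndexOutOfBoundsException ex) {
            System.out.println("stale child: " + ex.getMessage());
        }
        family.padMissingChildren();                     // colB now holds one null
        System.out.println(family.getObject(0));
    }
}
```

In this model the second load leaves the colB child at length zero while colA has length one, which is the mismatch the report describes. Padding after the load restores the invariant that every child is as long as the batch.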

      See the TODO(DRILL-4001) mark and workaround in MapVector.getObject(int).
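For readers without the source at hand, one plausible shape for a read-side guard of this kind is sketched below. This is an assumption for illustration, not the actual workaround in MapVector.getObject(int): GuardedMapRead is a hypothetical class, and plain Java lists again stand in for child vectors. The idea is simply to skip any child whose length does not cover the requested index.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of a bounds-guarded map read; not Drill code. */
class GuardedMapRead {
    /**
     * Builds the map value for one record, skipping any child whose length
     * does not cover the requested index (e.g. a stale zero-length child)
     * and any child whose value at that index is null.
     */
    static Map<String, Object> getObject(Map<String, List<Object>> children, int index) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (Map.Entry<String, List<Object>> e : children.entrySet()) {
            List<Object> child = e.getValue();
            if (index < child.size()) {          // bounds guard against short children
                Object value = child.get(index);
                if (value != null) {
                    row.put(e.getKey(), value);
                }
            }
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, List<Object>> children = new LinkedHashMap<>();
        children.put("colA", List.of("a2"));
        children.put("colB", List.of());              // stale child left at length zero
        System.out.println(getObject(children, 0));   // prints {colA=a2}
    }
}
```

A guard like this only hides the symptom at read time. The child vectors still disagree in length, so other code that relies on that invariant can still fail.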

            People

              Assignee: Paul Rogers (paul-rogers)
              Reporter: Daniel Barclay (dsbos)
              Votes: 0
              Watchers: 1
