Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-5490

RepeatedVarCharOutput.recordStart becomes corrupted on realloc

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Minor
    • Resolution: Fixed
    • 1.10.0
    • None
    • None
    • None

    Description

      The class RepeatedVarCharOutput, used to read text data into the columns array, has a member called recordStart which is supposed to track the memory offset of the start of the record data in the columns VarChar array. However, the logic for the member is very wrong. First, it is initialized based on an uninitialized variable:

      code
      public void startBatch()

      { this.recordStart = characterDataOriginal; ... loadVarCharDataAddress(); }

      private void loadVarCharDataAddress()

      { ... this.characterDataOriginal = buf.memoryAddress(); ... }

      code

      Second, the class keeps track of actual memory addresses for the VarChar data. When data would exceed the buffer, memory is reallocated using the following method. But, this method does not adjust the recordStart member, which now points to a bogus memory address:

        private void expandVarCharData(){
          vector.getDataVector().reAlloc();
          long diff = characterData - characterDataOriginal;
          loadVarCharDataAddress();
          characterData += diff;
        }
      

      Fortunately, like so many bugs in this class, these bugs are offset by the fact that the variable is never used in a way that cause obvious problems (else these issues would have been found sooner.) The only real use is:

        @Override
        public boolean rowHasData() {
          return recordStart < characterData;
        }
      

      Which, it turns out, is used only at EOF in TextReader.parseRecord():

          } catch(StreamFinishedPseudoException e) {
            // if we've written part of a field or all of a field, we should send this row.
            if (fieldsWritten == 0 && !output.rowHasData()) {
              throw e;
            }
          }
      

      Thus, the bug will cause a failure only if:

      • A batch is resized, and
      • The new address is lower than the original address, and
      • It is the last batch in the file.

      Because of the low probability of hitting the bug, the priority is set to Minor.

      Attachments

        Activity

          People

            Unassigned Unassigned
            paul-rogers Paul Rogers
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: