Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-4138

RowBatch::AcquireState() cannot be used if the initial capacity of a RowBatch is not known

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.7.0
    • Fix Version/s: Impala 2.8.0
    • Component/s: Backend
    • Labels:
      None

      Description

      If you want to acquire the state of a row batch, the usual pattern is something like:

      RowBatch dest(src->row_desc(), batch->capacity(), mem_tracker);
      dest.AcquireState(src);
      

      AcquireState() checks to see if the allocated space for tuple ptrs is the same in both batches - this is calculated by capacity * num_tuples_per_row * sizeof(Tuple*), and is fixed in the constructor.

      But capacity can change after construction, by calling MarkCapacity(). If this happens, dest will allocate too little memory for the tuple ptrs, and AcquireState() will hit a DCHECK.

      The solution seems to be recording the maximum capacity in the c'tor, and using that when constructing dest.

        Activity

        Hide
        tarmstrong Tim Armstrong added a comment -

        The bug is in the calling code - AcquireState() doesn't work if the two row batches have different sized tuple_ptrs_, since it needs to swap tuple_ptrs_ back into the source batch. If you want to do this you should allocate 'dest' with capacity RuntimeState::batch_size()

        Show
        tarmstrong Tim Armstrong added a comment - The bug is in the calling code - AcquireState() doesn't work if the two row batches have different sized tuple_ptrs_, since it needs to swap tuple_ptrs_ back into the source batch. If you want to do this you should allocate 'dest' with capacity RuntimeState::batch_size()
        Hide
        henryr Henry Robinson added a comment -

        That seems a bit brittle - that means you can't move one RowBatch into another without a RuntimeState being around, and assumes that all batches are going to be fixed capacity (that may well be true).

        The patch to make this easier is ~2 lines, and caused me to write a short test, so I'll go ahead and submit it anyhow.

        Show
        henryr Henry Robinson added a comment - That seems a bit brittle - that means you can't move one RowBatch into another without a RuntimeState being around, and assumes that all batches are going to be fixed capacity (that may well be true). The patch to make this easier is ~2 lines, and caused me to write a short test, so I'll go ahead and submit it anyhow.
        Hide
        tarmstrong Tim Armstrong added a comment -

        I think I misunderstood the initial report (since that exact code won't work even after you add the initial_capacity() method), we seem to be on the same page

        Show
        tarmstrong Tim Armstrong added a comment - I think I misunderstood the initial report (since that exact code won't work even after you add the initial_capacity() method), we seem to be on the same page
        Hide
        alex.behm Alexander Behm added a comment -

        Sorry, accidentally selected the wrong JIRA.

        Show
        alex.behm Alexander Behm added a comment - Sorry, accidentally selected the wrong JIRA.
        Show
        henryr Henry Robinson added a comment - https://github.com/apache/incubator-impala/commit/cd6d86b83078290c62c2d92b071a15ec19bbeb4f

          People

          • Assignee:
            henryr Henry Robinson
            Reporter:
            henryr Henry Robinson
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development