IMPALA-4539

Parquet scanner memory bug: I/O buffer is attached to output batch while scratch batch rows still reference it

Details

    Description

      In HdfsScanner, the RowBatch takes ownership of completed io_buffers once the scanner has finished some I/O reads:

        // We need to pass the row batch to the scan node if there is too much memory attached,
        // which can happen if the query is very selective. We need to release memory even
        // if no rows passed predicates.
        if (batch_->AtCapacity() || context_->num_completed_io_buffers() > 0) {
          context_->ReleaseCompletedResources(batch_, /* done */ false);
        }
      

      When the row batch is reset, each io_buffer is either returned to the mem_pool (the free-buffer list) or actually freed:

        if (!FLAGS_disable_mem_pools && free_buffers_[idx].size() < FLAGS_max_free_io_buffers) {
          free_buffers_[idx].push_back(buffer);
          if (ImpaladMetrics::IO_MGR_NUM_UNUSED_BUFFERS != NULL) {
            ImpaladMetrics::IO_MGR_NUM_UNUSED_BUFFERS->Increment(1L);
          }
        } else {
          process_mem_tracker_->Release(buffer_size);
          num_allocated_buffers_.Add(-1);
          delete[] buffer;
          if (ImpaladMetrics::IO_MGR_NUM_BUFFERS != NULL) {
            ImpaladMetrics::IO_MGR_NUM_BUFFERS->Increment(-1L);
          }
          if (ImpaladMetrics::IO_MGR_TOTAL_BYTES != NULL) {
            ImpaladMetrics::IO_MGR_TOTAL_BYTES->Increment(-buffer_size);
          }
        }
      

      Here is the bug: io_buffers owned by row batch A may also be referenced by row batch B, which is returned by the next ScanNode::GetNext call. By the time row batch B is read, those io_buffers may already have been released because row batch A was reset. For example:

      1. The scanner produces row batch A, which takes ownership of io_buffer O1.
      2. Row batch A is consumed and reset, so io_buffer O1 is released.
      3. The scanner produces row batch B, but some tuples in row batch B still point into io_buffer O1, for example tuples with string values. This is especially likely when row batch A hit AtCapacity(), because the io_buffer is then very likely used by more than just row batch A.
      4. When row batch B is consumed, those tuples yield incorrect data because io_buffer O1 has already been released.
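
      The four steps above can be replayed in a small, self-contained model. This is only a sketch: IoBuffer, StringRef, Batch and ReplayScenario are hypothetical stand-ins, not Impala classes, and a `freed` flag models the release so that the example never touches freed memory.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an I/O buffer. `freed` models the release of the
// buffer without actually deleting memory, so the sketch has no undefined behavior.
struct IoBuffer {
  std::string data;
  bool freed = false;
};

// Hypothetical stand-in for a string slot: a view into an I/O buffer.
struct StringRef {
  const IoBuffer* buf;
  std::size_t offset;
  std::size_t len;
  bool IsValid() const { return !buf->freed; }
};

// Hypothetical stand-in for a row batch: rows plus the buffers attached to it.
struct Batch {
  std::vector<StringRef> rows;
  std::vector<IoBuffer*> attached;

  // Resetting the batch releases every buffer that was attached to it.
  void Reset() {
    for (IoBuffer* b : attached) b->freed = true;
    attached.clear();
    rows.clear();
  }
};

// Returns true if every row in the batch still points at live buffer memory.
bool AllRowsValid(const Batch& batch) {
  for (const StringRef& r : batch.rows) {
    if (!r.IsValid()) return false;
  }
  return true;
}

// Replays steps 1-4 and reports whether batch B is still readable at step 4.
bool ReplayScenario() {
  IoBuffer o1;
  o1.data = "alphabeta";
  Batch a;
  Batch b;

  // Step 1: batch A gets rows from O1 and takes ownership of O1.
  a.rows.push_back({&o1, 0, 5});
  a.attached.push_back(&o1);

  // Step 2: batch A is consumed and reset, which releases O1.
  a.Reset();

  // Step 3: batch B is produced, but one of its rows still points into O1.
  b.rows.push_back({&o1, 5, 4});

  // Step 4: batch B is consumed; its row now references released memory.
  return AllRowsValid(b);
}
```

      In this model ReplayScenario() returns false, which corresponds to the corrupted tuples in step 4. The real scanner holds raw pointers into the buffer, so there the same sequence reads released memory instead of tripping a flag.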
      

      This bug is easy to reproduce with the startup option "disable_mem_pools=true", because in that case the io_buffers are actually freed instead of being returned to the mem_pool.
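
      Why the flag matters can be illustrated with a miniature of the return path quoted earlier. This is a sketch under assumptions: ModelBuffer, ModelIoMgr and StaleRead are hypothetical names, the max_free_io_buffers limit is left out, and the "reused before read" case is an inference about pooled buffers rather than something stated in this report.

```cpp
#include <string>
#include <vector>

enum class BufferState { IN_USE, POOLED, FREED };

// Hypothetical stand-in for an I/O buffer with an explicit lifetime state.
struct ModelBuffer {
  std::string data;
  BufferState state = BufferState::IN_USE;
};

// Hypothetical miniature of the buffer return path: cache the buffer or free it.
class ModelIoMgr {
 public:
  explicit ModelIoMgr(bool disable_mem_pools)
      : disable_mem_pools_(disable_mem_pools) {}

  void ReturnBuffer(ModelBuffer* buf) {
    if (!disable_mem_pools_) {
      // Pooled: the memory stays allocated and its old contents stay intact.
      buf->state = BufferState::POOLED;
      free_buffers_.push_back(buf);
    } else {
      // Models the delete[] branch: the contents are gone immediately.
      buf->data.clear();
      buf->state = BufferState::FREED;
    }
  }

  // Hands a pooled buffer to a new read, overwriting its contents.
  // Returns nullptr when the pool is empty.
  ModelBuffer* GetBuffer(const std::string& new_data) {
    if (free_buffers_.empty()) return nullptr;
    ModelBuffer* buf = free_buffers_.back();
    free_buffers_.pop_back();
    buf->data = new_data;
    buf->state = BufferState::IN_USE;
    return buf;
  }

 private:
  bool disable_mem_pools_;
  std::vector<ModelBuffer*> free_buffers_;
};

// Returns what a reader holding a stale pointer to the buffer would observe.
std::string StaleRead(bool disable_mem_pools, bool reuse_before_read) {
  ModelBuffer o1;
  o1.data = "row-B-string";
  ModelIoMgr mgr(disable_mem_pools);
  mgr.ReturnBuffer(&o1);
  if (reuse_before_read) mgr.GetBuffer("other-scan-data");
  if (o1.state == BufferState::FREED) return "<freed>";
  return o1.data;
}
```

      In this model, StaleRead(false, false) still returns the original string, so the bug stays hidden while the buffer sits unused in the pool. StaleRead(true, false) returns "<freed>", matching the observation that disabling the pools exposes the problem right away.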

            People

              tarmstrong Tim Armstrong
              cfreely Fu Lili