Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-2372

Unnecessarily high mem usage during PHJ/PAGG::Partition:Spill

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Not A Bug
    • Impala 2.0, Impala 2.1, Impala 2.2, Impala 2.3.0
    • None
    • None

    Description

      The code of Partition::Spill() for both PHJ and PAGG does not release memory as soon as it can increasing the changes not to be able to SwitchToIoBuffers the probe_rows_ or the unaggregated_rows_ correspondingly. That is, the code of PHJ::PartitionSpill() is (PAGG's is similar):

      Status PartitionedHashJoinNode::Partition::Spill(bool unpin_all_build) {
      <...>
        bool got_buffer = true;
        if (build_rows_->using_small_buffers()) {
          RETURN_IF_ERROR(build_rows_->SwitchToIoBuffers(&got_buffer));
        }
        if (got_buffer && probe_rows_->using_small_buffers()) {
          RETURN_IF_ERROR(probe_rows_->SwitchToIoBuffers(&got_buffer)); <== This may fail if build_rows_ has a very large number of buffers pinned.
        }
        if (!got_buffer) {
          Status status = Status::MemLimitExceeded();
      <...>
          return status;
        }
      
      <...>
        if (hash_tbl() != NULL) {
          hash_tbl()->Close(); <== All this time we have been keeping the HT in memory. No reason to do so.
          hash_tbl_.reset();
        }
      ...
        return build_rows()->UnpinStream(unpin_all_build); <== All this time we have been keeping these build_rows_ buffers pinned. No reason to do so.
      }
      
      

      It looks like we can further reduce the memory needed when spilling a partition by
      (a) Destruct the hash table
      (b) SwitchToIoBuffers() the build_rows_ or the aggregated_rows_
      (c) if successful or already switched, UnpinStream() for that stream so that we free up a potentially large number of buffers
      (d) SwitchToIoBuffers() the probe_rows_ or the unaggregated_rows_

      The reason we have not noticed this problem it was because up until now we would switch all streams of all partitions to IO-sized buffers the first time the small buffers of any stream would fill up. That is, the small buffers patch exposes this problem as well (with that patch a couple of test_spilling tests needed a lot of more memory to complete successfully).

      Attachments

        Activity

          People

            Unassigned Unassigned
            ippokratis Ippokratis Pandis
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: