Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-2372

Unnecessarily high mem usage during PHJ/PAGG::Partition:Spill

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Bug
    • Affects Version/s: Impala 2.0, Impala 2.1, Impala 2.2, Impala 2.3.0
    • Fix Version/s: None
    • Component/s: None

      Description

      The code of Partition::Spill() for both PHJ and PAGG does not release memory as soon as it can increasing the changes not to be able to SwitchToIoBuffers the probe_rows_ or the unaggregated_rows_ correspondingly. That is, the code of PHJ::PartitionSpill() is (PAGG's is similar):

      Status PartitionedHashJoinNode::Partition::Spill(bool unpin_all_build) {
      <...>
        bool got_buffer = true;
        if (build_rows_->using_small_buffers()) {
          RETURN_IF_ERROR(build_rows_->SwitchToIoBuffers(&got_buffer));
        }
        if (got_buffer && probe_rows_->using_small_buffers()) {
          RETURN_IF_ERROR(probe_rows_->SwitchToIoBuffers(&got_buffer)); <== This may fail if build_rows_ has a very large number of buffers pinned.
        }
        if (!got_buffer) {
          Status status = Status::MemLimitExceeded();
      <...>
          return status;
        }
      
      <...>
        if (hash_tbl() != NULL) {
          hash_tbl()->Close(); <== All this time we have been keeping the HT in memory. No reason to do so.
          hash_tbl_.reset();
        }
      ...
        return build_rows()->UnpinStream(unpin_all_build); <== All this time we have been keeping these build_rows_ buffers pinned. No reason to do so.
      }
      
      

      It looks like we can further reduce the memory needed when spilling a partition by
      (a) Destruct the hash table
      (b) SwitchToIoBuffers() the build_rows_ or the aggregated_rows_
      (c) if successful or already switched, UnpinStream() for that stream so that we free up a potentially large number of buffers
      (d) SwitchToIoBuffers() the probe_rows_ or the unaggregated_rows_

      The reason we have not noticed this problem it was because up until now we would switch all streams of all partitions to IO-sized buffers the first time the small buffers of any stream would fill up. That is, the small buffers patch exposes this problem as well (with that patch a couple of test_spilling tests needed a lot of more memory to complete successfully).

        Attachments

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              ippokratis Ippokratis Pandis
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: