Uploaded image for project: 'Apache Arrow'
  1. Apache Arrow
  2. ARROW-15079

[C++] Add scheduler to constrain memory of exec plans

    XMLWordPrintableJSON

Details

    • New Feature
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • C++
    • None

    Description

      This is a high-level JIRA that will probably consist of a number of subtasks. Currently the exec plan supports backpressure. This helps to limit the amount of data read in by a single query. Spillover can also help reduce the amount of memory used by a single query.

      However, if multiple queries are run concurrently, then the system will eventually run out of memory. I'd like to propose a simple scheduler to start to solve this problem.

      • Initially, the scheduler will be given a configurable max memory target. We will assume the user is responsible for ensuring this memory is not used elsewhere on the system. For example, a user may configure 12GB of RAM for the scheduler and the user is responsible for ensuring that 12GB of RAM is not consumed by other processes. In future JIRA issues we can add more sophistication such as detecting and adapting to the system memory levels.
      • The scheduler will track memory usage by querying for the RSS usage of the process. An alternative approach would be to use a dedicated memory pool. This is more flexible (allows for multiple schedulers / query engines in a single process) but wouldn't capture the non-pool allocations (although these should be small).
      • The ExecPlan will be modified to submit its tasks to the scheduler instead of directly to the executor (the scheduler will submit tasks to the executor). Tasks will be prioritized. If a task is higher priority then lower priority tasks will be pushed to disk. Prioritization will be based initially on a tasks position in the ExecPlan. Tasks closer to the sink will be higher priority.
      • The scheduler is separate from spillover. Although it may interact with spillover mechanisms to ask for more spillover as memory pressure increases.

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              westonpace Weston Pace
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 5h
                  5h