  1. Apache Arrow
  2. ARROW-12560

[C++] Investigate utilizing aggressive thread task creation when adding callback to finished future


Details

    Description

      Imagine a slow map function (one that could run in parallel) applied to a vector generator backed by a long vector of tasks.  If we apply the map to the generator and then add readahead, we get no parallelism: the vector generator returns every item synchronously, so each future is already finished when its callback is added and no thread task is ever submitted.

      This situation actually arises in the scanner.  For example, when scanning CSV files, if the CPU threads fall behind the I/O threads then all callbacks run synchronously, since the futures have already been completed by the I/O threads.
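
      The behavior described above can be illustrated with a minimal sketch. It assumes a toy future type (`ToyFuture`, hypothetical and much simpler than Arrow's real future) whose `AddCallback` runs the callback inline when the future is already finished. The helper `CountInlineCallbacks` is likewise invented for illustration; it finishes each future on a separate "I/O" thread first and then counts how many callbacks end up running on the calling thread.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Toy stand-in for a future (hypothetical; this is NOT arrow::Future).
class ToyFuture {
 public:
  // Marks the future finished and runs any callbacks registered so far.
  void MarkFinished() {
    std::vector<std::function<void()>> to_run;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      finished_ = true;
      to_run.swap(callbacks_);
    }
    for (auto& cb : to_run) cb();
  }

  // If the future is already finished, the callback runs immediately on the
  // caller's thread. No new thread task is created.
  void AddCallback(std::function<void()> cb) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      if (!finished_) {
        callbacks_.push_back(std::move(cb));
        return;
      }
    }
    cb();
  }

 private:
  std::mutex mutex_;
  bool finished_ = false;
  std::vector<std::function<void()>> callbacks_;
};

// Finishes `n` futures on an "I/O" thread before adding a callback to each,
// and returns how many callbacks ran on the calling thread.
int CountInlineCallbacks(int n) {
  const std::thread::id caller = std::this_thread::get_id();
  int inline_count = 0;
  for (int i = 0; i < n; ++i) {
    ToyFuture fut;
    std::thread io([&fut] { fut.MarkFinished(); });
    io.join();  // the "I/O" side is ahead of the consumer
    fut.AddCallback([&] {
      if (std::this_thread::get_id() == caller) ++inline_count;
    });
  }
  return inline_count;
}
```

      In this sketch every callback runs on the thread that added it, so a slow callback would serialize all of the work on that one thread.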

      In such a situation we might benefit from creating a new thread task even though we normally wouldn't, for example when there is an idle core.  You can think of this as an analogue of work stealing.

      On the other hand, creating a new thread task at every callback is unlikely to be efficient. We could mitigate this with a hint: when adding a callback, mark it as "potentially long" to indicate that it is eligible for eager thread task creation.
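
      One possible shape for such a hint is sketched below. Every name here is an assumption made for illustration and none of it is existing Arrow API: `CallbackHint`, `ToySpawner` (a stand-in for a CPU thread pool), the `has_idle_core` flag (how idleness would be detected is left open), and `RunCallbackOnFinishedFuture`. The sketch only covers the case where the future is already finished when the callback is added.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical hint supplied when a callback is added.
enum class CallbackHint { kShort, kPotentiallyLong };

// Toy stand-in for a CPU thread pool (hypothetical): one thread per task.
class ToySpawner {
 public:
  ~ToySpawner() { JoinAll(); }
  void Spawn(std::function<void()> task) {
    threads_.emplace_back(std::move(task));
  }
  void JoinAll() {
    for (auto& t : threads_) {
      if (t.joinable()) t.join();
    }
    threads_.clear();
  }

 private:
  std::vector<std::thread> threads_;
};

// Runs `cb` for a future that is already finished. Returns true if the
// callback was handed off as a new thread task, false if it ran inline.
bool RunCallbackOnFinishedFuture(ToySpawner& spawner, CallbackHint hint,
                                 bool has_idle_core,
                                 std::function<void()> cb) {
  if (hint == CallbackHint::kPotentiallyLong && has_idle_core) {
    spawner.Spawn(std::move(cb));  // eager thread task creation
    return true;
  }
  cb();  // default behavior: run synchronously on the caller
  return false;
}

// Demo: a short callback stays on the caller, a hinted one moves elsewhere.
bool HintedCallbackLeavesCaller() {
  ToySpawner spawner;
  const std::thread::id caller = std::this_thread::get_id();
  std::thread::id short_ran_on, long_ran_on;
  RunCallbackOnFinishedFuture(spawner, CallbackHint::kShort, true, [&] {
    short_ran_on = std::this_thread::get_id();
  });
  RunCallbackOnFinishedFuture(spawner, CallbackHint::kPotentiallyLong, true,
                              [&] { long_ran_on = std::this_thread::get_id(); });
  spawner.JoinAll();  // joining makes the spawned thread's write visible
  return short_ran_on == caller && long_ran_on != caller;
}
```

      Keeping the synchronous path as the default means unhinted callbacks behave exactly as before, and the cost of an extra thread task is paid only where the caller has opted in and spare capacity exists.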

            People

              westonpace Weston Pace
              Votes: 0
              Watchers: 4

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 5.5h