  1. Apache Arrow
  2. ARROW-12097

[C++] Modify BackgroundGenerator so it creates fewer threads

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0.0
    • Component/s: C++

    Description

      The current implementation creates a thread per block. In the CSV reader this hurts performance slightly; in the IPC reader the penalty is larger.

      Instead, the readahead can move inside the background generator. The background generator task then keeps running until the queue fills up, and restarts once the queue has drained enough for a substantial amount of work to be done.

      In my CSV test case this reduced the number of thread tasks created from ~2.5k to ~100.
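
      The fill-then-restart scheme described above can be sketched roughly as follows. This is a minimal illustration and not Arrow's BackgroundGenerator: the class SketchBackgroundGenerator, the helpers CountingSource and DrainAll, and the parameters max_queue and restart_at are all hypothetical names chosen for this example. It assumes C++17, a single consumer, and one std::thread per background task in place of a thread pool.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical sketch (not Arrow's implementation): one background task
// fills a bounded queue, exits when the queue is full, and is restarted
// by the consumer once the queue has drained to `restart_at` items.
class SketchBackgroundGenerator {
 public:
  using Source = std::function<std::optional<int>()>;  // nullopt == exhausted

  SketchBackgroundGenerator(Source source, std::size_t max_queue,
                            std::size_t restart_at)
      : source_(std::move(source)), max_queue_(max_queue), restart_at_(restart_at) {
    assert(max_queue_ > 0 && restart_at_ < max_queue_);
  }

  ~SketchBackgroundGenerator() {
    if (worker_.joinable()) worker_.join();
  }

  // Single-consumer read: blocks until an item is ready, nullopt at the end.
  std::optional<int> Next() {
    std::unique_lock<std::mutex> lock(mutex_);
    for (;;) {
      // Restart only when enough room has opened up for a sizeable run.
      if (!running_ && !finished_ && queue_.size() <= restart_at_) StartTask();
      if (!queue_.empty()) {
        int value = queue_.front();
        queue_.pop_front();
        return value;
      }
      if (finished_) return std::nullopt;
      cv_.wait(lock);
    }
  }

  std::size_t tasks_started() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return tasks_started_;
  }

 private:
  void StartTask() {  // caller must hold mutex_
    // The previous task cleared running_ as its last locked action,
    // so this join returns promptly.
    if (worker_.joinable()) worker_.join();
    running_ = true;
    ++tasks_started_;
    worker_ = std::thread([this] {
      std::unique_lock<std::mutex> lock(mutex_);
      while (queue_.size() < max_queue_) {
        lock.unlock();
        std::optional<int> item = source_();  // slow read happens unlocked
        lock.lock();
        if (!item) {
          finished_ = true;
          break;
        }
        queue_.push_back(*item);
        cv_.notify_all();
      }
      running_ = false;  // queue full or source exhausted: the task ends
      cv_.notify_all();
    });
  }

  Source source_;
  const std::size_t max_queue_;
  const std::size_t restart_at_;
  mutable std::mutex mutex_;
  std::condition_variable cv_;
  std::deque<int> queue_;
  std::thread worker_;
  bool running_ = false;
  bool finished_ = false;
  std::size_t tasks_started_ = 0;
};

// Hypothetical helper: a source yielding 0, 1, ..., n - 1.
SketchBackgroundGenerator::Source CountingSource(int n) {
  return [i = 0, n]() mutable -> std::optional<int> {
    if (i >= n) return std::nullopt;
    return i++;
  };
}

// Hypothetical helper: reads the generator to the end.
std::vector<int> DrainAll(SketchBackgroundGenerator& gen) {
  std::vector<int> out;
  while (auto item = gen.Next()) out.push_back(*item);
  return out;
}
```

      Draining a generator built as SketchBackgroundGenerator(CountingSource(1000), 32, 8) yields the 1000 items in order. Each task that is not cut short by the end of the source pushes at least max_queue - restart_at items before it exits, so the number of tasks started stays well below the number of blocks. The gap between the two thresholds is what keeps a restarted task from exiting again after only one or two blocks.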

            People

              Assignee: Weston Pace (westonpace)
              Reporter: Weston Pace (westonpace)

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 50m
