Apache Arrow / ARROW-17033

[C++] Add GCS connection pool size option

Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 8.0.0
    • Fix Version/s: None
    • Component/s: C++

    Description

      Multi-threaded read performance in Arrow's GCS file system implementation is currently relatively low. Because cloud blob stores such as GCS have high per-request latency, a common strategy is to issue many reads concurrently, e.g. with 100 threads, provided the system has enough memory to support that.
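      To illustrate why concurrency matters for latency-bound reads, here is a rough back-of-the-envelope model. It is only a sketch: the function name and the assumption that every request takes the same fixed latency are illustrative, not measurements from Arrow or GCS.

```cpp
#include <cassert>
#include <cstdint>

// Idealized model (an assumption): every request takes exactly `latency_ms`,
// and `concurrency` requests are always in flight until the work runs out.
inline std::int64_t EstimatedWallTimeMs(std::int64_t num_requests,
                                        std::int64_t latency_ms,
                                        std::int64_t concurrency) {
  assert(num_requests >= 0 && latency_ms >= 0 && concurrency > 0);
  // Number of sequential "waves" of requests, rounded up.
  const std::int64_t waves = (num_requests + concurrency - 1) / concurrency;
  return waves * latency_ms;
}
```

      With made-up numbers, 1000 requests at 50 ms each take 50000 ms with a single reader and 500 ms with 100 concurrent readers under this model.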

      The GCS client library offers a ConnectionPoolSize option. If this option is set to a value that's too low, concurrency is throttled. At the moment, GcsOptions does not expose this option, which limits multi-threaded throughput.
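      The throttling effect can be reproduced with a small standard-library simulation. This is a sketch under stated assumptions: ConnectionPoolSketch and PeakConcurrentReads are hypothetical names, the pool is modelled as a plain counting semaphore, and the 5 ms sleep stands in for a network read. It does not use or mirror the GCS client library's internals.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a fixed-size connection pool: at most `size`
// callers can hold a connection at the same time.
class ConnectionPoolSketch {
 public:
  explicit ConnectionPoolSketch(int size) : free_(size) { assert(size > 0); }

  void Acquire() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return free_ > 0; });
    --free_;
  }

  void Release() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      ++free_;
    }
    cv_.notify_one();
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  int free_;
};

// Starts `num_readers` threads that each perform one simulated read and
// returns the largest number of reads that were in flight simultaneously.
inline int PeakConcurrentReads(int num_readers, int pool_size) {
  ConnectionPoolSketch pool(pool_size);
  std::mutex stats_mu;
  int in_flight = 0;
  int peak = 0;

  std::vector<std::thread> readers;
  for (int i = 0; i < num_readers; ++i) {
    readers.emplace_back([&] {
      pool.Acquire();
      {
        std::lock_guard<std::mutex> lock(stats_mu);
        ++in_flight;
        peak = std::max(peak, in_flight);
      }
      // Simulated high-latency read.
      std::this_thread::sleep_for(std::chrono::milliseconds(5));
      {
        std::lock_guard<std::mutex> lock(stats_mu);
        --in_flight;
      }
      pool.Release();
    });
  }
  for (auto& reader : readers) reader.join();
  return peak;
}
```

      However many reader threads are started, the returned peak never exceeds pool_size, which is the kind of cap a too-small connection pool would impose on a 100-thread reader.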

      Instead of exposing this option, an alternative implementation strategy could be to use the same value as set by arrow::io::SetIOThreadPoolCapacity.
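      The two strategies could also be combined: an explicit setting wins, and otherwise the pool size follows the I/O thread pool capacity. The sketch below is hypothetical. GcsOptionsSketch and EffectiveConnectionPoolSize do not exist in Arrow, and the capacity is passed in as a plain integer so the snippet stays self-contained; in Arrow it would presumably come from arrow::io::GetIOThreadPoolCapacity().

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the relevant part of GcsOptions.
struct GcsOptionsSketch {
  // 0 means "not set": fall back to the I/O thread pool capacity.
  std::size_t connection_pool_size = 0;
};

// Picks the pool size to hand to the GCS client: the explicit option if the
// user set one, otherwise one connection per I/O thread.
inline std::size_t EffectiveConnectionPoolSize(const GcsOptionsSketch& options,
                                               int io_thread_capacity) {
  assert(io_thread_capacity > 0);
  if (options.connection_pool_size > 0) {
    return options.connection_pool_size;
  }
  return static_cast<std::size_t>(io_thread_capacity);
}
```

      With this shape, users who raise the I/O thread pool to 100 threads get a matching pool size without touching GCS-specific settings, while an explicit value remains available as an override.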

          People

            Assignee: Unassigned
            Reporter: Leonhard Gruenschloss (lgruen)
            Votes: 0
            Watchers: 5
