Apache Arrow / ARROW-11012

[Rust] [DataFusion] Make write_csv and write_parquet concurrent


Details

    Description

      ExecutionContext.write_csv and write_parquet currently process the output partitions serially: each partition is executed and its results are written out before the next one starts. We should run these as tokio tasks so that the partitions execute and write concurrently. This should, in theory, help with memory usage when the plan contains repartition operators.

      We may also want to add a configuration option for choosing between serial and parallel writes.
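
      A minimal sketch of the serial versus concurrent write pattern described above, offered as an illustration rather than as DataFusion's implementation. To stay runnable with the standard library alone, it uses scoped OS threads as a stand-in for tokio tasks. The `Partition` type, the `write_partition` and `write_partitions` functions, the `part-<index>.csv` file naming, and the `concurrent` flag are all hypothetical names introduced here. Partitions are modelled as rows already in memory, so the sketch does not cover executing the query plan itself.

```rust
use std::fs::{self, File};
use std::io::{self, BufWriter, Write};
use std::path::{Path, PathBuf};
use std::thread;

/// Hypothetical stand-in for one output partition: CSV rows already in memory.
/// (In DataFusion a partition would be a stream of record batches.)
type Partition = Vec<String>;

/// Write one partition to `<dir>/part-<index>.csv` and return the file path.
fn write_partition(dir: &Path, index: usize, rows: &[String]) -> io::Result<PathBuf> {
    let path = dir.join(format!("part-{}.csv", index));
    let mut out = BufWriter::new(File::create(&path)?);
    for row in rows {
        writeln!(out, "{}", row)?;
    }
    out.flush()?;
    Ok(path)
}

/// Write every partition to its own file.
/// `concurrent` is a hypothetical option: false writes one partition at a
/// time, true spawns one scoped thread per partition.
fn write_partitions(
    dir: &Path,
    partitions: &[Partition],
    concurrent: bool,
) -> io::Result<Vec<PathBuf>> {
    fs::create_dir_all(dir)?;

    if !concurrent {
        // Serial path: each partition finishes before the next one starts.
        return partitions
            .iter()
            .enumerate()
            .map(|(i, p)| write_partition(dir, i, p))
            .collect();
    }

    // Concurrent path: start all writers first, then wait for each of them.
    thread::scope(|scope| {
        let handles: Vec<_> = partitions
            .iter()
            .enumerate()
            .map(|(i, p)| scope.spawn(move || write_partition(dir, i, p)))
            .collect();

        // Results are collected in partition order; the first I/O error wins.
        handles
            .into_iter()
            .map(|h| h.join().expect("writer thread panicked"))
            .collect()
    })
}
```

      Calling `write_partitions(dir, &partitions, true)` produces the same files as the serial call with `false`; only the scheduling differs. In an async version, the thread spawn would presumably be replaced by a spawned tokio task per partition, with the join handles awaited in the same order.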


            People

              andygrove Andy Grove


                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 40m