Apache Arrow / ARROW-15517

[R] Use WriteNode in write_dataset()

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 8.0.0
    • Component/s: R

    Description

      Currently, write_dataset() uses the Scanner interface, which can't handle everything that an ExecPlan can. So if your arrow_dplyr_query contains aggregations or (more importantly) joins, you have to materialize the Table in memory before you can write it to disk. The WriteNode added in ARROW-13542 is a special sink node that can be placed at the end of an ExecPlan. Using it in write_dataset() should let data stream to disk in more cases, and writes will benefit from future improvements to ExecPlan memory usage and spillover.
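
      The kind of pipeline this affects can be sketched as follows. This is a minimal illustration, not code from the issue: it assumes arrow 8.0.0 or later (built with dataset and Parquet support) plus dplyr, and the temporary directories, the mtcars data, and the column names are placeholders chosen for the example.

```r
library(arrow)
library(dplyr)

# Hypothetical temporary locations, used only for this example
in_dir <- tempfile("arrow_in_")
out_dir <- tempfile("arrow_out_")

# Create a small partitioned dataset on disk to query
write_dataset(mtcars, in_dir, partitioning = "cyl")

# Build an arrow_dplyr_query that includes an aggregation and
# hand it directly to write_dataset() without calling collect() first
open_dataset(in_dir) %>%
  group_by(gear) %>%
  summarize(mean_mpg = mean(mpg)) %>%
  write_dataset(out_dir)

# Read the written result back into R to inspect it
result <- open_dataset(out_dir) %>% collect()
print(result)
```

      The point of the sketch is that no explicit collect() or compute() sits between the query and write_dataset(). How much of a given query actually streams to disk still depends on the ExecPlan nodes involved.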

            People

              Assignee: Neal Richardson (npr)
              Reporter: Neal Richardson (npr)
              Votes: 0
              Watchers: 3

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 4h