Apache Arrow / ARROW-12315

[R] add max_partitions argument to write_dataset()

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 7.0.0
    • Component/s: R

    Description

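      The Python docs show that write_dataset() accepts a max_partitions argument, so we can allow, say, 1025 partitions:
      https://arrow.apache.org/docs/_modules/pyarrow/dataset.html

      In R this argument doesn't exist. It would be good to add it for arrow v4.0.0.

      This is useful, for example, with international trade datasets, where grouping by year, reporter, and partner easily produces more than 1024 partitions: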

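      # d = UN COMTRADE - World's bilateral flows 2019
      # 13,050,535 x 22 data.frame
      library(arrow)
      library(dplyr)

      # One partition per (Year, Reporter ISO, Partner ISO) combination
      d %>%
        group_by(Year, `Reporter ISO`, `Partner ISO`) %>%
        write_dataset("parquet", hive_style = FALSE)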
      
      Error: Invalid: Fragment would be written into 12808 partitions. This exceeds the maximum of 1024
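
For reference, a minimal sketch of how the requested argument could be used. It assumes arrow for R 7.0.0 or later (the fix version of this issue), where write_dataset() accepts max_partitions. The toy data frame, its values, and the column names reporter and partner are hypothetical stand-ins for the COMTRADE table, and the output path is a temporary directory chosen for illustration.

```r
# Toy stand-in for the trade data (hypothetical values and column names).
d <- data.frame(
  Year = 2019,
  reporter = c("CHL", "CHL", "PER"),
  partner = c("PER", "USA", "CHL")
)
part_cols <- c("Year", "reporter", "partner")

# Each distinct combination of the partition columns becomes one partition.
n_partitions <- nrow(unique(d[, part_cols]))

# Assumption: arrow >= 7.0.0, where write_dataset() has max_partitions
# (default 1024L). Skipped when the package is missing or older.
if (requireNamespace("arrow", quietly = TRUE) &&
    utils::packageVersion("arrow") >= "7.0.0") {
  arrow::write_dataset(
    d,
    path = file.path(tempdir(), "trade_parquet"),
    format = "parquet",
    partitioning = part_cols,
    hive_style = FALSE,
    # Raise the limit only when the data actually needs it.
    max_partitions = max(1024L, n_partitions)
  )
}
```

Counting the distinct partition keys first makes it possible to raise the limit only as far as the data requires, rather than picking an arbitrary large value.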
      
      


            People

              Assignee: Unassigned
              Reporter: pachamaltese (Mauricio 'Pachá' Vargas Sepúlveda)
              Votes: 0
              Watchers: 4

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 3.5h