Apache Arrow / ARROW-13333

[C++] [Dataset] Support max file size option in write dataset


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: C++
    • Labels: None

    Description

      The existence of FileSystemDatasetWriteOptions::basename_template seems to imply that the dataset writer may write multiple files for a given partition. However, the current implementation always creates exactly one file per directory.

       

      I'm not sure what the desired behavior is here but the two obvious choices are:

       * Get rid of FileSystemDatasetWriteOptions::basename_template (or at least the {i} parameter)

       * Add an option to limit how many rows/bytes are put in a single file
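
The second option can be illustrated with a minimal sketch. It does not use any Arrow API: `plan_files` and `max_rows_per_file` are hypothetical names chosen for illustration, and the only thing taken from the issue is the idea that the `{i}` placeholder in a basename template would be filled with a per-directory file counter once a row limit forces a new file.

```python
from typing import Iterable, List, Tuple


def plan_files(
    batch_row_counts: Iterable[int],
    max_rows_per_file: int,
    basename_template: str = "part-{i}.parquet",
) -> List[Tuple[str, int]]:
    """Plan (filename, row_count) pairs for ONE partition directory.

    Hypothetical helper: it only models how a row limit would split
    incoming batches across files; it writes nothing to disk.
    """
    if max_rows_per_file <= 0:
        raise ValueError("max_rows_per_file must be positive")
    if "{i}" not in basename_template:
        raise ValueError("basename_template must contain '{i}'")

    plan: List[Tuple[str, int]] = []
    file_index = 0      # value substituted for {i}
    rows_in_file = 0    # rows assigned to the currently open file

    for count in batch_row_counts:
        remaining = count
        while remaining > 0:
            # Fill the open file up to the limit, then roll over.
            take = min(max_rows_per_file - rows_in_file, remaining)
            rows_in_file += take
            remaining -= take
            if rows_in_file == max_rows_per_file:
                name = basename_template.replace("{i}", str(file_index))
                plan.append((name, rows_in_file))
                file_index += 1
                rows_in_file = 0

    # Flush the last, partially filled file (if any).
    if rows_in_file > 0:
        name = basename_template.replace("{i}", str(file_index))
        plan.append((name, rows_in_file))
    return plan


if __name__ == "__main__":
    for filename, rows in plan_files([400, 400, 400], max_rows_per_file=500):
        print(filename, rows)
```

In this sketch a batch may be split across two files so that every file except the last holds exactly the limit. A real writer might prefer to roll over only at batch boundaries, or to limit bytes rather than rows; that choice is left open here.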

       

      ARROW-12358 is worth mentioning here: whichever strategy is chosen should be compatible with supporting append mode in the future.

            People

              Assignee: Unassigned
              Reporter: Weston Pace (westonpace)