  1. Apache Arrow
  2. ARROW-15409

[C++] The C++ API for writing datasets could be improved


Details

    Description

      While testing dataset writes in the C++ API today, I ran into several things that were not very intuitive. The Python and R interfaces hide all of these, so this only affects people using the C++ API directly.

      • If no partitioning is specified, the write segfaults. It should instead use a default (no-op) partitioning.
      • The min_rows_per_group option should probably default to something higher than 0.
      • It's not clear how to specify the format: you create a format, then set the file write options from it, which sets the format privately.
      • There is no default for basename_template.
      • There is no default for filesystem (it should be the local filesystem).
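
      The defaulting behavior requested in the list above could look roughly like the sketch below. It is a standalone C++17 illustration, not Arrow code: SketchWriteOptions, ResolvedWriteOptions, ResolveDefaults, and kAssumedMinRowsPerGroup are hypothetical names, the string values "none" and "local" merely stand in for a no-op partitioning and a local filesystem, and both the value 1024 and the "part-{i}." template prefix are placeholders, since the issue does not name concrete defaults.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical stand-in for the caller-facing write options. This is NOT
// Arrow's write options struct; only the fields named in the issue are modeled.
struct SketchWriteOptions {
  std::string format_name;                       // required, e.g. "parquet"
  std::optional<std::string> partitioning;       // unset -> no-op partitioning
  std::optional<std::string> filesystem;         // unset -> local filesystem
  std::optional<std::string> basename_template;  // unset -> derived from format
  std::uint64_t min_rows_per_group = 0;          // 0 -> placeholder default
};

// Fully populated options that a writer could consume without null checks.
struct ResolvedWriteOptions {
  std::string format_name;
  std::string partitioning;
  std::string filesystem;
  std::string basename_template;
  std::uint64_t min_rows_per_group = 0;
};

// Placeholder chosen for illustration only; the issue does not propose a value.
constexpr std::uint64_t kAssumedMinRowsPerGroup = 1024;

// Fills every unset field with a default instead of failing later in the write.
inline ResolvedWriteOptions ResolveDefaults(const SketchWriteOptions& in) {
  // In this sketch the format is the one thing the caller must still choose.
  assert(!in.format_name.empty() && "format_name must be set explicitly");

  ResolvedWriteOptions out;
  out.format_name = in.format_name;
  out.partitioning = in.partitioning.value_or("none");
  out.filesystem = in.filesystem.value_or("local");
  out.basename_template =
      in.basename_template.value_or("part-{i}." + in.format_name);
  out.min_rows_per_group = in.min_rows_per_group > 0
                               ? in.min_rows_per_group
                               : kAssumedMinRowsPerGroup;
  return out;
}
```

      With this shape, a caller that sets only format_name gets a complete set of options back, and any field it does set is passed through unchanged.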

            People

              Assignee: Unassigned
              Reporter: Weston Pace (westonpace)
              Votes: 0
              Watchers: 5

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 40m