Apache Arrow / ARROW-7035

[R] Default arguments are unclear in write_parquet docs

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.15.0
    • Fix Version/s: 0.16.0
    • Component/s: R
    • Environment:
      Ubuntu with libparquet-dev 0.15.0-1, R 3.6.1, and arrow 0.15.0.

      Description

      Thank you so much for adding support for reading and writing parquet files in R! I have a few questions about the user interface and optional arguments, but I want to highlight how great it is to have this useful filetype to pass data back and forth.

      The defaults for the optional arguments in arrow::write_parquet aren't always clear. These are the questions I had after reading the help docs for write_parquet:

      • What's the default version? Should a user prefer "2.0" for new projects?
      • What are acceptable values for compression? (Answer: uncompressed, snappy, gzip, brotli, zstd, or lz4.)
      • What's the default for use_dictionary? Seems to be TRUE, at least some of the time.
      • What's the default for write_statistics? Should a user prefer TRUE?
      • Can I assume allow_truncated_timestamps is FALSE by default?
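
Until the docs state the defaults, one workaround is to pass every one of these arguments explicitly. The following is a minimal sketch, not a recommendation: the data frame and output path are made up for illustration, the argument names are the ones listed above, and the chosen values are arbitrary examples. It assumes the installed arrow build includes the snappy codec.

```r
library(arrow)

# Hypothetical example data; any data.frame would do.
df <- data.frame(
  id = 1:3,
  label = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Hypothetical output path in the session's temp directory.
path <- tempfile(fileext = ".parquet")

# Spell out each optional argument asked about above, so the result
# does not depend on the installed version's defaults.
# The values are illustrative only.
write_parquet(
  df,
  path,
  version = "1.0",
  compression = "snappy",  # matches pyarrow's default, per this report
  use_dictionary = TRUE,
  write_statistics = TRUE,
  allow_truncated_timestamps = FALSE
)

# Read the file back to confirm the round trip.
roundtrip <- read_parquet(path)
```

Setting the arguments this way also keeps files written from R and from pyarrow consistent, at the cost of a more verbose call.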

      As someone who works in both R and Python, I was a little surprised that pyarrow uses snappy compression by default while R's default is uncompressed. My preference would be for both to have the same default arguments, but that might be a fringe use case.

      While I was digging into this, I was surprised that ParquetReaderProperties is exported and documented, but ParquetWriterProperties isn't. Is that intentional?

      Thanks!

              People

              • Assignee: Karl Dunkle Werner (karldw)
              • Reporter: Karl Dunkle Werner (karldw)
              • Votes: 0
              • Watchers: 2

                  Time Tracking

                  • Estimated: Not Specified
                  • Remaining: 0h
                  • Logged: 1h 20m