Thank you so much for adding support for reading and writing parquet files in R! I have a few questions about the user interface and optional arguments, but I want to highlight how great it is to have this useful filetype to pass data back and forth.
The defaults for the optional arguments in arrow::write_parquet aren't always clear from the documentation. Here are my questions after reading the help page for write_parquet:
- What's the default version? Should a user prefer "2.0" for new projects?
- What are acceptable values for compression? (Answer: uncompressed, snappy, gzip, brotli, zstd, or lz4.)
- What's the default for use_dictionary? Seems to be TRUE, at least some of the time.
- What's the default for write_statistics? Should a user prefer TRUE?
- Can I assume allow_truncated_timestamps is FALSE by default?
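For anyone else with the same questions, here is a minimal sketch of how the declared defaults can be inspected from the R console. It assumes the argument names listed above match the installed arrow version. An argument whose default is declared as NULL would mean the effective value is chosen somewhere further down, so this may not fully answer the questions.

```r
library(arrow)

# Argument names taken from the questions above; they are assumed to exist
# in the installed version of arrow.
args_of_interest <- c(
  "version",
  "compression",
  "use_dictionary",
  "write_statistics",
  "allow_truncated_timestamps"
)

# formals() is base R: it returns the arguments of a function together
# with their declared default expressions.
defaults <- formals(arrow::write_parquet)[args_of_interest]

# Show each requested argument and its declared default.
str(defaults)
```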
As someone who works in both R and Python, I was a little surprised that pyarrow uses snappy compression by default while R's default is uncompressed. My preference would be for both to share the same default arguments, but that might be a fringe use case.
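In the meantime, a workaround sketch on the R side is to pass compression explicitly rather than rely on the default. This assumes the snappy codec was compiled into the installed arrow build, so the example checks for it first and falls back to "uncompressed" otherwise. The data frame and file path are made up for illustration.

```r
library(arrow)

# Small illustrative data frame (hypothetical data).
df <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)
path <- tempfile(fileext = ".parquet")

# Use snappy only if this arrow build supports it.
codec <- if (codec_is_available("snappy")) "snappy" else "uncompressed"

# Set compression explicitly so the result does not depend on the default.
write_parquet(df, path, compression = codec)

# Read the file back to confirm the round trip.
round_trip <- as.data.frame(read_parquet(path))
```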
While I was digging into this, I was surprised that ParquetReaderProperties is exported and documented, but ParquetWriterProperties isn't. Is that intentional?