  1. Apache Arrow
  2. ARROW-6960

[R] Add support for more compression codecs in Windows build

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.15.0
    • Fix Version/s: 0.16.0
    • Component/s: R
    • Environment: Windows 10

    Description

      When I try to write a Parquet file with lz4, zstd, or brotli compression using R arrow 0.15.0, the write fails because support for those codecs was not built into the package (example below).

       

      > arrow::write_parquet(payout_strategy, sink = "records_test_lz4.parquet",compression = "lz4")
      Error in parquet___arrow___FileWriter__WriteTable(self, table, chunk_size) : 
       Arrow error: IOError: Arrow error: NotImplemented: LZ4 codec support not built

       

      I believe that the error is generated through https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/compression.cc#L124-L145, but I am not sure how to call 

      install.packages("arrow")

      in R to enable the ARROW_WITH_ZSTD/LZ4/BROTLI flags, or whether I should install zstd separately from arrow and then do something pre- or post-install to link zstd with arrow. From https://github.com/apache/arrow/issues/1209, it appears that zstd support has been added to arrow and parquet in general. The R package readme (https://github.com/apache/arrow/tree/master/r) notes "On macOS and Windows, installing a binary package from CRAN will handle Arrow's C++ dependencies for you", but I get the sense that this does not apply to zstd.

       

      Is there guidance as to how to enable zstd and other compression codecs prior to or after downloading the R arrow package? Could this be added to the R documentation somewhere for future reference?
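
      For reference, a minimal sketch of how one might check which codecs a given build supports before writing. It assumes a version of the R package that exports codec_is_available() (this may not be the case for 0.15.0), and it uses the built-in mtcars data frame and a temporary file as stand-ins for my own table and path.

```r
library(arrow)

# Codecs from the report, plus snappy and gzip for comparison.
codecs <- c("snappy", "gzip", "lz4", "zstd", "brotli")

# codec_is_available() returns TRUE/FALSE for a single codec name;
# vapply() collects the results into a logical vector named by codec.
available <- vapply(codecs, codec_is_available, logical(1))
print(available)

# Pick the first preferred codec this build supports, otherwise
# write without compression instead of failing with NotImplemented.
preferred <- c("zstd", "lz4", "brotli")
usable <- preferred[available[preferred]]
compression <- if (length(usable) > 0) usable[[1]] else "uncompressed"

# mtcars and the temp file are placeholders for the real table and sink.
out_file <- tempfile(fileext = ".parquet")
write_parquet(mtcars, out_file, compression = compression)
```

      If every entry for lz4, zstd, and brotli prints FALSE, the installed binary was presumably built without those codecs, which would match the error above.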

            People

              Assignee: Grant Nguyen (gngu)
              Reporter: Grant Nguyen (gngu)
              Votes: 0
              Watchers: 2

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 50m