  1. Apache Arrow
  2. ARROW-11582

[R] write_dataset "format" argument default and validation could be better

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 4.0.0
    • Component/s: R

    Description

       

      I'd like to use the R package to work with data distributed as a tab-separated text file that is much larger than available RAM. I understand that in principle this is possible by opening the file with `open_dataset()` in text mode and then streaming the data out to Parquet via `write_dataset()`, but this strategy fails with an unexpected error even on small text files.

      Here's a minimal reproducible example:

       
      fs::dir_create("import_dir")
      readr::write_tsv(mtcars, "import_dir/mtcars.tsv")
      ds <- arrow::open_dataset("import_dir", format="text", delim="\t")
      arrow::write_dataset(ds, "parquet_dir")
      The error I get occurs only on the last line (`write_dataset()`), saying:
      Error in options$update(...) : attempt to apply non-function
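
      A possible workaround, sketched under an assumption taken from the issue title: if `write_dataset()` picks its default `format` from the input dataset (here a text format with no writer), then naming the output format explicitly should avoid the failing code path. The temporary paths are illustrative only, and `format = "tsv"` is used as shorthand for text mode with a tab delimiter.

```r
library(arrow)

# Illustrative paths under tempdir(); any writable directories would do.
import_dir <- file.path(tempdir(), "import_dir")
parquet_dir <- file.path(tempdir(), "parquet_dir")
dir.create(import_dir, showWarnings = FALSE)

# Write a small tab-separated file with base R (no readr dependency).
write.table(mtcars, file.path(import_dir, "mtcars.tsv"),
            sep = "\t", row.names = FALSE, quote = FALSE)

# Open the directory as a dataset of tab-separated text files.
ds <- open_dataset(import_dir, format = "tsv")

# Assumption: the failure comes from the default `format`, so state it explicitly.
write_dataset(ds, parquet_dir, format = "parquet")

# Reopen the Parquet output as a dataset to confirm it is readable.
out <- open_dataset(parquet_dir)
```

      If the assumption holds, `parquet_dir` should end up containing one or more Parquet files that can be reopened with `open_dataset()`.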
       

            People

              Assignee: pachamaltese Mauricio 'Pachá' Vargas Sepúlveda
              Reporter: cboettig Carl Boettiger

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 2h 10m