Details
Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
9.0.0
Description
Some of the cloud storage examples in the documentation use incorrect paths. For example, in this vignette: https://arrow.apache.org/docs/r/articles/fs.html
This doesn't work:
df <- read_parquet(bucket$path("nyc-taxi/year=2019/month=6/data.parquet"))
Rather, it should be:
df <- read_parquet(bucket$path("nyc-taxi/year=2019/month=6/part-0.parquet"))
I think this makes sense, because part-0 is the default file naming convention used by write_dataset and is therefore what users are likely to see. Indeed, this is how the file structure was written:
library(arrow)
bucket <- s3_bucket("voltrondata-labs-datasets")
bucket$ls(path = "nyc-taxi/year=2011", recursive = TRUE)
#>  [1] "nyc-taxi/year=2011/month=1"
#>  [2] "nyc-taxi/year=2011/month=1/part-0.parquet"
#>  [3] "nyc-taxi/year=2011/month=10"
#>  [4] "nyc-taxi/year=2011/month=10/part-0.parquet"
#>  [5] "nyc-taxi/year=2011/month=11"
#>  [6] "nyc-taxi/year=2011/month=11/part-0.parquet"
#>  [7] "nyc-taxi/year=2011/month=12"
#>  [8] "nyc-taxi/year=2011/month=12/part-0.parquet"
#>  [9] "nyc-taxi/year=2011/month=2"
#> [10] "nyc-taxi/year=2011/month=2/part-0.parquet"
#> [11] "nyc-taxi/year=2011/month=3"
#> [12] "nyc-taxi/year=2011/month=3/part-0.parquet"
#> [13] "nyc-taxi/year=2011/month=4"
#> [14] "nyc-taxi/year=2011/month=4/part-0.parquet"
#> [15] "nyc-taxi/year=2011/month=5"
#> [16] "nyc-taxi/year=2011/month=5/part-0.parquet"
#> [17] "nyc-taxi/year=2011/month=6"
#> [18] "nyc-taxi/year=2011/month=6/part-0.parquet"
#> [19] "nyc-taxi/year=2011/month=7"
#> [20] "nyc-taxi/year=2011/month=7/part-0.parquet"
#> [21] "nyc-taxi/year=2011/month=8"
#> [22] "nyc-taxi/year=2011/month=8/part-0.parquet"
#> [23] "nyc-taxi/year=2011/month=9"
#> [24] "nyc-taxi/year=2011/month=9/part-0.parquet"
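The part-0.parquet naming can be reproduced locally without S3. The following is a minimal sketch, assuming the arrow R package is installed; it writes a small hypothetical data frame (the columns are illustrative stand-ins, not the real nyc-taxi schema) with write_dataset() partitioned by year and month into a temporary directory, lists the resulting files, and then reads one partition file back by its actual name.

```r
library(arrow)

# Hypothetical toy data standing in for the nyc-taxi dataset
df <- data.frame(
  year  = c(2019L, 2019L, 2019L),
  month = c(5L, 6L, 6L),
  fare  = c(10.5, 7.25, 12)
)

# A fresh, not-yet-existing output directory
out_dir <- tempfile("nyc-taxi-")

# write_dataset() creates Hive-style directories (year=.../month=...) and,
# with the default basename_template, names each file "part-{i}.parquet",
# so every partition directory here gets a part-0.parquet
write_dataset(df, out_dir, format = "parquet", partitioning = c("year", "month"))

# Paths are relative to out_dir, e.g. "year=2019/month=6/part-0.parquet"
files <- list.files(out_dir, recursive = TRUE)
print(files)

# Read a single partition file using the name write_dataset() actually produced.
# Partition columns (year, month) live in the directory names, not in the file.
june <- read_parquet(file.path(out_dir, "year=2019/month=6/part-0.parquet"))
print(nrow(june))
```

Reading year=2019/month=6/data.parquet from the same directory would fail, because write_dataset() never creates a file with that name; this is the same mismatch as in the vignette example.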