Details
Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Description
If a partitioned dataset is read back in with a schema that contains the partitioning variable, an error occurs (see below). The error occurs whether or not the partitioning argument is specified. I think this is happening at the C++ level rather than the R level, though I'm not certain.
library(arrow)
library(dplyr)

data(diamonds, package='ggplot2')

write_dataset(diamonds, path='diamonds', format='csv', partitioning='cut')

diamond_schema <- schema(
  carat=float64(),
  cut=string(),
  color=string(),
  clarity=string(),
  depth=float64(),
  table=float64(),
  price=float64(),
  x=float64(),
  y=float64(),
  z=float64()
)

open_dataset('diamonds', format='csv', schema=diamond_schema, partitioning = "cut") %>%
  collect()

# Error: Invalid: Could not open CSV input source '/home/nic2/arrow/r/diamonds/cut=Fair/part-0.csv': Invalid: CSV parse error: Row #1: Expected 10 columns, got 9: "carat","color","clarity","depth","table","price","x","y","z"
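To make the column-count mismatch in the error concrete, here is a minimal base-R sketch. It is an illustration only, not how arrow resolves partitions internally: `parse_hive_path` is a hypothetical helper, and the two column vectors are copied by hand from the schema and the error message above. It shows that the single column the schema expects but the file lacks is the one encoded in the hive-style directory name.

```r
# Hypothetical helper (not part of arrow): pull hive-style key=value
# pairs out of the directory segments of a file path.
parse_hive_path <- function(path) {
  segments <- strsplit(path, "/", fixed = TRUE)[[1]]
  pairs <- segments[grepl("=", segments, fixed = TRUE)]
  keys <- sub("=.*$", "", pairs)
  values <- sub("^[^=]*=", "", pairs)
  setNames(as.list(values), keys)
}

# Columns named in diamond_schema (10) and in the CSV header quoted
# in the error message (9), copied by hand from the report.
schema_cols <- c("carat", "cut", "color", "clarity", "depth",
                 "table", "price", "x", "y", "z")
file_header <- c("carat", "color", "clarity", "depth",
                 "table", "price", "x", "y", "z")

partition <- parse_hive_path("/home/nic2/arrow/r/diamonds/cut=Fair/part-0.csv")

# Columns the schema expects that the file itself does not contain.
missing_from_file <- setdiff(schema_cols, file_header)

# TRUE when every missing column can be supplied by the path.
all(missing_from_file %in% names(partition))
```

In other words, the 10-column schema appears to be applied directly to the 9-column file, even though the missing `cut` value is recoverable from the path.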
Issue Links
- is fixed by ARROW-10485 [R] Accept partitioning in open_dataset when file paths are hive-style (Resolved)