[ARROW-15123] [R] CSV dataset file header read in as data - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 6.0.0, 6.0.1
Fix Version/s: 7.0.0
Component/s: R
Labels:
- pull-request-available
- schema

External issue URL:
https://github.com/apache/arrow/issues/30631
Language:
- R

Description

In `arrow` 6.0.0+ for R, when I read in a CSV file using a schema where the order of the columns in the schema doesn't match the order of columns in the CSV, the data is read in incorrectly.

The header is included as an observation in the read-in dataset. The columns are renamed but not reordered to match the schema, so the schema names are applied by position: the file's "quantile" column ends up called "location", its "value" column ends up called "type", and so on, as below.

[1] "last few obs in sorted order with arrow"
# A tibble: 6 × 7
  forecast_date target       target_end_date location type       quantile value 
  <chr>         <chr>        <chr>           <chr>    <chr>      <chr>    <chr> 
1 2021-12-12    9 day ahead… 2021-12-21      0.99     946.43313… 06       quant…
2 2021-12-12    9 day ahead… 2021-12-21      0.99     956.43294… 39       quant…
3 2021-12-12    9 day ahead… 2021-12-21      0.99     97.948144… 41       quant…
4 2021-12-12    9 day ahead… 2021-12-21      0.99     98.573545… 49       quant…
5 2021-12-12    9 day ahead… 2021-12-21      0.99     98.978636… 33       quant…
6 forecast_date target       target_end_date quantile value      location type

The last line ("forecast_date target...") is the original header.

The file in question (https://raw.githubusercontent.com/reichlab/covid19-forecast-hub/master/data-processed/JHUAPL-Gecko/2021-12-12-JHUAPL-Gecko.csv) has 45360 observations + 1 line for the header. But the read-in dataset has

[1] "dimensions with arrow"
[1] 45361     7
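The symptom described above can be reproduced in miniature without arrow. The following is only an illustrative sketch in plain Python, not arrow's implementation: the one-row CSV, its values, and the helper names `read_positional` and `read_by_name` are hypothetical, and only the two column orders are taken from the output above. It contrasts applying schema names by position (header kept as data) with consuming the header and selecting columns by name.

```python
import csv
import io

# Hypothetical one-row miniature of the file; column order as in the CSV header.
CSV_TEXT = (
    "forecast_date,target,target_end_date,quantile,value,location,type\n"
    "2021-12-12,hypothetical target,2021-12-21,0.99,98.5,49,quantile\n"
)

# Schema order, taken from the tibble header shown in the report.
SCHEMA_NAMES = ["forecast_date", "target", "target_end_date",
                "location", "type", "quantile", "value"]


def read_positional(text, names):
    """Apply schema names by position and treat every line, header included, as data."""
    return [dict(zip(names, row)) for row in csv.reader(io.StringIO(text))]


def read_by_name(text, names):
    """Consume the header line, then pick columns by name in schema order."""
    reader = csv.DictReader(io.StringIO(text))
    return [{name: row[name] for name in names} for row in reader]


broken = read_positional(CSV_TEXT, SCHEMA_NAMES)
fixed = read_by_name(CSV_TEXT, SCHEMA_NAMES)

print(len(broken), broken[1]["location"])  # one extra row; "location" holds the quantile
print(len(fixed), fixed[0]["location"])    # header consumed; "location" holds the location
```

In this sketch the header shows up as the first row of `broken`; in the report it appears last only because the output was sorted.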

Reprex attached with working (`packageVersion("arrow") == "4.0.1"`; 5.0.0 also works) and non-working (`packageVersion("arrow") == "6.0.1"`) examples. Run the examples using `make run-works` and `make run-broken`, respectively.

Attachments

- reprex-arrow-6-read.tar.gz, 1 kB, 15/Dec/21 22:42, N D

Issue Links

links to

GitHub Pull Request #12152

People

Assignee: Nicola Crane

Reporter: N D

Votes: 0

Watchers: 4

Dates

Created: 15/Dec/21 22:59

Updated: 11/Jan/23 08:44

Resolved: 26/Jan/22 14:10

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 3h 40m