Apache Arrow / ARROW-6537

[R] Pass column_types to CSV reader

    Details

      Description

      See also ARROW-6536. The C++ CSV reader may accept a Schema now (I think I saw that); otherwise it takes an unordered_map mapping column names to types.

      read_csv_arrow should accept any of the following for col_types: a Schema, a named list of Types, or the "compact string representation" that readr supports. Per the readr docs, "c = character, i = integer, n = number, d = double, l = logical, f = factor, D = date, T = date time, t = time, ? = guess, or _/- to skip the column." The corresponding Arrow types would be c = utf8(), i = int32(), d = float64(), l = bool(), f = dictionary(int32(), utf8()), D = date32(), T = timestamp(), t = time32(), etc. I'm not sure whether ? and - are supported, or what exactly happens if you don't specify types for all columns, but we'll find out, and we can open JIRAs if important features are missing.
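
      A minimal sketch of how the compact string could be translated, assuming the arrow R package is attached. The names compact_type_map and compact_to_schema are hypothetical, not part of the package. Only the codes mapped explicitly above are handled; n, ?, _ and - raise an error because their behaviour is still an open question. The units passed to timestamp() and time32() are chosen for illustration only.

```r
library(arrow)

# Hypothetical lookup from readr's compact codes to Arrow type constructors.
# Constructors are wrapped in functions so a type is only built when requested.
compact_type_map <- list(
  c = function() utf8(),
  i = function() int32(),
  d = function() float64(),
  l = function() bool(),
  f = function() dictionary(int32(), utf8()),
  D = function() date32(),
  T = function() timestamp(unit = "s"),  # unit is an assumption
  t = function() time32(unit = "ms")     # unit is an assumption
)

# Hypothetical helper: turn a compact string such as "cid" plus the column
# names into an Arrow Schema, one code per column.
compact_to_schema <- function(col_types, col_names) {
  codes <- strsplit(col_types, "", fixed = TRUE)[[1]]
  if (length(codes) != length(col_names)) {
    stop("Need exactly one type code per column name", call. = FALSE)
  }
  unknown <- setdiff(codes, names(compact_type_map))
  if (length(unknown) > 0) {
    stop(
      "Unsupported column type code(s): ", paste(unknown, collapse = ", "),
      call. = FALSE
    )
  }
  types <- lapply(codes, function(code) compact_type_map[[code]]())
  names(types) <- col_names
  do.call(schema, types)
}
```

      For example, compact_to_schema("cid", c("name", "count", "value")) would build a Schema with a utf8, an int32 and a float64 field.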

      Following the existing conventions in csv.R, handling of the compact string representation would be encapsulated in read_csv_arrow, so that CsvTableReader and the various Csv*Options only deal with the Arrow C++ interface.
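
      One way that layering could look, as a sketch only: a hypothetical normalize_col_types helper that read_csv_arrow might call so that everything below it sees either NULL or a Schema. The helper name, its arguments, and the parse_compact callback (whatever function ends up translating the compact string) are all assumptions made for illustration.

```r
library(arrow)

# Hypothetical helper: reduce the three accepted col_types forms to a Schema
# before anything is handed to the options layer.
normalize_col_types <- function(col_types, col_names = NULL, parse_compact = NULL) {
  # NULL (not specified) and Schema objects pass through untouched.
  if (is.null(col_types) || inherits(col_types, "Schema")) {
    return(col_types)
  }
  # A single string is treated as readr's compact representation.
  if (is.character(col_types) && length(col_types) == 1) {
    if (is.null(parse_compact)) {
      stop("No parser supplied for the compact string form", call. = FALSE)
    }
    return(parse_compact(col_types, col_names))
  }
  # A named list of Arrow types becomes a Schema directly.
  if (is.list(col_types) && !is.null(names(col_types))) {
    return(do.call(schema, col_types))
  }
  stop(
    "`col_types` must be a Schema, a named list of types, or a compact string",
    call. = FALSE
  )
}
```

      Keeping this in the R wrapper means the lower layers never need to know that the compact form exists.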

              People

              • Assignee: Romain Francois (romainfrancois)
              • Reporter: Neal Richardson (npr)

                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 40m