Apache Arrow / ARROW-6537

[R] Pass column_types to CSV reader

    Description

      See also ARROW-6536. The C++ CSV reader may accept a Schema now (I think I saw that); otherwise it takes an unordered_map of column names to types.

      read_csv_arrow should accept, for col_types, either a Schema, a named list of Types, or the "compact string representation" that readr supports. Per the readr docs: "c = character, i = integer, n = number, d = double, l = logical, f = factor, D = date, T = date time, t = time, ? = guess, or _/- to skip the column." So c = utf8(), i = int32(), d = float64(), l = bool(), f = dictionary(int32(), utf8()), D = date32(), T = timestamp(), t = time32(), and so on. I'm not sure whether ? and - are supported, or what exactly happens if you don't specify types for all columns, but we'll find out, and we can file JIRAs if important features are missing.
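As a rough illustration of the mapping described above, here is a minimal base-R sketch. The helper name compact_to_arrow_types is hypothetical and not part of the arrow package. It returns the Arrow type constructors as strings so that it runs without arrow installed. It also assumes that "?" simply leaves a column unspecified, and it rejects "n", "_" and "-" because the issue gives no mapping for them.

```r
# Hypothetical helper (not an arrow API): expand a readr-style compact
# col_types string into a named character vector of Arrow type constructor
# calls, using the mapping proposed in this issue.
compact_to_arrow_types <- function(col_types, col_names) {
  mapping <- c(
    c = "utf8()",
    i = "int32()",
    d = "float64()",
    l = "bool()",
    f = "dictionary(int32(), utf8())",
    D = "date32()",
    T = "timestamp()",
    t = "time32()"
  )

  # One type code per column, in column order
  codes <- strsplit(col_types, "")[[1]]
  if (length(codes) != length(col_names)) {
    stop("col_types must have exactly one character per column")
  }

  # Assumption: "?" means "no explicit type", so the column is left out
  keep <- codes != "?"

  # Anything else without a mapping (n, _, -, typos) is rejected in this sketch
  unknown <- setdiff(codes[keep], names(mapping))
  if (length(unknown) > 0) {
    stop("Unsupported type code(s): ", paste(unknown, collapse = ", "))
  }

  out <- mapping[codes[keep]]
  names(out) <- col_names[keep]
  out
}

types <- compact_to_arrow_types("ci?d", c("name", "count", "misc", "score"))
print(types)
```

With arrow attached, the string values could presumably be swapped for the type objects themselves (utf8(), int32(), and so on) to build the named list of Types mentioned above.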

      Following the existing conventions in csv.R, that compact string representation would be encapsulated in read_csv_arrow, so CsvTableReader and the various Csv*Options would only deal with the Arrow C++ interface.
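To make that layering concrete, here is a hedged base-R sketch of how the three accepted forms of col_types might be normalized inside read_csv_arrow before anything reaches the lower-level classes. The function normalize_col_types and its expand_compact argument are hypothetical names for illustration. The check inherits(x, "Schema") assumes that is the R class name of an Arrow schema object.

```r
# Hypothetical normalizer (not an arrow API): accept a Schema, a named list
# of types, or a compact string, and hand back something the lower-level
# reader options could consume. `expand_compact` stands in for whatever
# function turns a string like "cid" into per-column types.
normalize_col_types <- function(col_types, col_names, expand_compact) {
  if (is.null(col_types)) {
    return(NULL)  # nothing specified: leave type inference to the reader
  }
  if (inherits(col_types, "Schema")) {
    return(col_types)  # assumed class name; passed through unchanged
  }
  if (is.character(col_types) && length(col_types) == 1) {
    return(expand_compact(col_types, col_names))  # compact string form
  }
  if (is.list(col_types) && !is.null(names(col_types)) &&
      all(nzchar(names(col_types)))) {
    return(col_types)  # named list of types, passed through unchanged
  }
  stop("col_types must be a Schema, a named list of types, or a compact string")
}

# Toy expander used only to exercise the dispatch; it labels each column
# with its raw type code instead of building real Arrow types.
toy_expand <- function(spec, col_names) {
  stats::setNames(as.list(strsplit(spec, "")[[1]]), col_names)
}

print(normalize_col_types("ci", c("a", "b"), toy_expand))
```

Keeping the dispatch in one place like this would leave CsvTableReader and the options classes unaware of the readr-style shorthand.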

            People

              romainfrancois Romain Francois
              npr Neal Richardson

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 40m