Apache Arrow / ARROW-11243

[C++] Parse time32 from string and infer in CSV reader


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 6.0.0
    • Component/s: C++
    • Environment: Ubuntu 18.04, R 4.0.3

    Description

      When reading a CSV file that contains date and time columns with read_csv_arrow(), dates are read as datetimes rather than dates, and times are read as character strings rather than times.

      The first problem can be worked around by passing date32() in schema(), though better type inference would be nice. However, passing time32() in schema() causes an error.

      Here is a sample dataset, also attached.

      date,time,reading
      2021-01-01,00:00:00,67.8
      2021-01-01,00:00:00,72.4
      2021-01-01,00:00:00,63.1
      2021-01-01,00:05:00,67.8

      Reading with readr::read_csv() results in a tibble with three columns of types date, time, and dbl, as expected.

      samp_readr <- readr::read_csv('sampledata.csv')
      samp_readr
      
      # A tibble: 4 x 3
        date       time   reading
        <date>     <time>   <dbl>
      1 2021-01-01 00'00"    67.8
      2 2021-01-01 00'00"    72.4
      3 2021-01-01 00'00"    63.1
      4 2021-01-01 05'00"    67.8
      

      Reading with arrow::read_csv_arrow() without providing a schema() results in a tibble with three columns of types dttm, chr, and dbl.

      samp_arrow_plain <- arrow::read_csv_arrow('sampledata.csv')
      samp_arrow_plain
      
      # A tibble: 4 x 3
        date                time     reading
        <dttm>              <chr>      <dbl>
      1 2020-12-31 19:00:00 00:00:00    67.8
      2 2020-12-31 19:00:00 00:00:00    72.4
      3 2020-12-31 19:00:00 00:00:00    63.1
      4 2020-12-31 19:00:00 00:05:00    67.8
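The 19:00:00 values on 2020-12-31 look like midnight UTC on 2021-01-01 rendered in a local time zone five hours behind UTC. The snippet below is a minimal sketch that reproduces this, assuming (not stated in the report) that the R session used a UTC-5 zone such as America/New_York; it uses only base R.

```r
# Assumption: the reporter's session was in a UTC-5 zone such as
# America/New_York, while the parsed timestamp is midnight UTC.
midnight_utc <- as.POSIXct("2021-01-01 00:00:00", tz = "UTC")

# Rendering the same instant in a UTC-5 zone reproduces the printed value.
shown_local <- format(midnight_utc, format = "%Y-%m-%d %H:%M:%S",
                      tz = "America/New_York")
print(shown_local)  # "2020-12-31 19:00:00"

# Converting the instant to a date in UTC recovers the date from the file.
recovered_date <- as.Date(midnight_utc, tz = "UTC")
print(recovered_date)  # "2021-01-01"
```

If that assumption holds, the stored values are correct instants and only the display and the choice of type (timestamp instead of date32) differ from readr's result.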
      

      Reading with arrow::read_csv_arrow() and passing date=date32() to col_types via schema() results in a tibble with three columns of types date, chr, and dbl.

      samp_arrow_date <- arrow::read_csv_arrow('sampledata.csv', col_types=schema(date=date32()))
      samp_arrow_date
      
      # A tibble: 4 x 3
        date       time     reading
        <date>     <chr>      <dbl>
      1 2021-01-01 00:00:00    67.8
      2 2021-01-01 00:00:00    72.4
      3 2021-01-01 00:00:00    63.1
      4 2021-01-01 00:05:00    67.8
      

      Reading with arrow::read_csv_arrow() and passing time=time32() to col_types via schema() raises an error.

      samp_arrow_time <- arrow::read_csv_arrow('sampledata.csv', col_types=schema(time=time32()))
      
      Error in csv___TableReader__Read(self) : 
        NotImplemented: CSV conversion to time32[ms] is not supported
      

      The same error occurs when using compact string notation.

      samp_arrow_string <- arrow::read_csv_arrow('sampledata.csv', col_types='DTc', col_names=c('date', 'time', 'reading'), skip=1)
      
      Error in csv___TableReader__Read(self) : 
        NotImplemented: CSV conversion to time32[ms] is not supported
      

      The fix lies somewhere in the internals and is far beyond what I can work out, but I ran into the problem and wanted to report it.
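Until the reader can convert to time32 directly, one possible workaround is to leave the time column as character and convert it in R. This is a minimal sketch, assuming every value has the exact form HH:MM:SS; hms_to_seconds is a hypothetical helper, not part of arrow. It returns integer seconds since midnight, which is the quantity a time32 column with unit "s" stores.

```r
# Hypothetical helper (not part of arrow): convert "HH:MM:SS" strings to
# integer seconds since midnight, the value a time32[s] column stores.
hms_to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) {
    p <- as.integer(p)
    # Anything that is not exactly hours, minutes and seconds becomes NA.
    if (length(p) != 3L) return(NA_integer_)
    p[1] * 3600L + p[2] * 60L + p[3]
  }, integer(1))
}

# The time column from the sample data, as read_csv_arrow() returns it (chr).
times <- c("00:00:00", "00:00:00", "00:00:00", "00:05:00")
secs <- hms_to_seconds(times)
print(secs)  # 0 0 0 300
```

With the hms package installed, hms::hms(seconds = secs) should then give a time-of-day vector comparable to readr's <time> column.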

People

    • Assignee: Antoine Pitrou (apitrou)
    • Reporter: Jared Lander (jaredlander)
    • Votes: 0
    • Watchers: 6


Time Tracking

    • Original Estimate: Not Specified
    • Remaining Estimate: 0h
    • Time Spent: 3h 10m
