Apache Arrow / ARROW-13580

[C++] quoted_strings_can_be_null only applied to string columns


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.0.0
    • Component/s: C++

    Description

      My interpretation of the "string" in quoted_strings_can_be_null is that it refers to the unparsed CSV input string, not the output data type.

      So when converting:

      "one","two","three"
      "1","2","3"
      "4","","6"
      

      We should get...
      [1, 4], [2, None], [3, 6]

      ...currently we get (the second column falls back to string)...
      [1, 4], ['2', None], [3, 6]
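
      The intended semantics can be sketched in plain Python. This is only an illustration, not Arrow's C++ implementation: the helper names (parse_cell, is_null, convert_column, read_columns) are hypothetical, the only null spelling assumed is the empty string, and fields are assumed to contain no embedded commas or escaped quotes. The point is that null detection runs on the raw CSV cell before the column type is chosen, so a quoted empty cell no longer forces the column to string.

```python
NULL_VALUES = {""}  # assumption: the empty string is the only null spelling


def parse_cell(raw):
    """Return (text, was_quoted) for one raw CSV field."""
    if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
        return raw[1:-1], True
    return raw, False


def is_null(text, was_quoted, quoted_strings_can_be_null):
    """Decide nullness from the unparsed cell, regardless of the output type."""
    if was_quoted and not quoted_strings_can_be_null:
        return False
    return text in NULL_VALUES


def convert_column(cells, quoted_strings_can_be_null=True):
    """Mark nulls first, then try int and fall back to strings."""
    values = [
        None if is_null(text, quoted, quoted_strings_can_be_null) else text
        for text, quoted in cells
    ]
    try:
        return [None if v is None else int(v) for v in values]
    except ValueError:
        return values  # at least one cell is not an integer: keep strings


def read_columns(data, quoted_strings_can_be_null=True):
    """Parse simple CSV text into a {column name: values} dict."""
    lines = data.splitlines()
    header = [parse_cell(field)[0] for field in lines[0].split(",")]
    rows = [[parse_cell(field) for field in line.split(",")] for line in lines[1:]]
    return {
        name: convert_column([row[i] for row in rows], quoted_strings_can_be_null)
        for i, name in enumerate(header)
    }


DATA = '"one","two","three"\n"1","2","3"\n"4","","6"'

# Quoted empty cell is treated as null, so column "two" stays integer.
print(read_columns(DATA))
# With the option off, the empty cell is a real value and "two" becomes string.
print(read_columns(DATA, quoted_strings_can_be_null=False))
```

      With the option on, column "two" comes out as [2, None]; with it off, the empty cell cannot be parsed as an integer and the column stays as the strings ['2', ''].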

      In pandas the above string parses to...

      >>> import io
      >>> import pandas
      >>> f = io.BytesIO(b'"one","two","three"\n"1","2","3"\n"4","","6"')
      >>> pandas.read_csv(f)
         one  two  three
      0    1  2.0      3
      1    4  NaN      6
      

      So the proposed behavior brings us closer to pandas, which is probably a good thing.

      Inspired by: https://github.com/apache/arrow/issues/10892

            People

              Assignee: Weston Pace (westonpace)
              Reporter: Weston Pace (westonpace)
              Votes: 0
              Watchers: 2

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 40m