Apache Arrow / ARROW-5195

[Python] read_csv ignores null_values on string types

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.13.0
    • Fix Version/s: 0.14.0
    • Component/s: C++, Python
    • Environment: Python 3.6, PyArrow 0.13.0, AWS linux, debian-slim in docker

    Description

      Let's write a simple CSV with NULL values in a string column:

      from pyarrow import csv

      # Write a two-row CSV whose string column contains a NULL marker
      with open('foo.csv', 'w') as fobj:
          fobj.write('col1,col2\n1,value\n2,NULL')
      table = csv.read_csv('foo.csv')
      table.column('col2').null_count # => 0

       
      table.column('col2').null_count is 0, but I think it should be 1. Passing ConvertOptions(null_values=["NULL"]) to read_csv doesn't help.

       

      Note that pandas.read_csv parses these NULLs correctly, so I have a workaround available, but I'd prefer to read CSV natively with pyarrow if possible.
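
The pandas workaround can be sketched roughly as follows, assuming pandas is installed. It relies on pandas treating the literal NULL as a missing value by default; the in-memory buffer stands in for foo.csv.

```python
import io

import pandas as pd

# Same content as foo.csv in the report, kept in memory for the example
data = "col1,col2\n1,value\n2,NULL"

# pandas includes "NULL" in its default set of missing-value markers
df = pd.read_csv(io.StringIO(data))
pandas_null_count = int(df["col2"].isna().sum())
print(pandas_null_count)
```

From here the frame could be converted with pyarrow.Table.from_pandas, at the cost of an extra pass through pandas.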

            People

              Assignee: Antoine Pitrou (apitrou)
              Reporter: Scott Burns (sburns)
              Votes: 1
              Watchers: 5

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h