Apache Arrow / ARROW-3700

[C++] CSV parser should allow ignoring empty lines

Details

    Description

      This is a copy/paste of the GitHub issue: https://github.com/apache/arrow/issues/2883

       

      Hi,

      I was playing with pyarrow.csv.read_csv and found a rather strange behavior that I'm not sure is intended.

      Parsing fails if the delimiter of the CSV file is a comma and there is a blank line after the header (see the basic_with_blank.csv example).

      Example output:

      Traceback (most recent call last):
        File "sorrow.py", line 14, in <module>
          table = pa_csv.read_csv(csv)
        File "pyarrow/_csv.pyx", line 198, in pyarrow._csv.read_csv
        File "pyarrow/error.pxi", line 81, in pyarrow.lib.check_status
      pyarrow.lib.ArrowInvalid: CSV parse error: Expected 2 columns, got 1
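
      To illustrate where a message like "Expected 2 columns, got 1" can come from, here is a minimal pure-Python sketch. It is not Arrow's parser: the function parse_rows, its ignore_empty_lines flag and the SAMPLE text are hypothetical names made up for this illustration, and the sketch does not try to model the delimiter-dependent behavior described below. It only shows that a strict parser sees a blank line as a row with a single empty field, and that skipping empty lines avoids the mismatch.

```python
def parse_rows(text, delimiter=",", ignore_empty_lines=False):
    """Toy CSV splitter (hypothetical, not Arrow's implementation).

    Every data row must have as many fields as the header row.
    """
    lines = text.splitlines()
    header = lines[0].split(delimiter)
    rows = []
    for line in lines[1:]:
        # Optionally skip blank lines instead of treating them as rows.
        if ignore_empty_lines and line == "":
            continue
        # "".split(",") yields [""], i.e. one (empty) column.
        fields = line.split(delimiter)
        if len(fields) != len(header):
            raise ValueError(
                f"CSV parse error: Expected {len(header)} columns, "
                f"got {len(fields)}"
            )
        rows.append(fields)
    return header, rows


# Made-up sample with a blank line right after the header.
SAMPLE = "a,b\n\n1,2\n3,4\n"

if __name__ == "__main__":
    try:
        parse_rows(SAMPLE)
    except ValueError as exc:
        print("strict parse failed:", exc)
    print(parse_rows(SAMPLE, ignore_empty_lines=True))
```

      With the flag off, the blank line is rejected as a one-column row; with it on, the line is dropped before the column count is checked.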

      If I change the CSV delimiter to semicolon, the error disappears and everything is fine!

      I'm attaching Python code and CSV samples that compare the behavior with pandas (which does not suffer from this).

      Hope this helps, thanks

      Attachments

        1. csv_parse_error.zip
          0.7 kB
          Ultrabug

            People

              apitrou Antoine Pitrou
              ultrabug Ultrabug
              Votes: 0
              Watchers: 5

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 1h