Apache Arrow / ARROW-16872

[Python] open_csv throws ArrowInvalid if the CSV does not end with a newline and has more than 16384 lines

Details

    Description

      `pyarrow.csv.open_csv` raises `ArrowInvalid` when the CSV input does not end with a newline and has more than 16384 lines. Tested with both pyarrow 7.0.0 and 8.0.0. The error occurs both in a production app and on a developer laptop.

       

      Here's a minimal case for reproducing the issue:

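      ```python
      import pyarrow as pa
      import pyarrow.csv
      from io import BytesIO

      # Header plus 16385 data rows, joined with '\n' so the data has no trailing newline.
      rows = ['review_id,filter_outcome'] + ['62593aaec7628b203bad4c6e,fabrication'] * 16385
      data = '\n'.join(rows).encode()

      # Iterating over the streaming reader triggers the error.
      for _ in pa.csv.open_csv(BytesIO(data)):
          pass
      ```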

       

      This raises the following error:

      ArrowInvalid: CSV parse error: Expected 2 columns, got 1: 
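
A possible stopgap, sketched here under the assumption that the missing trailing newline is the trigger as described above, is to append a newline to the buffer before handing it to `open_csv`. The helper name `ensure_trailing_newline` is hypothetical and not part of pyarrow, and this sketch has not been verified against the affected versions.

```python
from io import BytesIO


def ensure_trailing_newline(data: bytes) -> bytes:
    """Return data unchanged if it is empty or already ends with a newline;
    otherwise return it with a single newline appended."""
    if data and not data.endswith(b"\n"):
        return data + b"\n"
    return data


# Same shape as the reproduction above: a header plus 16385 rows, no trailing newline.
rows = ["review_id,filter_outcome"] + ["62593aaec7628b203bad4c6e,fabrication"] * 16385
raw = "\n".join(rows).encode()

# Hypothetical usage: pass this buffer to pa.csv.open_csv instead of BytesIO(raw).
source = BytesIO(ensure_trailing_newline(raw))
```

This only works when the whole input is held in memory as bytes; for file paths or other streams the newline would have to be added at the source.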

            People

              yibocai Yibo Cai
              frederikfab Frederik Fabritius
              Votes: 0
              Watchers: 4

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 2.5h