  Apache Arrow / ARROW-5974

[Python][C++] Enable CSV reader to read from concatenated gzip stream

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.13.0, 0.14.0
    • Fix Version/s: 0.15.0
    • Component/s: Python

      Description

      If two gzipped files are concatenated, the result is a valid gzip file. However, pyarrow.csv.read_csv appears to read only the portion belonging to the first file.

      If the repro script here is run, the output is:

      $ python repro.py
      pyarrow.csv only reads one row:
         x
      0  1
      pandas reads two rows:
         x
      0  1
      1  2
      pyarrow version: 0.14.0
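
The behavior described above can be illustrated with the Python standard library alone. This is a minimal sketch, not the reporter's repro script and not Arrow's implementation: the helper names (`make_concatenated_gzip`, `decompress_first_member_only`, `decompress_all_members`) are hypothetical, and the single-member decoder only assumes how a reader might stop after the first gzip member. It shows that a concatenated stream holds two members and that a decoder must restart on the leftover bytes to see the second row.

```python
import gzip
import zlib

# wbits value that tells zlib to expect a gzip header and trailer.
GZIP_WBITS = 16 + zlib.MAX_WBITS


def make_concatenated_gzip():
    """Build a two-member gzip stream: header plus row 1, then row 2."""
    first = gzip.compress(b"x\n1\n")
    second = gzip.compress(b"2\n")
    return first + second


def decompress_first_member_only(data):
    """Decode one gzip member and stop (assumed naive-reader behavior)."""
    decoder = zlib.decompressobj(GZIP_WBITS)
    out = decoder.decompress(data)
    # Bytes after the first member's trailer are left untouched here.
    return out, decoder.unused_data


def decompress_all_members(data):
    """Restart a fresh decoder on the leftover bytes until none remain."""
    out = b""
    while data:
        decoder = zlib.decompressobj(GZIP_WBITS)
        out += decoder.decompress(data)
        out += decoder.flush()
        data = decoder.unused_data
    return out


payload = make_concatenated_gzip()
first_only, leftover = decompress_first_member_only(payload)
everything = decompress_all_members(payload)

print(first_only)   # CSV text from the first member only
print(everything)   # CSV text from both members
```

Writing `payload` to a file (for example a hypothetical `concat.csv.gz`) and passing that path to `pyarrow.csv.read_csv` and to `pandas.read_csv` should reproduce the one-row versus two-row difference on the affected versions.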

              People

              • Assignee: Antoine Pitrou (pitrou)
              • Reporter: Jordan Samuels (jordan_samuels)
              • Votes: 0
              • Watchers: 4

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  • Original Estimate: Not Specified
                  • Remaining Estimate: 0h
                  • Time Spent: 1h 50m