Apache Arrow / ARROW-5974

[Python][C++] Enable CSV reader to read from concatenated gzip stream

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.13.0, 0.14.0
    • Fix Version/s: 0.15.0
    • Component/s: Python

    Description

      If two gzipped files are concatenated, the result is a valid gzip file. However, pyarrow.csv.read_csv appears to read only the portion that comes from the first file.

      If the repro script here is run, the output is:

      $ python repro.py
      pyarrow.csv only reads one row:
         x
      0  1
      pandas reads two rows:
         x
      0  1
      1  2
      pyarrow version: 0.14.0
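
      The behaviour described above can be illustrated with the Python standard library alone. The sketch below is not the repro script from this issue and not Arrow's implementation. It only shows, under the assumption that a reader uses a single zlib decoder, why such a reader stops after the first gzip member. The helper name decompress_all_members is hypothetical.

```python
import gzip
import zlib

# Two independent gzip members, each holding part of a CSV file.
first = gzip.compress(b"x\n1\n")
second = gzip.compress(b"2\n")
concatenated = first + second  # still a valid gzip stream

# The gzip module handles multi-member streams and returns every row.
all_rows = gzip.decompress(concatenated)

# A single zlib decoder (16 + MAX_WBITS selects the gzip container)
# stops at the end of the first member; the bytes of the second
# member are left untouched in `unused_data`.
decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)
first_only = decoder.decompress(concatenated)
leftover = decoder.unused_data


def decompress_all_members(data: bytes) -> bytes:
    """Hypothetical helper: restart the decoder on leftover bytes
    until every gzip member has been consumed."""
    chunks = []
    while data:
        d = zlib.decompressobj(16 + zlib.MAX_WBITS)
        chunks.append(d.decompress(data))
        chunks.append(d.flush())
        data = d.unused_data
    return b"".join(chunks)
```

      Here first_only holds only the first data row, while all_rows and decompress_all_members(concatenated) hold both, which mirrors the one-row versus two-row difference in the output above.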

            People

              Assignee: Antoine Pitrou (apitrou)
              Reporter: Jordan Samuels (jordan_samuels)
              Votes: 0
              Watchers: 5

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 50m