[ARROW-5974] [Python][C++] Enable CSV reader to read from concatenated gzip stream - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 0.13.0, 0.14.0
Fix Version/s: 0.15.0
Component/s: Python
Labels:
- pull-request-available

External issue URL:
https://github.com/apache/arrow/issues/22382

Description

If two gzipped files are concatenated, the result is a valid gzip file. However, pyarrow.csv.read_csv appears to read only the portion that came from the first file.

If the repro script here is run, the output is:

$ python repro.py
pyarrow.csv only reads one row:
   x
0  1
pandas reads two rows:
   x
0  1
1  2
pyarrow version: 0.14.0
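
The behaviour described above can be illustrated with the Python standard library alone. The following is a minimal sketch, not the reporter's repro script (which is not reproduced here) and not the fix that was merged. It assumes the symptom comes from a decompressor that stops at the end of the first gzip member; the helper name `decompress_all_members` is hypothetical.

```python
import gzip
import zlib

# Two independent gzip members, each holding part of a CSV file.
first = gzip.compress(b"x\n1\n")
second = gzip.compress(b"2\n")
concatenated = first + second

# gzip.decompress handles multi-member input and returns every row.
all_rows = gzip.decompress(concatenated)

# A single zlib decompressor (gzip framing selected via wbits) stops at
# the end of the first member; the remaining bytes land in unused_data.
single = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
first_only = single.decompress(concatenated)
leftover = single.unused_data


def decompress_all_members(data):
    """Hypothetical helper: restart a decompressor on each leftover member."""
    chunks = []
    while data:
        member = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
        chunks.append(member.decompress(data))
        chunks.append(member.flush())
        data = member.unused_data
    return b"".join(chunks)
```

Here `first_only` holds just the header and the first row, while `decompress_all_members(concatenated)` matches what `gzip.decompress` returns. A reader that never restarts decompression on the leftover bytes would show the one-row result reported above.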

Issue Links

links to

GitHub Pull Request #4923

People

Assignee: Antoine Pitrou

Reporter: Jordan Samuels

Votes: 0

Watchers: 5

Dates

Created: 18/Jul/19 03:12

Updated: 11/Jan/23 07:44

Resolved: 01/Aug/19 14:04

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 1h 50m