[ARROW-4143] [Python] Skip rows while reading parquet file - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Minor
Resolution: Duplicate
Affects Version/s: None
Fix Version/s: None
Component/s: Developer Tools
Labels:
- newbie

External issue URL:
https://github.com/pandas-dev/pandas/issues/24511

Description

Is there any functionality in pyarrow that allows reading a file partially? For example, I would like to read only the first 10 rows of a Parquet file.

I ran into this when trying the following:

`df = pd.read_parquet(path='filepath', nrows=10)` (this raised an error)

I wanted to read just the first 10 rows into a pandas DataFrame using read_parquet (read_parquet uses pyarrow as one of its engines for reading Parquet files). Since the Parquet file is very large, is there any functionality we could add to the engine so that only the first n rows are read?

People

Assignee: Unassigned

Reporter: Sanchit

Votes: 0

Watchers: 3

Dates

Created: 01/Jan/19 23:23

Updated: 11/Jan/23 07:32

Resolved: 08/Feb/19 05:05