Apache Arrow / ARROW-4143

[Python] Skip rows while reading parquet file


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Duplicate
    • None
    • None
    • Developer Tools

    Description

      Is there any functionality in pyarrow that allows reading a file partially? For example, I would like to read only the first 10 rows of a Parquet file.

      I ran into this while trying the following:

      `df = pd.read_parquet(path='filepath', nrows=10)`  # raised an error

      I wanted to read just the first 10 rows into a pandas DataFrame using read_parquet (read_parquet can use pyarrow as one of its engines for reading Parquet files). Because the Parquet file is very large, could functionality be added to the engine for reading only the first n rows?

          People

            Assignee: Unassigned
            Reporter: Sanchit (sanchit089)
            Votes: 0
            Watchers: 3
