PARQUET-1983: Pool SeekableInputStreams in ParquetFileReader

Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: parquet-mr
    • Labels: None

    Description

       

      If PARQUET-1982 (https://issues.apache.org/jira/browse/PARQUET-1982) is accepted, ParquetFileReader could read row groups in parallel by borrowing streams from a pool of SeekableInputStreams. This could significantly improve performance for applications that read data at random positions in a large file.
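
      To illustrate the pooling idea, here is a minimal sketch. It is not the patch mentioned in this ticket and does not use the parquet-mr API: the class name SeekableStreamPool and the method readAt are hypothetical, and java.io.RandomAccessFile stands in for a SeekableInputStream. Each caller borrows its own stream, seeks, reads, and returns the stream, so concurrent readers never share a file position.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: a fixed-size pool of independent seekable handles on one file.
// RandomAccessFile is an assumption standing in for Parquet's SeekableInputStream.
final class SeekableStreamPool implements AutoCloseable {
    private final BlockingQueue<RandomAccessFile> idle;

    SeekableStreamPool(Path file, int size) throws IOException {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(new RandomAccessFile(file.toFile(), "r"));
        }
    }

    // Borrows a stream, reads `length` bytes starting at `offset`,
    // and hands the stream back to the pool even if the read fails.
    byte[] readAt(long offset, int length) throws IOException, InterruptedException {
        RandomAccessFile stream = idle.take(); // blocks while every stream is in use
        try {
            byte[] buffer = new byte[length];
            stream.seek(offset);
            stream.readFully(buffer);
            return buffer;
        } finally {
            idle.add(stream); // cannot overflow: only borrowed streams are returned
        }
    }

    // Closes the streams currently idle; call this after all readers have finished.
    @Override
    public void close() throws IOException {
        for (RandomAccessFile stream : idle) {
            stream.close();
        }
    }
}
```

      With a pool of size N, up to N threads can call readAt at different offsets (for example, one per row group) at the same time; further callers wait until a stream is returned.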

      I've already developed a patch that would enable this functionality. I will link the merge request in the next few days.

      Is there a related ticket that I have overlooked?

          People

            Assignee: Unassigned
            Reporter: Felix Schmalzel (fschmalzel)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated: