Parquet / PARQUET-1983

Pool SeekableInputStreams in ParquetFileReader


    Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: parquet-mr
    • Labels: None

      Description

       

      If PARQUET-1982 (https://issues.apache.org/jira/browse/PARQUET-1982) goes through, we could read row groups in parallel using a pool of SeekableInputStreams, because each thread could seek and read with its own stream instead of contending for a single shared one. This would significantly improve performance for applications that read data at random positions in a large file.
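      To illustrate the idea, here is a minimal, language-neutral sketch in Python. It is not parquet-mr code and does not use the Java `SeekableInputStream` or `ParquetFileReader` APIs. It assumes each pooled stream is an independent handle on the same file with its own position, and treats row groups as hypothetical `(offset, length)` byte ranges. `StreamPool`, `read_range` and `read_row_groups_parallel` are illustrative names only.

```python
import os
import queue
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import BinaryIO, Callable, List, Sequence, Tuple


class StreamPool:
    """Hypothetical fixed-size pool of independent seekable streams over one file.

    Each stream keeps its own file position, so concurrent readers never
    interleave one another's seek() and read() calls.
    """

    def __init__(self, open_stream: Callable[[], BinaryIO], size: int) -> None:
        self._idle: "queue.Queue[BinaryIO]" = queue.Queue()
        self._all: List[BinaryIO] = []
        for _ in range(size):
            stream = open_stream()
            self._all.append(stream)
            self._idle.put(stream)

    def read_range(self, offset: int, length: int) -> bytes:
        # Borrow an idle stream; blocks while every stream is in use.
        stream = self._idle.get()
        try:
            stream.seek(offset)
            chunk = stream.read(length)
            if len(chunk) != length:
                raise EOFError(f"wanted {length} bytes at {offset}, got {len(chunk)}")
            return chunk
        finally:
            # Always hand the stream back, even if the read failed.
            self._idle.put(stream)

    def close(self) -> None:
        for stream in self._all:
            stream.close()


def read_row_groups_parallel(
    pool: StreamPool, ranges: Sequence[Tuple[int, int]], workers: int
) -> List[bytes]:
    """Read each (offset, length) range concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(lambda r: pool.read_range(r[0], r[1]), ranges))


if __name__ == "__main__":
    data = bytes(range(256)) * 4  # 1 KiB of sample bytes standing in for a file
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(data)
        path = f.name
    pool = StreamPool(lambda: open(path, "rb"), size=2)
    try:
        # Hypothetical byte ranges standing in for row-group chunks.
        ranges = [(0, 100), (100, 200), (300, 500), (800, 224)]
        chunks = read_row_groups_parallel(pool, ranges, workers=4)
        print([len(c) for c in chunks])  # [100, 200, 500, 224]
    finally:
        pool.close()
        os.remove(path)
```

      The pool size caps how many file handles are open at once; extra worker threads simply wait in `queue.Queue.get()` until a stream is returned. With a single shared stream, every seek-and-read pair would instead need a lock, which serializes the reads.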

      I've already developed a patch that would enable this functionality. I will link the merge request in the next few days.

      Is there a related ticket that I have overlooked?


            People

            • Assignee: Unassigned
            • Reporter: Felix Schmalzel (fschmalzel)
