Parquet / PARQUET-674

Add an abstraction to get the length of a stream

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.9.0, 1.8.2
    • Component/s: parquet-mr
    • Labels: None

      Description

      PARQUET-400 introduces SeekableInputStream to wrap Hadoop v1 and v2 streams and provide ByteBuffer access transparently. The same class can also serve as an abstraction that lets Parquet work without the Hadoop API. The missing piece is an abstraction that knows the length of the file, which is needed to locate and read the footer. This could be done by adding a getLength method to the new stream interface, but I think there is more value in adding a higher-level abstraction that carries information about the file and can open streams for it. This abstraction could be passed to a PageReadStore, which could then implement more complicated logic, such as opening parallel streams to read column chunks.
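
A minimal sketch of what such a higher-level abstraction might look like, assuming nothing about the final parquet-mr API. The names `LengthAwareFile`, `SeekableInput`, `LocalFile`, and `readFooterLength` are hypothetical and exist only for this illustration; `SeekableInput` is a stand-in for SeekableInputStream, not its real signature. The sketch relies on one fact about the Parquet format: a file ends with a 4-byte little-endian footer length followed by the magic bytes `PAR1`, so the reader has to know the file length before it can find the footer.

```java
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Arrays;

public class InputFileSketch {

    /** Hypothetical minimal seekable stream; stands in for SeekableInputStream. */
    interface SeekableInput extends Closeable {
        void seek(long position) throws IOException;
        void readFully(byte[] buffer) throws IOException;
    }

    /** Hypothetical file-level abstraction: knows its length and can open streams. */
    interface LengthAwareFile {
        long getLength() throws IOException;
        SeekableInput newStream() throws IOException;
    }

    /** Local-disk implementation with no Hadoop dependency. */
    static final class LocalFile implements LengthAwareFile {
        private final File file;

        LocalFile(File file) { this.file = file; }

        @Override public long getLength() { return file.length(); }

        @Override public SeekableInput newStream() throws IOException {
            final RandomAccessFile raf = new RandomAccessFile(file, "r");
            return new SeekableInput() {
                @Override public void seek(long position) throws IOException { raf.seek(position); }
                @Override public void readFully(byte[] buffer) throws IOException { raf.readFully(buffer); }
                @Override public void close() throws IOException { raf.close(); }
            };
        }
    }

    static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);

    /** Reads the footer length from the last 8 bytes: 4-byte LE length, then the magic. */
    static int readFooterLength(LengthAwareFile file) throws IOException {
        long length = file.getLength();
        if (length < 8) {
            throw new IOException("File too short to hold a footer tail: " + length);
        }
        byte[] tail = new byte[8];
        try (SeekableInput in = file.newStream()) {
            in.seek(length - 8);   // only possible because the length is known
            in.readFully(tail);
        }
        if (!Arrays.equals(Arrays.copyOfRange(tail, 4, 8), MAGIC)) {
            throw new IOException("Missing PAR1 magic at end of file");
        }
        return ByteBuffer.wrap(tail, 0, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("sketch", ".bin");
        tmp.deleteOnExit();
        // Fake file: 5 placeholder footer bytes, the footer length (5), then the magic.
        ByteBuffer buf = ByteBuffer.allocate(5 + 8).order(ByteOrder.LITTLE_ENDIAN);
        buf.put(new byte[5]).putInt(5).put(MAGIC);
        Files.write(tmp.toPath(), buf.array());
        System.out.println(readFooterLength(new LocalFile(tmp))); // prints 5
    }
}
```

Because the file object hands out a fresh stream on each `newStream()` call, a consumer such as a PageReadStore could in principle open several streams against the same file, which is the parallel column-chunk case the description mentions.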

              People

              • Assignee: Ryan Blue (rdblue)
              • Reporter: Ryan Blue (rdblue)
              • Votes: 0
              • Watchers: 3
