PARQUET-207

ParquetInputSplit end calculation bug

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.6.0
    • Fix Version/s: 1.6.0
    • Component/s: parquet-mr
    • Labels: None

      Description

      The calculation of a split's end from the file metadata was broken by PARQUET-108. That change updated the calculation to use the requested schema, so that the end of a block would be the end of the last projected column. However, the end logic actually computes the total number of bytes selected, not an end offset.
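
The difference between the two quantities can be illustrated with a minimal sketch. It does not use the parquet-mr API: the class `SplitEndSketch`, the type `Chunk`, and both method names are hypothetical stand-ins, and the offsets are made-up values chosen only to show that a byte total and an end offset diverge as soon as the projected columns are not contiguous from offset 0.

```java
import java.util.Arrays;
import java.util.List;

public class SplitEndSketch {
    // Hypothetical stand-in for a projected column chunk:
    // where it starts in the file and how many bytes it spans.
    public static final class Chunk {
        final long startOffset;
        final long totalSize;

        public Chunk(long startOffset, long totalSize) {
            this.startOffset = startOffset;
            this.totalSize = totalSize;
        }
    }

    // What the description says the logic computes:
    // the total number of bytes selected by the projection.
    public static long selectedBytes(List<Chunk> projected) {
        long total = 0;
        for (Chunk c : projected) {
            total += c.totalSize;
        }
        return total;
    }

    // What the description says was intended:
    // the file offset at which the last projected column ends.
    public static long endOfLastProjectedColumn(List<Chunk> projected) {
        long end = 0;
        for (Chunk c : projected) {
            end = Math.max(end, c.startOffset + c.totalSize);
        }
        return end;
    }

    public static void main(String[] args) {
        // Two projected chunks with a gap between them (assumed values).
        List<Chunk> projected = Arrays.asList(
                new Chunk(100, 50),
                new Chunk(300, 100));

        System.out.println(selectedBytes(projected));            // prints 150
        System.out.println(endOfLastProjectedColumn(projected)); // prints 400
    }
}
```

With these assumed offsets, the byte total (150) falls well short of the position where the last projected column ends (400), so it cannot serve as a split end.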

      The end of a split is only used to select row groups when a block has no row group offsets, which does not happen when the constructor that uses the broken method is called. Even so, the broken calculation should still be removed.

      After 1.6.0, I want to change Hive to pass FileSplits directly rather than wrapping them in ParquetInputSplit. The internal reader code can handle mapping row groups to splits because it has to do so for PARQUET-84.

            People

            • Assignee: Unassigned
            • Reporter: Ryan Blue (rdblue)