IMPALA-6964

Track stats about column and page sizes in Parquet reader

    Details

      Description

      It would be good to have stats about page sizes for scanned Parquet data. We currently can't tell much about the "shape" of the Parquet pages from the profile. Some interesting questions:

      • How big is each column, i.e. the total compressed and decompressed size read?
      • How big are pages on average, in either compressed or decompressed size?
      • What is the compression ratio for pages? This could be inferred from the two sizes above.

      I think storing all the stats in the profile per-column would be too much data, but we could probably infer most useful things from higher-level aggregates.
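To make the "higher-level aggregates" idea concrete, here is a minimal sketch of how a few running counters could answer the questions above without keeping per-column data. It is an illustration only, not Impala's implementation: the names `SizeSummary`, `PageSizeStats`, and `record_page` are hypothetical, and the compression ratio is assumed to mean decompressed bytes divided by compressed bytes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SizeSummary:
    """Running aggregate over a stream of byte sizes (hypothetical helper)."""
    count: int = 0
    total: int = 0
    min_size: Optional[int] = None
    max_size: Optional[int] = None

    def add(self, size: int) -> None:
        # Update count, total, min and max in O(1) per observation.
        self.count += 1
        self.total += size
        self.min_size = size if self.min_size is None else min(self.min_size, size)
        self.max_size = size if self.max_size is None else max(self.max_size, size)

    @property
    def mean(self) -> float:
        # Average size; 0.0 when nothing has been recorded.
        return self.total / self.count if self.count else 0.0


@dataclass
class PageSizeStats:
    """Scan-wide aggregates over all pages read (hypothetical, not per-column)."""
    compressed: SizeSummary = field(default_factory=SizeSummary)
    decompressed: SizeSummary = field(default_factory=SizeSummary)

    def record_page(self, compressed_size: int, decompressed_size: int) -> None:
        self.compressed.add(compressed_size)
        self.decompressed.add(decompressed_size)

    @property
    def compression_ratio(self) -> float:
        # Assumed definition: decompressed bytes / compressed bytes.
        if self.compressed.total == 0:
            return 0.0
        return self.decompressed.total / self.compressed.total


# Example: three pages given as (compressed, decompressed) sizes in bytes.
stats = PageSizeStats()
for compressed_size, decompressed_size in [(400, 1000), (600, 1000), (500, 2000)]:
    stats.record_page(compressed_size, decompressed_size)

print(stats.compressed.total, stats.decompressed.total)
print(stats.compressed.mean, stats.compression_ratio)
```

The same summary could be fed with per-column totals instead of per-page sizes to describe how big columns are, again without storing an entry for every column.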

            People

            • Assignee: Sahil Takiar (stakiar)
            • Reporter: Tim Armstrong (tarmstrong)