IMPALA-494

Make Parquet block size configurable


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 1.1
    • Fix Version/s: Impala 1.2

    Description

      The 1GB Parquet block size restricts the degree of parallelism during a scan. For example, if I have a 1GB file and I'm querying 75% of the columns, the scan has to read 750MB from a single disk. On the other hand, if I'm using Seq/Snappy with a 128MB block size, the scan can be parallelized across blocks and the result comes back a lot faster.
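To make the arithmetic above concrete, here is a rough sketch that counts how many block-sized pieces of a file could be scanned independently. It assumes one scan range per block, which is a simplification for illustration and not a description of Impala's scheduler; the helper name `scan_ranges` is hypothetical.

```python
import math

MB = 1024 * 1024

def scan_ranges(file_size: int, block_size: int) -> int:
    """Number of block-sized pieces a file splits into.

    Assumption: each block can be read by a separate scanner,
    so this is an upper bound on scan parallelism for the file.
    """
    return math.ceil(file_size / block_size)

file_size = 1024 * MB  # the 1GB file from the example

# Parquet with a 1GB block: the whole file is a single block.
parquet_1gb = scan_ranges(file_size, 1024 * MB)

# Seq/Snappy with a 128MB block: the same file spans several blocks.
seq_128mb = scan_ranges(file_size, 128 * MB)

print(parquet_1gb, seq_128mb)
```

Under this model the 1GB Parquet file yields one scan range, while the 128MB layout yields eight, which is the gap in parallelism the description refers to.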

      Nong and I discussed this problem, and a user-configurable block size came to mind. It still requires some more thought.

          People

            Assignee: Nong Li (nong_impala_60e1)
            Reporter: Alan Choi (alan@cloudera.com)