IMPALA-4625

Parquet Row Group Size optimization

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Do
    • Affects Version/s: Impala 2.2.4
    • Fix Version/s: None
    • Component/s: Backend
    • Labels: None

    Description

      For a highly selective query, even once the predicate has narrowed the scan down to a single Parquet file, Impala still has to process that whole file. Currently, a Parquet file generated by Impala contains only one row group, and its default size is 256 MB. A row group is processed serially and cannot be parallelized at this time. Processing a 256 MB row group can take several seconds, even for a very simple predicate. To improve the response time of such highly selective queries, the row group must be smaller.
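
      The reasoning above can be illustrated with a rough back-of-the-envelope model. This is only a sketch under stated assumptions, not a description of Impala's scanner: the function name, the scan throughput (100 MB/s), and the thread count (8) are hypothetical, and it assumes that smaller row groups can be scanned concurrently by separate threads while each individual row group is still processed serially.

```python
import math


def estimated_scan_seconds(file_mb, row_group_mb, scan_mb_per_sec, scanner_threads):
    """Rough latency for scanning one Parquet file.

    Assumptions (hypothetical, for illustration only):
      - each row group is processed serially by a single thread;
      - different row groups can be processed concurrently;
      - every thread scans at the same fixed throughput.
    """
    # Number of row groups the file is split into.
    num_row_groups = math.ceil(file_mb / row_group_mb)
    # Row groups are processed in waves of at most `scanner_threads` at a time.
    waves = math.ceil(num_row_groups / scanner_threads)
    # Serial time for one row group (a row group cannot exceed the file size).
    per_group_seconds = min(row_group_mb, file_mb) / scan_mb_per_sec
    return waves * per_group_seconds


# One 256 MB row group: a single thread does all the work.
single_group = estimated_scan_seconds(256, 256, scan_mb_per_sec=100, scanner_threads=8)
# Eight 32 MB row groups: the work is spread over eight threads.
small_groups = estimated_scan_seconds(256, 32, scan_mb_per_sec=100, scanner_threads=8)
print(single_group, small_groups)
```

      Under these assumed numbers the single 256 MB row group takes about 2.56 seconds, while eight 32 MB row groups finish in about 0.32 seconds. Real scan times depend on encoding, compression, column selection, and I/O, none of which this model captures.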

            People

              Assignee: Unassigned
              Reporter: Alan Choi (alan@cloudera.com)
              Votes: 0
              Watchers: 4