IMPALA-7936

Enable better control over Parquet writing

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Impala 3.3.0
    • Component/s: None
    • Labels: None

    Description

      With the introduction of Parquet page indexes, it has become desirable to have finer control over how Impala writes Parquet files.

      The following configuration options (probably implemented as query options) would provide that control:

      • enable or disable Parquet page index writing (currently this can be done with a command-line argument)
      • set page-size limits based on row count
      • set the truncation length for statistics on string values (the current truncation length is 64, and user data is unlikely to need a longer one)

      These options would make it possible to write more complete tests for page filtering. They would also be useful for fine-tuning the page index for better performance.
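
      To make the second and third options concrete, here is a minimal Python sketch of what a row-count page limit and a statistics truncation length could mean for a writer. It is not Impala's implementation. The function names (`split_into_pages`, `truncate_min`, `truncate_max`) are hypothetical, and the sketch assumes string values are compared as unsigned bytes. A truncated minimum can simply be a prefix, while a truncated maximum has to be bumped upward so that it remains a valid upper bound.

```python
from typing import List, Optional, Sequence


def split_into_pages(rows: Sequence, row_count_limit: int) -> List[list]:
    """Split rows into pages holding at most row_count_limit rows each."""
    if row_count_limit <= 0:
        raise ValueError("row_count_limit must be positive")
    return [list(rows[i:i + row_count_limit])
            for i in range(0, len(rows), row_count_limit)]


def truncate_min(value: bytes, max_len: int) -> bytes:
    """A prefix never sorts after the full value, so it is a valid lower bound."""
    return value[:max_len]


def truncate_max(value: bytes, max_len: int) -> Optional[bytes]:
    """Return an upper bound of at most max_len bytes, or None if none exists."""
    if len(value) <= max_len:
        return value
    prefix = bytearray(value[:max_len])
    # Trailing 0xFF bytes cannot be incremented, so drop them first.
    while prefix and prefix[-1] == 0xFF:
        prefix.pop()
    if not prefix:
        return None  # every byte was 0xFF: no shorter upper bound exists
    prefix[-1] += 1  # bump the last byte so the result sorts after value
    return bytes(prefix)


if __name__ == "__main__":
    print(split_into_pages(list(range(5)), 2))
    print(truncate_min(b"abcdef", 3), truncate_max(b"abcdef", 3))
```

      A small row-count limit like the one above is what would let a test force many pages out of a small table, which is the testing benefit the description refers to.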

            People

              Assignee: Zoltán Borók-Nagy (boroknagyz)
              Reporter: Zoltán Borók-Nagy (boroknagyz)
              Votes: 0
              Watchers: 3
