IMPALA-7936

Enable better control over Parquet writing


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: Impala 3.3.0

    Description

      With the introduction of Parquet page indexes, it has become desirable to have more control over how Impala writes Parquet files.

      The new configuration options (probably implemented as query options) would be:

      • Enable or disable Parquet page index writing (currently this is possible with a command-line argument)
      • Set page-size limits based on row count
      • Set the truncation length for statistics about string values (the current truncation length is 64; user data is unlikely to need a longer truncation length than that)
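
      To illustrate what a configurable truncation length affects, here is a minimal sketch of how string min/max statistics can be truncated while remaining valid bounds. It is not Impala's implementation: the function names `truncate_min` and `truncate_max` are hypothetical, and the sketch assumes values are compared as unsigned bytes.

```python
from typing import Optional


def truncate_min(value: bytes, max_len: int) -> bytes:
    """Return a lower bound of at most max_len bytes.

    A prefix always sorts at or before the full value, so plain
    truncation is enough for the minimum.
    """
    return value[:max_len]


def truncate_max(value: bytes, max_len: int) -> Optional[bytes]:
    """Return an upper bound of at most max_len bytes, or None if none exists.

    A plain prefix would sort before the full value, so the last byte of
    the prefix is incremented to push the bound above it.
    """
    if len(value) <= max_len:
        return value
    prefix = bytearray(value[:max_len])
    # Trailing 0xFF bytes cannot be incremented, so drop them first.
    while prefix and prefix[-1] == 0xFF:
        prefix.pop()
    if not prefix:
        return None  # no shorter upper bound exists
    prefix[-1] += 1
    return bytes(prefix)
```

      With a truncation length of 3, the value `b"abcdef"` yields the lower bound `b"abc"` and the upper bound `b"abd"`. A shorter truncation length makes the statistics smaller but the bounds looser, which is why a test may want to set it explicitly.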

      These options would enable writing more complete tests for page filtering. They would also be useful for fine-tuning the page index for better performance.
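
      As a rough illustration of why a row-count limit helps with page-filtering tests, the sketch below splits a column into pages of a fixed number of rows, records min/max per page, and selects the pages that may contain a value. The names `Page`, `split_into_pages`, and `candidate_pages` are hypothetical and only model the idea; they do not reflect Impala's writer or scanner.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    rows: List[int]
    min_value: int
    max_value: int


def split_into_pages(values: List[int], row_limit: int) -> List[Page]:
    """Cut the column into pages holding at most row_limit rows each."""
    if row_limit <= 0:
        raise ValueError("row_limit must be positive")
    pages = []
    for start in range(0, len(values), row_limit):
        chunk = values[start:start + row_limit]
        pages.append(Page(chunk, min(chunk), max(chunk)))
    return pages


def candidate_pages(pages: List[Page], target: int) -> List[int]:
    """Return indexes of pages whose [min, max] range may contain target."""
    return [i for i, p in enumerate(pages)
            if p.min_value <= target <= p.max_value]


pages = split_into_pages(list(range(10)), row_limit=4)
selected = candidate_pages(pages, target=5)
```

      With a small row limit, even a ten-row column produces several pages, so a test can check that a predicate skips exactly the pages it should.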


          People

            Assignee: Zoltán Borók-Nagy (boroknagyz)
            Reporter: Zoltán Borók-Nagy (boroknagyz)