  1. Apache Drill
  2. DRILL-7028

Reduce the planning time of queries on large Parquet tables with large metadata cache files


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.17.0
    • Component/s: Metadata

    Description

      If a Parquet table consists of a large number of small files, its metadata cache files grow large. The planner must read these large metadata cache files, which adds planning-time overhead. As a result, most of the query's execution time is spent in the planning phase.


          People

            Assignee: Venkata Jyothsna Donapati (vdonapati)
            Reporter: Venkata Jyothsna Donapati (vdonapati)
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Original Estimate: 1,284h
                Remaining Estimate: 1,284h
                Time Spent: Not Specified
