Apache Drill / DRILL-6101

Optimize Implicit Columns Processing


Details

    Description

      Problem Description -

      • Apache Drill allows users to specify columns even for SELECT STAR queries
      • From my discussion with Paul Rogers, Apache Calcite has a limitation where the extra columns are not provided
      • The workaround has been to always include all implicit columns for SELECT STAR queries
      • Unfortunately, the current implementation is very inefficient because implicit column values get duplicated; this leads to substantial performance degradation when the number of rows is large

      Suggested Optimization -

      • The NullableVarChar vector should be enhanced to efficiently store duplicate values
      • This will not only address the current Calcite limitations (for SELECT STAR queries) but also optimize all queries with implicit columns
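
One possible way to store duplicate values efficiently is run-length encoding: keep each repeated value once, together with the row index at which its run ends. The sketch below illustrates that idea only. The class `RunLengthVarCharColumn` and its methods are hypothetical names invented for this example; they are not part of Drill's vector API, and the issue does not say which encoding the NullableVarChar enhancement should use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/**
 * Hypothetical run-length-encoded column of nullable strings.
 * Illustration only; this is not Drill's NullableVarChar vector.
 */
final class RunLengthVarCharColumn {
    // One entry per run of equal consecutive values (an entry may be null).
    private final List<String> runValues = new ArrayList<>();
    // Exclusive end row index of each run, in ascending order.
    private final List<Integer> runEnds = new ArrayList<>();
    private int rowCount = 0;

    /** Appends one row; extends the last run if the value repeats. */
    void append(String value) {
        int last = runValues.size() - 1;
        if (last >= 0 && Objects.equals(runValues.get(last), value)) {
            runEnds.set(last, rowCount + 1);
        } else {
            runValues.add(value);
            runEnds.add(rowCount + 1);
        }
        rowCount++;
    }

    /** Returns the value at the given row (possibly null). */
    String get(int row) {
        if (row < 0 || row >= rowCount) {
            throw new IndexOutOfBoundsException("row " + row);
        }
        // Binary search for the first run whose exclusive end is past the row.
        int lo = 0;
        int hi = runEnds.size() - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (runEnds.get(mid) > row) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return runValues.get(lo);
    }

    /** Number of logical rows in the column. */
    int rowCount() {
        return rowCount;
    }

    /** Number of values physically stored (one per run). */
    int storedValueCount() {
        return runValues.size();
    }
}
```

With this layout, a batch in which every row comes from the same file stores the implicit column value once rather than once per row, at the cost of a binary search on reads. A real vector would also need to fit Drill's buffer management, which this sketch does not attempt.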

       


          People

            Salim Achouche (sachouche)
            Timothy Farkas
            Votes: 0
            Watchers: 2
