  1. Spark
  2. SPARK-16026 Cost-based Optimizer Framework
  3. SPARK-19408

cardinality estimation involving two columns of the same table


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.0
    • Fix Version/s: 2.2.0
    • Component/s: Optimizer, SQL
    • Labels: None

    Description

      In SPARK-17075, we estimate the cardinality of predicate expressions of the form "column (op) literal", where op is =, <, <=, >, >=, or <=>. SQL queries also contain predicate expressions involving two columns, such as "column-1 (op) column-2", where column-1 and column-2 belong to the same table. Note that if column-1 and column-2 belong to different tables, the estimation is the join operator's job, not the filter operator's.

      In this JIRA, we want to estimate the filter factor of predicate expressions involving two columns of the same table. For example, several TPC-H queries contain predicates of this kind, such as "WHERE l_commitdate < l_receiptdate".
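
      As a rough illustration of what a filter factor for "column-1 < column-2" could look like, here is a minimal sketch based only on per-column min/max statistics. It is not taken from the Spark patch: the names ColumnRange, less_than_selectivity, and estimate_rows are hypothetical, and the 1/3 value used when the two ranges partially overlap is an assumed placeholder, not a derived quantity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnRange:
    """Min/max statistics for one column (hypothetical helper type)."""
    lo: float
    hi: float


def less_than_selectivity(left: ColumnRange, right: ColumnRange,
                          partial_overlap_guess: float = 1 / 3) -> float:
    """Estimate the fraction of rows satisfying `left < right`.

    Only the two ranges are inspected, so the result is exact in the
    disjoint cases and an assumed constant otherwise.
    """
    # Every left value is below every right value: the predicate always holds.
    if left.hi < right.lo:
        return 1.0
    # Every left value is at or above every right value: it never holds.
    if left.lo >= right.hi:
        return 0.0
    # Ranges overlap: fall back to an assumed constant (placeholder value).
    return partial_overlap_guess


def estimate_rows(row_count: int, selectivity: float) -> int:
    """Scale the input row count by the filter factor."""
    return round(row_count * selectivity)
```

      With overlapping ranges, as would be expected for two date columns such as l_commitdate and l_receiptdate, the sketch falls through to the assumed constant, so the quality of the estimate depends entirely on that guess.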

          People

            Assignee: Ron Hu (ron8hu)
            Reporter: Ron Hu (ron8hu)