IMPALA-1728

sub-query with duplicate values used in an IN conditional operator should discard the duplicate values before applying the operator


    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: Impala 2.0, Impala 2.1
    • Fix Version/s: None
    • Component/s: Frontend

      Description

      When running TPC-DS Q95, we found that it uses the result of a CTE in an IN conditional later in the query.
      In this case, the CTE generates a large number of duplicate values for the column used in the conditional. Applying DISTINCT to the CTE made the query complete in about 40% less time.
      The timings (in seconds) are:
      • Without DISTINCT: 1240
      • With DISTINCT: 728
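      The reported timings put the saving slightly above the quoted 40%. A quick check of the arithmetic, using only the two numbers above:

```python
# Timings reported in this issue, in seconds.
without_distinct = 1240
with_distinct = 728

# Fraction of run time saved by applying DISTINCT to the CTE.
reduction = 1 - with_distinct / without_distinct
# How many times faster the DISTINCT version ran.
speedup = without_distinct / with_distinct

print(f"Run time reduced by {reduction:.1%}")  # → Run time reduced by 41.3%
print(f"Speedup: {speedup:.2f}x")              # → Speedup: 1.70x
```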

      Both versions of the query are attached.
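      Discarding duplicates before applying IN is safe because IN only tests membership: a value that appears once or a thousand times in the sub-query output admits exactly the same outer rows. The toy Python model below illustrates that point under stated assumptions. It is not Impala's planner or join code; `build_side_rows` and `in_subquery` are hypothetical helpers, and the assumption that the build side buffers every sub-query row (duplicates included) unless DISTINCT is applied is made purely for illustration.

```python
def build_side_rows(subquery_values, distinct):
    """Rows a hash-based IN evaluation would buffer (hypothetical model).

    Assumption for illustration: without DISTINCT every sub-query row is
    kept, duplicates included; with DISTINCT one row per value is kept.
    """
    if distinct:
        return list(dict.fromkeys(subquery_values))  # order-preserving dedup
    return list(subquery_values)


def in_subquery(outer_keys, build_rows):
    """Return the outer keys that satisfy `key IN (build_rows)`."""
    lookup = set(build_rows)  # IN needs membership only, not multiplicity
    return [k for k in outer_keys if k in lookup]


outer = [1, 2, 3, 4, 5]
sub = [2, 2, 2, 4, 4, 4, 4, 7, 7]  # sub-query output with many duplicates

raw = build_side_rows(sub, distinct=False)
dedup = build_side_rows(sub, distinct=True)

# Same IN result either way, but the deduplicated build side is smaller.
print(in_subquery(outer, raw), in_subquery(outer, dedup))  # [2, 4] [2, 4]
print(len(raw), len(dedup))                                # 9 3
```

      In this model the answer never changes; only the amount of data held on the build side does, which is the kind of work the DISTINCT rewrite avoids.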


            People

            • Assignee: Unassigned
            • Reporter: Dileep Kumar (dkumar@cloudera.com)
