IMPALA-1728

Subquery with duplicate values used in an IN predicate should discard the duplicates before the operator is applied


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: Impala 2.0, Impala 2.1
    • Fix Version/s: None
    • Component/s: Frontend

    Description

      When running TPC-DS Q95, we found that it uses the result of a CTE in an IN predicate later in the query.
      In this case the CTE generates many duplicate values for the column used in the predicate. Since IN only tests membership, the duplicates do not change the result but still have to be processed. When we applied DISTINCT to the CTE, the query took about 40% less time to complete.
      The timings (in seconds) are:
      Without DISTINCT: 1240
      With DISTINCT: 728

      Both versions of the query are attached.
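
      The attached queries are not reproduced here. As a rough illustration of the rewrite described above, the sketch below uses SQLite (not Impala) and a made-up, heavily simplified table that borrows TPC-DS column names. The CTE name ws_wh, the sample rows, and the run helper are assumptions for illustration only. It shows that a self-join CTE in the style of Q95 repeats order numbers, and that adding DISTINCT shrinks the CTE without changing what the IN predicate returns. It says nothing about Impala's timings.

```python
import sqlite3

# In-memory toy data; the real web_sales table has many more columns and rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE web_sales (ws_order_number INTEGER, ws_warehouse_sk INTEGER);
    INSERT INTO web_sales VALUES
        (1, 10), (1, 20), (1, 30),   -- order 1 ships from three warehouses
        (2, 10),                     -- order 2 ships from one warehouse
        (3, 10), (3, 20);            -- order 3 ships from two warehouses
""")

# Hypothetical CTE: orders shipped from more than one warehouse.
# The self-join emits one row per ordered warehouse pair, so order numbers repeat.
CTE = """
    WITH ws_wh AS (
        SELECT {distinct} ws1.ws_order_number
        FROM web_sales ws1
        JOIN web_sales ws2
          ON ws1.ws_order_number = ws2.ws_order_number
        WHERE ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk
    )
"""


def run(distinct):
    """Return (rows produced by the CTE, orders selected by the IN predicate)."""
    cte = CTE.format(distinct="DISTINCT" if distinct else "")
    cte_rows = conn.execute(cte + "SELECT COUNT(*) FROM ws_wh").fetchone()[0]
    result = conn.execute(
        cte
        + """
        SELECT DISTINCT ws_order_number
        FROM web_sales
        WHERE ws_order_number IN (SELECT ws_order_number FROM ws_wh)
        ORDER BY ws_order_number
        """
    ).fetchall()
    return cte_rows, [row[0] for row in result]


plain_rows, plain_result = run(distinct=False)
dedup_rows, dedup_result = run(distinct=True)

print("CTE rows without DISTINCT:", plain_rows)
print("CTE rows with DISTINCT:", dedup_rows)
print("Same IN result:", plain_result == dedup_result)
```

      On this toy data the CTE yields 8 rows without DISTINCT and 2 with it, while the outer query selects the same orders either way. This is why the planner could in principle deduplicate the subquery itself.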

      Attachments

        1. q95.sql
          1 kB
          Dileep Kumar
        2. q95.sql.DISTINCT
          1 kB
          Dileep Kumar

            People

              Assignee: Unassigned
              Reporter: Dileep Kumar (dkumar@cloudera.com)
              Votes: 0
              Watchers: 3
