  1. IMPALA
  2. IMPALA-1728

Subquery with duplicate values used in an IN predicate should discard the duplicates before applying the operator

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: Impala 2.0, Impala 2.1
    • Fix Version/s: None
    • Component/s: Frontend

      Description

      While running TPC-DS Q95, we found that it uses the result of a CTE in an IN predicate later in the query.
      In this case the CTE produces many duplicate values for the column used in the predicate. Applying DISTINCT to the CTE reduced the run time by about 41%.
      The timings (in seconds) are:
      Without DISTINCT: 1240
      With DISTINCT: 728

      Both versions of the query are attached.
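
To illustrate why deduplicating the subquery is safe, here is a minimal sketch that runs a Q95-like pattern against an in-memory SQLite database. It is not the attached query and does not reproduce the Impala timings: the CTE name `ws_wh` and the `web_sales` column names are borrowed from the TPC-DS schema as an assumption about what the query looks like, the rows are made up, and SQLite stands in for Impala only to show the semantics. An IN predicate only tests membership, so adding DISTINCT to the CTE changes how many rows the subquery produces but not the result of the outer query.

```python
import sqlite3

# Toy stand-in for the TPC-DS web_sales table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE web_sales (ws_order_number INTEGER, ws_warehouse_sk INTEGER);
    INSERT INTO web_sales VALUES
        (1, 10), (1, 11), (1, 12),   -- order 1 ships from three warehouses
        (2, 20),                     -- order 2 ships from one warehouse
        (3, 30), (3, 31);            -- order 3 ships from two warehouses
""")

# Self-join CTE that emits one row per pair of differing warehouses,
# so orders with many warehouses appear many times.
CTE_TEMPLATE = """
WITH ws_wh AS (
    SELECT {distinct} ws1.ws_order_number
    FROM web_sales ws1
    JOIN web_sales ws2
      ON ws1.ws_order_number = ws2.ws_order_number
     AND ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk
)
"""


def _cte(distinct: bool) -> str:
    """Return the CTE text, with or without DISTINCT."""
    return CTE_TEMPLATE.format(distinct="DISTINCT" if distinct else "")


def cte_row_count(distinct: bool) -> int:
    """Number of rows the CTE feeds into the IN predicate."""
    sql = _cte(distinct) + "SELECT COUNT(*) FROM ws_wh"
    return conn.execute(sql).fetchone()[0]


def outer_row_count(distinct: bool) -> int:
    """Number of web_sales rows that pass the IN predicate."""
    sql = _cte(distinct) + """
        SELECT COUNT(*) FROM web_sales
        WHERE ws_order_number IN (SELECT ws_order_number FROM ws_wh)
    """
    return conn.execute(sql).fetchone()[0]


if __name__ == "__main__":
    for distinct in (False, True):
        label = "with DISTINCT" if distinct else "without DISTINCT"
        print(label, "| CTE rows:", cte_row_count(distinct),
              "| outer rows:", outer_row_count(distinct))
```

With these toy rows the CTE shrinks from 8 rows to 2 once DISTINCT is added, while the outer query returns 5 rows either way. This is the property the request relies on: the planner could drop the duplicates itself without changing the answer.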

        Attachments

        1. q95.sql
          1 kB
          Dileep Kumar
        2. q95.sql.DISTINCT
          1 kB
          Dileep Kumar

            People

            • Assignee: Unassigned
            • Reporter: Dileep Kumar (dkumar@cloudera.com)
            • Votes: 0
            • Watchers: 2
