Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-3940

Different query plans produced when accessing base tables or identical views.

    XMLWordPrintableJSON

Details

    Description

      There seems to be a bug related to join cardinality estimation when views are involved.

      Different plans are produced when running the same query against either base tables or identical views. The view plan is worse and the join-cardinality estimates in the view plan are consistent with not being able to find the column stats from the view's base table columns.

      Plan against base tables:

      select count(a.int_col) from
      functional.alltypessmall a
      inner join functional.alltypes b on (a.id = b.id)
      inner join functional.alltypestiny c on (b.id = c.id)
      +-----------------------------------------------------------+
      | Explain String                                            |
      +-----------------------------------------------------------+
      | Estimated Per-Host Requirements: Memory=170.00MB VCores=4 |
      |                                                           |
      | 10:AGGREGATE [FINALIZE]                                   |
      | |  output: count:merge(a.int_col)                         |
      | |                                                         |
      | 09:EXCHANGE [UNPARTITIONED]                               |
      | |                                                         |
      | 05:AGGREGATE                                              |
      | |  output: count(a.int_col)                               |
      | |                                                         |
      | 04:HASH JOIN [INNER JOIN, PARTITIONED]                    |
      | |  hash predicates: b.id = a.id                           |
      | |  runtime filters: RF000 <- a.id                         |
      | |                                                         |
      | |--08:EXCHANGE [HASH(a.id)]                               |
      | |  |                                                      |
      | |  00:SCAN HDFS [functional.alltypessmall a]              |
      | |     partitions=4/4 files=4 size=6.32KB                  |
      | |                                                         |
      | 07:EXCHANGE [HASH(b.id)]                                  |
      | |                                                         |
      | 03:HASH JOIN [INNER JOIN, BROADCAST]                      |
      | |  hash predicates: b.id = c.id                           |
      | |  runtime filters: RF001 <- c.id                         |
      | |                                                         |
      | |--06:EXCHANGE [BROADCAST]                                |
      | |  |                                                      |
      | |  02:SCAN HDFS [functional.alltypestiny c]               |
      | |     partitions=4/4 files=4 size=460B                    |
      | |     runtime filters: RF000 -> c.id                      |
      | |                                                         |
      | 01:SCAN HDFS [functional.alltypes b]                      |
      |    partitions=24/24 files=24 size=478.45KB                |
      |    runtime filters: RF000 -> b.id, RF001 -> b.id          |
      +-----------------------------------------------------------+
      

      Plan against views created with CREATE VIEW as SELECT * FROM basetable:

      select count(a.int_col) from
      alltypessmall_view a
      inner join alltypes_view b on (a.id = b.id)
      inner join alltypestiny_view c on (b.id = c.id);
      +--------------------------------------------------------------------------------------+
      | Explain String                                                                       |
      +--------------------------------------------------------------------------------------+
      | Estimated Per-Host Requirements: Memory=170.00MB VCores=3                            |
      |                                                                                      |
      | 09:AGGREGATE [FINALIZE]                                                              |
      | |  output: count:merge(a.int_col)                                                    |
      | |                                                                                    |
      | 08:EXCHANGE [UNPARTITIONED]                                                          |
      | |                                                                                    |
      | 05:AGGREGATE                                                                         |
      | |  output: count(functional.alltypessmall.int_col)                                   |
      | |                                                                                    |
      | 04:HASH JOIN [INNER JOIN, BROADCAST]                                                 |
      | |  hash predicates: functional.alltypes.id = functional.alltypestiny.id              |
      | |  runtime filters: RF000 <- functional.alltypestiny.id                              |
      | |                                                                                    |
      | |--07:EXCHANGE [BROADCAST]                                                           |
      | |  |                                                                                 |
      | |  02:SCAN HDFS [functional.alltypestiny]                                            |
      | |     partitions=4/4 files=4 size=460B                                               |
      | |                                                                                    |
      | 03:HASH JOIN [INNER JOIN, BROADCAST]                                                 |
      | |  hash predicates: functional.alltypes.id = functional.alltypessmall.id             |
      | |  runtime filters: RF001 <- functional.alltypessmall.id                             |
      | |                                                                                    |
      | |--06:EXCHANGE [BROADCAST]                                                           |
      | |  |                                                                                 |
      | |  00:SCAN HDFS [functional.alltypessmall]                                           |
      | |     partitions=4/4 files=4 size=6.32KB                                             |
      | |     runtime filters: RF000 -> functional.alltypessmall.id                          |
      | |                                                                                    |
      | 01:SCAN HDFS [functional.alltypes]                                                   |
      |    partitions=24/24 files=24 size=478.45KB                                           |
      |    runtime filters: RF000 -> functional.alltypes.id, RF001 -> functional.alltypes.id |
      +--------------------------------------------------------------------------------------+
      

      This is most likely regression from Impala 2.5, possibly introduced by IMPALA-976 (not yet confirmed).

      Attachments

        1. 3940_simple_query_using_base_tables.txt
          3 kB
          Manikandan R
        2. 3940_simple_query_using_views.txt
          4 kB
          Manikandan R
        3. 3940_complex_query_using_base_tables.txt
          14 kB
          Manikandan R
        4. 3940_complex_query_using_views.txt
          19 kB
          Manikandan R

        Activity

          People

            alex.behm Alexander Behm
            alex.behm Alexander Behm
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: