IMPALA-5038

File size mismatch in PlannerTest.testPredicatePropagation


Details

    Description

      A recent commit broke PlannerTest.testPredicatePropagation with a file size mismatch. The tricky part is that IMPALA-2565 already added code to ignore file size mismatches. However, that code needs to be generalized to account for differences in the unit, e.g., "B" vs. "KB". See the actual and expected results:

      Section PLAN of query:
      SELECT count(*) FROM
       (SELECT * from tpch_parquet.customer c CROSS JOIN tpch_parquet.nation n
        WHERE n_name = 'BRAZIL' AND n_regionkey = 1 AND c_custkey % 2 = 0) cn
       LEFT OUTER JOIN tpch_parquet.region r ON n_regionkey = r_regionkey
      
      Actual does not match expected result:
      PLAN-ROOT SINK
      |
      05:AGGREGATE [FINALIZE]
      |  output: count(*)
      |
      04:HASH JOIN [LEFT OUTER JOIN]
      |  hash predicates: n.n_regionkey = r_regionkey
      |
      |--03:SCAN HDFS [tpch_parquet.region r]
      |     partitions=1/1 files=1 size=939B
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      |     predicates: r.r_regionkey = 1
      |
      02:NESTED LOOP JOIN [CROSS JOIN]
      |
      |--01:SCAN HDFS [tpch_parquet.nation n]
      |     partitions=1/1 files=1 size=2.25KB
      |     predicates: n_regionkey = 1, n_name = 'BRAZIL'
      |
      00:SCAN HDFS [tpch_parquet.customer c]
         partitions=1/1 files=1 size=12.27MB
         predicates: c_custkey % 2 = 0
      
      Expected:
      PLAN-ROOT SINK
      |
      05:AGGREGATE [FINALIZE]
      |  output: count(*)
      |
      04:HASH JOIN [LEFT OUTER JOIN]
      |  hash predicates: n.n_regionkey = r_regionkey
      |
      |--03:SCAN HDFS [tpch_parquet.region r]
      |     partitions=1/1 files=1 size=1.01KB
      |     predicates: r.r_regionkey = 1
      |
      02:NESTED LOOP JOIN [CROSS JOIN]
      |
      |--01:SCAN HDFS [tpch_parquet.nation n]
      |     partitions=1/1 files=1 size=2.38KB
      |     predicates: n_regionkey = 1, n_name = 'BRAZIL'
      |
      00:SCAN HDFS [tpch_parquet.customer c]
         partitions=1/1 files=1 size=12.27MB
         predicates: c_custkey % 2 = 0
      
      Verbose plan:
      F00:PLAN FRAGMENT [UNPARTITIONED]
        PLAN-ROOT SINK
        |
        05:AGGREGATE [FINALIZE]
        |  output: count(*)
        |  hosts=1 per-host-mem=unavailable
        |  tuple-ids=4 row-size=8B cardinality=1
        |
        04:HASH JOIN [LEFT OUTER JOIN]
        |  hash predicates: n.n_regionkey = r_regionkey
        |  hosts=1 per-host-mem=unavailable
        |  tuple-ids=0,1,3N row-size=35B cardinality=15000
        |
        |--03:SCAN HDFS [tpch_parquet.region r]
        |     partitions=1/1 files=1 size=939B
        |     predicates: r.r_regionkey = 1
        |     table stats: 5 rows total
        |     column stats: all
        |     parquet statistics predicates: r.r_regionkey = 1
        |     parquet dictionary predicates: r.r_regionkey = 1
        |     hosts=1 per-host-mem=unavailable
        |     tuple-ids=3 row-size=2B cardinality=1
        |
        02:NESTED LOOP JOIN [CROSS JOIN]
        |  hosts=1 per-host-mem=unavailable
        |  tuple-ids=0,1 row-size=33B cardinality=15000
        |
        |--01:SCAN HDFS [tpch_parquet.nation n]
        |     partitions=1/1 files=1 size=2.25KB
        |     predicates: n_regionkey = 1, n_name = 'BRAZIL'
        |     table stats: 25 rows total
        |     column stats: all
        |     parquet statistics predicates: n_regionkey = 1, n_name = 'BRAZIL'
        |     parquet dictionary predicates: n_regionkey = 1, n_name = 'BRAZIL'
        |     hosts=1 per-host-mem=unavailable
        |     tuple-ids=1 row-size=25B cardinality=1
        |
        00:SCAN HDFS [tpch_parquet.customer c]
           partitions=1/1 files=1 size=12.27MB
           predicates: c_custkey % 2 = 0
           table stats: 150000 rows total
           column stats: all
           parquet dictionary predicates: c_custkey % 2 = 0
           hosts=1 per-host-mem=unavailable
           tuple-ids=0 row-size=8B cardinality=15000
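
      To illustrate the generalization described above, here is a minimal sketch of a unit-agnostic comparison. It is not Impala's actual test code (which is Java); the regex and the helper names mask_file_size and plan_lines_match are hypothetical, and the list of unit suffixes is an assumption. The idea is to mask the whole size=<number><unit> token before comparing, so that size=939B and size=1.01KB are treated as equal, while fields such as row-size=8B are left untouched.

```python
import re

# Hypothetical pattern for the file size token printed by scan nodes,
# e.g. size=939B, size=1.01KB, size=12.27MB. The lookbehind keeps
# fields such as row-size=8B from being masked. The unit list is an
# assumption, not taken from Impala's source.
FILE_SIZE_RE = re.compile(
    r"(?<![\w-])size=\d+(?:\.\d+)?(?:B|KB|MB|GB|TB|PB)\b"
)


def mask_file_size(line: str) -> str:
    """Replace the file size token, number and unit together, with a placeholder."""
    return FILE_SIZE_RE.sub("size=<ignored>", line)


def plan_lines_match(actual: str, expected: str) -> bool:
    """Compare two plan lines while ignoring file size differences."""
    return mask_file_size(actual) == mask_file_size(expected)


if __name__ == "__main__":
    # The two mismatching lines from this report.
    actual = "|     partitions=1/1 files=1 size=939B"
    expected = "|     partitions=1/1 files=1 size=1.01KB"
    print(plan_lines_match(actual, expected))
```

      Masking the number and the unit together avoids having to convert between units, which would be unreliable here because the printed values are rounded.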
      

          People

            joemcdonnell Joe McDonnell
            alex.behm Alexander Behm