IMPALA / IMPALA-3638

Remove lazy creation of LLVM codegen module

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.2
    • Fix Version/s: Impala 2.8.0
    • Component/s: Backend

      Description

      Query

      select *
      FROM (
        SELECT Rank() OVER (
            ORDER BY l_extendedprice
              ,l_quantity
              ,l_discount
              ,l_tax
            ) AS rank
        FROM lineitem
        WHERE l_shipdate < '1992-05-09'
        ) a
      WHERE rank < 10
      

      Plan

      03:SELECT
      |  predicates: rank() < 10
      |  hosts=20 per-host-mem=unavailable
      |  tuple-ids=6,5 row-size=66B cardinality=59999897
      |
      02:ANALYTIC
      |  functions: rank()
      |  order by: l_extendedprice ASC, l_quantity ASC, l_discount ASC, l_tax ASC
      |  window: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
      |  hosts=20 per-host-mem=unavailable
      |  tuple-ids=6,5 row-size=66B cardinality=599998971
      |
      04:MERGING-EXCHANGE [UNPARTITIONED]
      |  order by: l_extendedprice ASC, l_quantity ASC, l_discount ASC, l_tax ASC
      |  hosts=20 per-host-mem=unavailable
      |  tuple-ids=6 row-size=58B cardinality=599998971
      |
      01:SORT
      |  order by: l_extendedprice ASC, l_quantity ASC, l_discount ASC, l_tax ASC
      |  hosts=20 per-host-mem=736.00MB
      |  tuple-ids=6 row-size=58B cardinality=599998971
      |
      00:SCAN HDFS [tpch_1000_decimal_parquet.lineitem, RANDOM]
         partitions=1/1 files=880 size=216.61GB
         predicates: l_shipdate < '1992-05-09'
         table stats: 5999989709 rows total
         column stats: all
         hosts=20 per-host-mem=440.00MB
         tuple-ids=0 row-size=58B cardinality=599998971
      

      The coordinator fragment (F01), which runs the analytic function, is not getting codegened:

        Fragment start latencies: Count: 20, 25th %-ile: 1ms, 50th %-ile: 2ms, 75th %-ile: 2ms, 90th %-ile: 2ms, 95th %-ile: 3ms, 99.9th %-ile: 3ms
          Per Node Peak Memory Usage: d2406.halxg.cloudera.com:22000(1.38 GB) d2413.halxg.cloudera.com:22000(1.52 GB) d2405.halxg.cloudera.com:22000(1.34 GB) d2414.halxg.cloudera.com:22000(1.51 GB) d2416.halxg.cloudera.com:22000(1.48 GB) d2404.halxg.cloudera.com:22000(1.37 GB) d2420.halxg.cloudera.com:22000(1.40 GB) d2410.halxg.cloudera.com:22000(1.61 GB) d2412.halxg.cloudera.com:22000(1.22 GB) d2419.halxg.cloudera.com:22000(1.38 GB) d2409.halxg.cloudera.com:22000(1.41 GB) d2407.halxg.cloudera.com:22000(1.10 GB) d2411.halxg.cloudera.com:22000(1.34 GB) d2418.halxg.cloudera.com:22000(1.27 GB) d2408.halxg.cloudera.com:22000(1.56 GB) d2421.halxg.cloudera.com:22000(1.35 GB) d2403.halxg.cloudera.com:22000(1.28 GB) d2415.halxg.cloudera.com:22000(1.53 GB) d2417.halxg.cloudera.com:22000(1.31 GB) d2402.halxg.cloudera.com:22000(1.25 GB) 
           - FiltersReceived: 0 (0)
           - FinalizationTimer: 0.000ns
          Coordinator Fragment F01:(Total: 8m48s, non-child: 3.044ms, % non-child: 0.00%)
            MemoryUsage(16s000ms): 16.00 KB, 33.59 MB, 47.13 MB, 47.41 MB, 46.89 MB, 46.71 MB, 46.93 MB, 46.69 MB, 46.52 MB, 46.67 MB, 46.34 MB, 46.76 MB, 46.63 MB, 46.58 MB, 46.44 MB, 46.64 MB, 46.91 MB, 47.02 MB, 46.60 MB, 46.62 MB, 46.65 MB, 46.60 MB, 46.80 MB, 46.88 MB, 46.69 MB, 46.61 MB, 46.88 MB, 46.58 MB, 46.71 MB, 46.76 MB, 46.49 MB, 46.36 MB, 46.07 MB
             - AverageThreadTokens: 0.00 
             - BloomFilterBytes: 0
             - PeakMemoryUsage: 59.87 MB (62781648)
             - PerHostPeakMemUsage: 0
             - PrepareTime: 138.114us
             - RowsProduced: 9 (9)
             - TotalCpuTime: 8m49s
             - TotalNetworkReceiveTime: 0.000ns
             - TotalNetworkSendTime: 0.000ns
             - TotalStorageWaitTime: 0.000ns
            BlockMgr:
               - BlockWritesOutstanding: 0 (0)
               - BlocksCreated: 71 (71)
               - BlocksRecycled: 1.33K (1333)
               - BufferedPins: 0 (0)
               - BytesWritten: 0
               - MaxBlockSize: 8.00 MB (8388608)
               - MemoryLimit: 242.23 GB (260091396096)
               - PeakMemoryUsage: 736.00 MB (771751936)
               - TotalBufferWaitTime: 0.000ns
               - TotalEncryptionTime: 0.000ns
               - TotalIntegrityCheckTime: 0.000ns
               - TotalReadBlockTime: 0.000ns
            SELECT_NODE (id=3):(Total: 8m48s, non-child: 8s327ms, % non-child: 1.57%)
               - PeakMemoryUsage: 9.01 MB (9449472)
               - RowsReturned: 9 (9)
               - RowsReturnedRate: 0
            ANALYTIC_EVAL_NODE (id=2):(Total: 8m40s, non-child: 5m42s, % non-child: 65.78%)
               - EvaluationTime: 5m42s
               - GetNewBlockTime: 5.963ms
               - PeakMemoryUsage: 26.34 MB (27621376)
               - PinTime: 0.000ns
               - RowsReturned: 169.58M (169575660)
               - RowsReturnedRate: 325.75 K/sec
               - UnpinTime: 3.156ms
            EXCHANGE_NODE (id=4):(Total: 2m58s, non-child: 2m35s, % non-child: 87.26%)
              BytesReceived(16s000ms): 0, 26.87 MB, 135.18 MB, 253.93 MB, 373.50 MB, 493.48 MB, 613.34 MB, 733.28 MB, 852.89 MB, 972.60 MB, 1.07 GB, 1.18 GB, 1.30 GB, 1.42 GB, 1.53 GB, 1.65 GB, 1.77 GB, 1.89 GB, 2.00 GB, 2.12 GB, 2.24 GB, 2.35 GB, 2.47 GB, 2.59 GB, 2.70 GB, 2.82 GB, 2.93 GB, 3.05 GB, 3.17 GB, 3.28 GB, 3.40 GB, 3.52 GB, 3.64 GB
               - BytesReceived: 3.70 GB (3972098336)
               - ConvertRowBatchTime: 0.000ns
               - DeserializeRowBatchTimer: 15s052ms
               - FirstBatchArrivalWaitTime: 22s690ms
               - MergeGetNext: 2m35s
               - MergeGetNextBatch: 814.096ms
               - PeakMemoryUsage: 0
               - RowsReturned: 169.58M (169575660)
               - RowsReturnedRate: 952.03 K/sec
               - SendersBlockedTimer: 8m24s
               - SendersBlockedTotalTimer(*): 2h47m
      

      Wrapping the query in a count(*) is enough to get the fragment with the analytic function codegened:

      select count(*) from (select *
      FROM (
        SELECT Rank() OVER (
            ORDER BY l_extendedprice
              ,l_quantity
              ,l_discount
              ,l_tax
            ) AS rank
        FROM lineitem
        WHERE l_shipdate < '1992-05-09'
        ) a
      WHERE rank < 10)a
      

      The fix should speed up the query by 2x.
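
      A rough Amdahl's-law check of that figure against the F01 profile above. It assumes codegen only shortens the ANALYTIC_EVAL_NODE's non-child time (EvaluationTime, 5m42s) and that everything else in the 8m48s coordinator fragment stays the same; SendersBlockedTimer (8m24s) suggests the coordinator is the bottleneck, but the merging exchange could become the new limit. Here T is the fragment time, t_eval the evaluation time, and Δt the evaluation time saved by codegen:

      ```latex
      \begin{aligned}
      T &= 8\,\mathrm{m}\,48\,\mathrm{s} = 528\ \mathrm{s}, \qquad t_{\mathrm{eval}} = 5\,\mathrm{m}\,42\,\mathrm{s} = 342\ \mathrm{s} \\
      S(\Delta t) &= \frac{T}{T - \Delta t}, \qquad 0 \le \Delta t \le t_{\mathrm{eval}} \\
      S_{\max} &= S(t_{\mathrm{eval}}) = \frac{528}{186} \approx 2.84 \\
      S(\Delta t) = 2 &\iff \Delta t = \frac{T}{2} = 264\ \mathrm{s} \approx 0.77\, t_{\mathrm{eval}}
      \end{aligned}
      ```

      Under these assumptions, a 2x end-to-end gain needs codegen to cut analytic evaluation time by roughly three quarters, and even eliminating it entirely would cap the gain at about 2.8x.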


              People

              • Assignee: Michael Ho (kwho)
                Reporter: Mostafa Mokhtar (mmokhtar)
              • Votes: 0
                Watchers: 5
