Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-4130

Severe lock contention introduced by Log4j causing severe slow down in planning and other phases

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Critical
    • Resolution: Cannot Reproduce
    • Impala 2.5.0
    • None
    • Frontend

    Description

      When running high concurrency workload I noticed that query planning is taking unproportionally longer than for single user.

      Upon profiling the Planner I noticed that lots of threads are contending on a lock in Log4j.
      By default Log4j synchronizes writes which via a lock which causes this contention.

      With default Impala configuration the planner and other components write lots of info/warn messages to the log which causes heavy contention under concurrency.

      By changing "Verbose Log Level" to 0 and "Logging Threshold" to fatal concurrent queries used in the test got a 50% speedup.

      Planner timeline with default settings

          Planner Timeline: 51s007ms
             - Analysis finished: 6s020ms (6s020ms)
             - Equivalence classes computed: 6s224ms (203.960ms)
             - Single node plan created: 39s137ms (32s912ms)
             - Runtime filters computed: 39s138ms (1.798ms)
             - Distributed plan created: 50s659ms (11s520ms)
             - Lineage info computed: 50s660ms (1.157ms)
             - Planning finished: 51s007ms (346.339ms)
      

      Planner timeline with logging disable

          Planner Timeline: 161.667ms
             - Analysis finished: 26.854ms (26.854ms)
             - Equivalence classes computed: 27.760ms (906.310us)
             - Single node plan created: 145.224ms (117.463ms)
             - Runtime filters computed: 146.017ms (792.772us)
             - Distributed plan created: 148.244ms (2.227ms)
             - Lineage info computed: 149.043ms (798.857us)
             - Planning finished: 161.667ms (12.623ms)
      
      Stack Trace Count Duration
      org.apache.log4j.Category.callAppenders(LoggingEvent) 91,612 32,628,691,909,737
      org.apache.log4j.Category.forcedLog(String, Priority, Object, Throwable) 91,612 32,628,691,909,737
      org.apache.log4j.Category.log(String, Priority, Object, Throwable) 91,612 32,628,691,909,737
      org.slf4j.impl.Log4jLoggerAdapter.debug(String) 66,573 23,520,171,003,106
      com.cloudera.impala.planner.HdfsScanNode.computeStats(Analyzer) 16,545 5,762,818,131,400
      com.cloudera.impala.planner.DistributedPlanner.createHashJoinFragment(HashJoinNode, PlanFragment, PlanFragment, long, ArrayList) 15,689 5,615,955,980,208
      com.cloudera.impala.planner.JoinNode.computeStats(Analyzer) 12,864 4,716,775,262,309
      com.cloudera.impala.planner.HdfsScanNode.computeNumNodes(Analyzer, long) 5,528 1,888,005,994,277
      com.cloudera.impala.planner.SingleNodePlanner.createJoinPlan(Analyzer, TableRef, List, List) 5,212 1,818,139,839,257
      com.cloudera.impala.analysis.SelectStmt.analyze(Analyzer) 2,754 953,068,475,018
      com.cloudera.impala.analysis.SelectStmt.analyzeAggregation(Analyzer) 2,755 950,232,280,428
      com.cloudera.impala.analysis.AggregateInfo.create(ArrayList, ArrayList, TupleDescriptor, Analyzer) 2,745 945,034,393,921
      com.cloudera.impala.service.Frontend.createExecRequest(TQueryCtx, StringBuilder) 651 224,074,644,208
      com.cloudera.impala.planner.Planner.createPlan() 449 155,274,962,910
      com.cloudera.impala.planner.Planner.computeResourceReqs(List, boolean, TQueryExecRequest) 419 154,737,940,541
      com.cloudera.impala.planner.DistributedPlanner.createPlanFragments(PlanNode) 408 146,583,876,650
      com.cloudera.impala.service.Frontend.analyzeStmt(TQueryCtx) 312 101,646,742,787
      com.cloudera.impala.service.JniFrontend.createExecRequest(byte[]) 211 76,250,810,597
      com.cloudera.impala.planner.UnionNode.computeStats(Analyzer) 31 11,571,668,595
      org.slf4j.impl.Log4jLoggerAdapter.warn(String) 23,955 8,755,149,728,646
      com.cloudera.impala.planner.PlanNode.multiplyCardinalities(long, long) 23,955 8,755,149,728,646
      org.slf4j.impl.Log4jLoggerAdapter.info(String, Object[]) 1,084 353,371,177,985

      Possible workaround
      https://logging.apache.org/log4j/2.x/manual/async.html

      JMC profile attached.

      Attachments

        1. IMPALA-4130-profile.txt
          503 kB
          Mostafa Mokhtar
        2. slow_planning_profile_1.jfr.zip
          2.30 MB
          Mostafa Mokhtar

        Issue Links

          Activity

            People

              Unassigned Unassigned
              mmokhtar Mostafa Mokhtar
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: