Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.2.1
    • Fix Version/s: 2.0.0
    • Component/s: Hive
    • Labels:
      None

      Description

      One of my very complex queries takes about 14 minutes to compile with PPD on. Profiling it I saw a lot of time spent in this stack which is called many many thousands of times.

      java.lang.Throwable.getStackTraceElement(-2)
      java.lang.Throwable.getOurStackTrace(827)
      java.lang.Throwable.getStackTrace(816)
      sun.reflect.GeneratedMethodAccessor5.invoke(-1)
      sun.reflect.DelegatingMethodAccessorImpl.invoke(43)
      java.lang.reflect.Method.invoke(497)
      org.apache.log4j.spi.LocationInfo.<init>(139)
      org.apache.log4j.spi.LoggingEvent.getLocationInformation(253)
      org.apache.log4j.helpers.PatternParser$LocationPatternConverter.convert(500)
      org.apache.log4j.helpers.PatternConverter.format(65)
      org.apache.log4j.PatternLayout.format(506)
      org.apache.log4j.WriterAppender.subAppend(310)
      org.apache.log4j.DailyRollingFileAppender.subAppend(369)
      org.apache.log4j.WriterAppender.append(162)
      org.apache.log4j.AppenderSkeleton.doAppend(251)
      org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(66)
      org.apache.log4j.Category.callAppenders(206)
      org.apache.log4j.Category.forcedLog(391)
      org.apache.log4j.Category.log(856)
      org.apache.commons.logging.impl.Log4JLogger.info(176)
      org.apache.hadoop.hive.ql.ppd.OpProcFactory$DefaultPPD.logExpr(707)
      org.apache.hadoop.hive.ql.ppd.OpProcFactory$DefaultPPD.mergeWithChildrenPred(752)
      org.apache.hadoop.hive.ql.ppd.OpProcFactory$FilterPPD.process(437)
      

      logExpr is set to log at INFO level, but I think DEBUG is more appropriate. When I set log level to debug I see > 20% speedup in compile time:

      Before:

      real    14m47.972s
      user    15m25.609s
      sys    0m20.282s
      

      After:

      real    11m30.946s
      user    12m10.870s
      sys    0m7.320s
      

      It looks like there's a lot of stuff in the PPD code that could be optimized, when I turn PPD off the query compiles in 2m 30s. But this seems like an easy and low risk win.

        Attachments

        1. HIVE-12312.2.patch
          1 kB
          Carter Shanklin
        2. HIVE-12312.1.patch
          1 kB
          Carter Shanklin

          Activity

            People

            • Assignee:
              cartershanklin Carter Shanklin
              Reporter:
              cartershanklin Carter Shanklin
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: