Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-12312

Excessive logging in PPD code

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Minor
    • Resolution: Fixed
    • 1.2.1
    • 2.0.0
    • Hive
    • None

    Description

      One of my very complex queries takes about 14 minutes to compile with PPD on. Profiling it I saw a lot of time spent in this stack which is called many many thousands of times.

      java.lang.Throwable.getStackTraceElement(-2)
      java.lang.Throwable.getOurStackTrace(827)
      java.lang.Throwable.getStackTrace(816)
      sun.reflect.GeneratedMethodAccessor5.invoke(-1)
      sun.reflect.DelegatingMethodAccessorImpl.invoke(43)
      java.lang.reflect.Method.invoke(497)
      org.apache.log4j.spi.LocationInfo.<init>(139)
      org.apache.log4j.spi.LoggingEvent.getLocationInformation(253)
      org.apache.log4j.helpers.PatternParser$LocationPatternConverter.convert(500)
      org.apache.log4j.helpers.PatternConverter.format(65)
      org.apache.log4j.PatternLayout.format(506)
      org.apache.log4j.WriterAppender.subAppend(310)
      org.apache.log4j.DailyRollingFileAppender.subAppend(369)
      org.apache.log4j.WriterAppender.append(162)
      org.apache.log4j.AppenderSkeleton.doAppend(251)
      org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(66)
      org.apache.log4j.Category.callAppenders(206)
      org.apache.log4j.Category.forcedLog(391)
      org.apache.log4j.Category.log(856)
      org.apache.commons.logging.impl.Log4JLogger.info(176)
      org.apache.hadoop.hive.ql.ppd.OpProcFactory$DefaultPPD.logExpr(707)
      org.apache.hadoop.hive.ql.ppd.OpProcFactory$DefaultPPD.mergeWithChildrenPred(752)
      org.apache.hadoop.hive.ql.ppd.OpProcFactory$FilterPPD.process(437)
      

      logExpr is set to log at INFO level, but I think DEBUG is more appropriate. When I set log level to debug I see > 20% speedup in compile time:

      Before:

      real    14m47.972s
      user    15m25.609s
      sys    0m20.282s
      

      After:

      real    11m30.946s
      user    12m10.870s
      sys    0m7.320s
      

      It looks like there's a lot of stuff in the PPD code that could be optimized, when I turn PPD off the query compiles in 2m 30s. But this seems like an easy and low risk win.

      Attachments

        1. HIVE-12312.2.patch
          1 kB
          Carter Shanklin
        2. HIVE-12312.1.patch
          1 kB
          Carter Shanklin

        Activity

          People

            cartershanklin Carter Shanklin
            cartershanklin Carter Shanklin
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: