Uploaded image for project: 'Oozie'
  1. Oozie
  2. OOZIE-2457

Oozie log parsing regex consume more than 90% cpu

VotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Blocker
    • Resolution: Fixed
    • None
    • 5.0.0b1
    • None
    • None

    Description

      http-0.0.0.0-4080-26 TID=62215 STATE=RUNNABLE CPU_TIME=1992 (92.59%) USER_TIME=1990 (92.46%) Allocted: 269156584
      java.util.regex.Pattern$Curly.match0(Pattern.java:4170)
      java.util.regex.Pattern$Curly.match(Pattern.java:4132)
      java.util.regex.Pattern$GroupHead.match(Pattern.java:4556)
      java.util.regex.Matcher.match(Matcher.java:1221)
      java.util.regex.Matcher.matches(Matcher.java:559)
      org.apache.oozie.util.XLogFilter.matches(XLogFilter.java:136)
      org.apache.oozie.util.TimestampedMessageParser.parseNextLine(TimestampedMessageParser.java:145)
      org.apache.oozie.util.TimestampedMessageParser.increment(TimestampedMessageParser.java:92)

      Regex

      (.* USER\[[^\]]*\] GROUP\[[^\]]*\] TOKEN\[[^\]]*\] APP\[[^\]]*\] JOB\[0000000-150625114739728-oozie-puru-W\] ACTION\[[^\]]*\] .*)
      
      

      For single line parsing we use two regex.
      1.

      public ArrayList<String> splitLogMessage(String logLine) {
              Matcher splitter = SPLITTER_PATTERN.matcher(logLine);
              if (splitter.matches()) {
                  ArrayList<String> logParts = new ArrayList<String>();
                  logParts.add(splitter.group(1));// timestamp
                  logParts.add(splitter.group(2));// log level
                  logParts.add(splitter.group(3));// Log Message
                  return logParts;
              }
              else {
                  return null;
              }
          }
      

      2.

       public boolean matches(ArrayList<String> logParts) {
              if (getStartDate() != null) {
                  if (logParts.get(0).substring(0, 19).compareTo(getFormattedStartDate()) < 0) {
                      return false;
                  }
              }
              String logLevel = logParts.get(1);
              String logMessage = logParts.get(2);
              if (this.logLevels == null || this.logLevels.containsKey(logLevel.toUpperCase())) {
                  Matcher logMatcher = filterPattern.matcher(logMessage);
                  return logMatcher.matches();
              }
              else {
                  return false;
              }
          }
      

      Also there is repetitive parsing for same log message in

      private String parseTimestamp(String line) {
              String timestamp = null;
              ArrayList<String> logParts = filter.splitLogMessage(line);
              if (logParts != null) {
                  timestamp = logParts.get(0);
              }
              return timestamp;
          }
      

      where the line has already parsed using regex and we already know the logParts if any.

      Attachments

        1. OOZIE-2457-7.patch
          15 kB
          Satish Saley
        2. OOZIE-2457-6.patch
          14 kB
          Satish Saley
        3. OOZIE-2457-5.patch
          14 kB
          Satish Saley
        4. OOZIE-2457-4.patch
          13 kB
          Satish Saley
        5. OOZIE-2457-3.patch
          13 kB
          Satish Saley
        6. OOZIE-2457-2.patch
          13 kB
          Satish Saley
        7. OOZIE-2457-1.patch
          13 kB
          Satish Saley

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            satishsaley Satish Saley
            satishsaley Satish Saley
            Votes:
            1 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment