  1. Chukwa
  2. CHUKWA-593

Archive daemon: infinite loop at midnight

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.4.0
    • Fix Version/s: 0.5.0
    • Component/s: MR Data Processors
    • Labels:
      None
    • Environment:

      Debian 5.0, Hadoop 0.20

    • Release Note:
      Fixed infinite loop archiving at midnight. (Sourygna Luangsay via Eric Yang)

      Description

      The Chukwa archive manager daemon enters an infinite loop between 00:00 and 01:00. This increases the namenode load and causes a large increase in both the Chukwa and namenode logs.

      The problem seems to come from the start function of ChukwaArchiveManager.java (in package org/apache/hadoop/chukwa/extraction/archive). At midnight, there are two directories in /chukwa/dataSinkArchives/ (one for the previous day and one for the new day). This means that we enter neither the "daysInRawArchiveDir.length == 0" condition nor the "daysInRawArchiveDir.length == 1" one. The processDay function is then called, but it does very little because of the "modificationDate < oneHourAgo" condition.
      Finally, we loop again without having slept or deleted the previous day's directory. This process repeats for one hour.
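
      The branch structure described above can be sketched as a small decision function. This is a hypothetical reduction for illustration only, not the actual ChukwaArchiveManager code: the class name, the Action enum and the assumption that the "0 directories" and "1 directory" branches end in a sleep are all mine, inferred from the description.

```java
// Hypothetical model of the control flow described in this report.
// It is NOT the real ChukwaArchiveManager; names and structure are assumptions.
final class ArchiveLoopModel {

    enum Action { SLEEP, PROCESS_WITHOUT_SLEEP }

    // dayDirCount: number of day directories found in the raw archive dir.
    // now / nextRun: timestamps in milliseconds.
    static Action originalDecision(int dayDirCount, long now, long nextRun) {
        if (dayDirCount == 0) {
            // Assumed: nothing to archive yet, so the daemon sleeps.
            return Action.SLEEP;
        }
        if (dayDirCount == 1 && now < nextRun) {
            // Assumed: last run was too recent, so the daemon sleeps.
            return Action.SLEEP;
        }
        // Two or more directories (the midnight case) fall through here:
        // processing is attempted and the loop restarts without sleeping.
        return Action.PROCESS_WITHOUT_SLEEP;
    }
}
```

      With two directories, the model never returns SLEEP regardless of how recent the last run was, which matches the busy loop observed between 00:00 and 01:00.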

      Here is how I propose to change the "daysInRawArchiveDir.length == 1" condition block in the start function:
      if (daysInRawArchiveDir.length >= 1) {
        long nextRun = lastRun + (2*ONE_HOUR) - (1*60*1000); // 2h - 1min
        if (now < nextRun) {
          log.info("lastRun < 2 hours so skip archive for now, going to sleep for 30 minutes, currentDate is:" + new java.util.Date());
          Thread.sleep(30 * 60 * 1000);
          continue;
        }
      }
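
      To make the timing of the proposed condition easier to check, here is a minimal standalone sketch of the same test as a pure function. The class name, the method name shouldSleep and the ONE_MINUTE constant are hypothetical; ONE_HOUR is assumed to be one hour in milliseconds, as the "2h -1min" comment suggests.

```java
// Standalone sketch of the proposed skip check; names are illustrative only.
final class ArchiveSkipCheck {

    static final long ONE_HOUR = 60L * 60 * 1000;   // assumed to be in milliseconds
    static final long ONE_MINUTE = 60L * 1000;

    // Returns true when the daemon should sleep instead of archiving.
    // lastRun and now are timestamps in milliseconds.
    static boolean shouldSleep(int dayDirCount, long lastRun, long now) {
        if (dayDirCount >= 1) {
            // Next run is allowed 2 hours minus 1 minute after the last one.
            long nextRun = lastRun + (2 * ONE_HOUR) - ONE_MINUTE;
            return now < nextRun;
        }
        // The "0 directories" case is handled by a separate branch
        // in the report and is not modelled here.
        return false;
    }
}
```

      Under these assumptions, two directories one hour after the last run yield a sleep, while the same two directories two hours after the last run are processed.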

      For me, this change removed the infinite loop. However, there may be a reason to handle the "1 directory" case separately from the "many directories" case. I have read the documentation and the Subversion history but could not find one.
      If there is one, could someone explain it to me?

      Regards.

            People

            • Assignee:
              eyang Eric Yang
              Reporter:
              sourygna Sourygna Luangsay
                Time Tracking

                Estimated: 10m
                Remaining: 10m
                Logged: Not Specified