Chukwa (retired) / CHUKWA-323

Chukwa agent unable to stream all data source on the jobtracker node


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Invalid
    • Affects Version/s: 0.2.0
    • Fix Version/s: 0.2.0
    • Component/s: Data Collection
    • Labels: None
    • Environment: Redhat EL 5.1, Java 6

    Description

      HDFS namenode and MapReduce-related metrics appear to have stopped sending data as of 06/21/2009 00:00:00.
      The agent log contains exceptions like these:

      2009-06-21 21:28:01,165 WARN Thread-10 FileTailingAdaptor - failure reading
      /usr/local/hadoop/var/log/history/host.example.com_1245463671645_job_200906200207_0351_user_Chukwa-Demux_20090620_09_56
      java.io.FileNotFoundException:
      /usr/local/hadoop/var/log/history/host.example.com_1245463671645_job_200906200207_0351_user_Chukwa-Demux_20090620_09_56
      (Too many open files)
      at java.io.RandomAccessFile.open(Native Method)
      at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
      at org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.FileTailingAdaptor.tailFile(FileTailingAdaptor.java:239)
      at org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.FileTailer.run(FileTailer.java:90)
      2009-06-21 21:28:01,165 WARN Thread-10 FileTailingAdaptor - Adaptor|58fb855b5c26d36cc1e69e264ce3402c| file:
      /usr/local/hadoop/var/log/history/host.example.com_1245463671645_job_200906200207_0352_user_PigLatin%3AHadoop_jvm_metrics.pig,
      has rotated and no detection - reset counters to 0L

      It looks like the number of files whose offsets the agent is tracking exceeded the JVM process's limit on
      concurrently open files. This triggers a feedback loop: when the open fails, FileTailingAdaptor assumes the log
      file has rotated and resets its counters to 0, even though no rotation took place. FileTailingAdaptor was
      simply unable to open the file to track its offset.
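
      The failure mode described above can be illustrated with a minimal sketch. This is not the Chukwa
      FileTailingAdaptor code; the class TailSketch, its Result enum, and the poll method are hypothetical names
      used only for illustration. The sketch assumes one polling pass per call, releases the descriptor after each
      pass, and treats a failed open as a separate outcome rather than as a rotation, so the tracked offset is
      kept when the process runs out of descriptors.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical tailer sketch; NOT the actual Chukwa FileTailingAdaptor. */
final class TailSketch {
    /** Outcome of one polling pass over a tailed file. */
    enum Result { READ, ROTATED, OPEN_FAILED }

    // Byte offset up to which the file has already been consumed.
    private long offset = 0L;

    long getOffset() {
        return offset;
    }

    /** Polls the file once and consumes bytes appended since the last pass. */
    Result poll(File file) {
        // try-with-resources closes the descriptor at the end of every pass,
        // so tracking many files does not hold many descriptors open.
        try (RandomAccessFile reader = new RandomAccessFile(file, "r")) {
            long length = reader.length();
            if (length < offset) {
                // The file is shorter than what was already read:
                // only this case is treated as a rotation.
                offset = 0L;
                return Result.ROTATED;
            }
            reader.seek(offset);
            byte[] buf = new byte[(int) Math.min(length - offset, 8192L)];
            int n = reader.read(buf);
            if (n > 0) {
                offset += n;
            }
            return Result.READ;
        } catch (IOException e) {
            // Open or read failed (missing file, or "Too many open files").
            // The offset is deliberately left unchanged.
            return Result.OPEN_FAILED;
        }
    }
}
```

      With this shape, a FileNotFoundException caused by descriptor exhaustion surfaces as OPEN_FAILED and the
      next successful pass resumes from the saved offset instead of re-reading the file from 0.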

      [root@gsbl80211 log]# /usr/sbin/lsof -p 29960|wc -l
      1084

      lsof reports 1084 open files for the agent process, which exceeds the default limit of 1024 concurrently open files.
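
      The same comparison can be made from inside the JVM. The sketch below is an illustration, not part of
      Chukwa; the class name FdUsage and the method descriptorUsage are hypothetical. It assumes a Unix-like
      platform where the JVM exposes com.sun.management.UnixOperatingSystemMXBean, and returns null otherwise.
      Note that lsof output also includes a header line and entries such as memory-mapped libraries, so its line
      count is usually somewhat higher than the descriptor count reported here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

/** Hypothetical helper; reports this JVM's file descriptor usage. */
final class FdUsage {
    /**
     * Returns {openCount, maxCount} for the current process,
     * or null when the platform bean does not expose descriptor counts.
     */
    static long[] descriptorUsage() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null;
    }

    public static void main(String[] args) {
        long[] usage = descriptorUsage();
        if (usage == null) {
            System.out.println("descriptor counts not available on this platform");
        } else {
            System.out.println("open=" + usage[0] + " max=" + usage[1]);
        }
    }
}
```

      Logging these two numbers periodically would show whether the agent is approaching its limit before opens
      start to fail.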

      Attachments

        1. testForLeaks.patch
          11 kB
          Ariel Shemaiah Rabkin

          People

            jboulon Jerome Boulon
            eyang Eric Yang
