  Chukwa / CHUKWA-440

Custom processor classes not detected unless added to chukwa-core jar

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Data Processors
    • Labels:
      None

      Description

      http://wiki.apache.org/hadoop/DemuxModification

      After implementing a custom parser as shown in the wiki link above and mapping it to a data type in chukwa-demux.xml, there's no easy way to register the class so that it is included in the job submitted to the Hadoop cluster. I've added the jar containing my class to the lib/ directory of the Chukwa data processor install and verified that it's on the classpath, but it is not submitted with the Hadoop job. On the Hadoop cluster, my mapper throws ClassNotFoundExceptions.

      The only way I've been able to make this work is to do the following:
      1. Put my class in the package org.apache.hadoop.chukwa.extraction.demux.processor.mapper.
      2. Manually add that class to the chukwa-core-0.3.0.jar on my data processor.

      Instead, the class should be detected in whichever jar on the demux classpath contains it, regardless of its package.

      1. CHUKWA-440.patch
        2 kB
        Eric Yang
      2. CHUKWA-440-1.patch
        4 kB
        Eric Yang
      3. CHUKWA-440-2.patch
        4 kB
        Eric Yang

        Activity

        tanjiaqi Jiaqi Tan added a comment -

        I believe that, in general, if a Hadoop MapReduce program uses class files from multiple jars, the additional jar files need to be loaded using DistributedCache, with the corresponding Path added to the classpath, so that they are copied to every slave node that runs the code.

        I believe the way to get the functionality you're suggesting would be to modify the Demux class to accept a list of extra jars and load them using DistributedCache.

        jboulon Jerome Boulon added a comment -

        It's a Hadoop problem. Hadoop runs demux across several machines, so the jar needs to be available on all data nodes.
        The way to use more than one jar is to add the -libjars option to your hadoop command; you can try that for Demux.
        Let me know whether it works.
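        A minimal sketch of what Jerome suggests, assuming the job is launched through a `Tool` driver so that `ToolRunner`/`GenericOptionsParser` handles the generic options. `LibJarsArgs`, `buildArgs` and the jar path are hypothetical. `-libjars` takes one comma-separated list, and generic options are expected ahead of the job's own arguments, so the helper prepends them:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LibJarsArgs {
    // Hypothetical helper: prepend "-libjars a.jar,b.jar" to the job's own
    // arguments so that a GenericOptionsParser-based driver sees it first.
    public static String[] buildArgs(List<String> extraJars, String[] jobArgs) {
        List<String> args = new ArrayList<>();
        if (!extraJars.isEmpty()) {
            args.add("-libjars");
            // -libjars expects a single comma-separated value, not repeated flags.
            args.add(String.join(",", extraJars));
        }
        args.addAll(Arrays.asList(jobArgs));
        return args.toArray(new String[0]);
    }

    public static void main(String[] argv) {
        // Hypothetical parser jar location, for illustration only.
        String[] args = buildArgs(
                List.of("/opt/chukwa/lib/demux/myparser.jar"),
                new String[] {"input", "output"});
        System.out.println(String.join(" ", args));
    }
}
```

        The resulting array would then be passed to `ToolRunner.run(conf, tool, args)`.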

        eyang Eric Yang added a comment -

        Line 317 of DemuxManager.java is where the additional jar files could be added. We should probably add some logic to construct the -libjars value by searching CHUKWA_HOME/lib/demux/*.jar.
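        A minimal sketch of the scanning Eric proposes, assuming CHUKWA_HOME is set in the environment and that only regular files sitting directly in lib/demux should be shipped. `DemuxJarScanner` and `libJarsValue` are hypothetical names, not part of DemuxManager; the method returns the comma-separated string that would follow `-libjars`:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DemuxJarScanner {
    // Collect every *.jar directly under dir, sort for a stable ordering,
    // and join the absolute paths into one comma-separated -libjars value.
    public static String libJarsValue(Path dir) throws IOException {
        List<String> jars = new ArrayList<>();
        if (!Files.isDirectory(dir)) {
            return ""; // no add-on parsers to ship
        }
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path jar : stream) {
                // Skip directories that happen to be named *.jar.
                if (Files.isRegularFile(jar)) {
                    jars.add(jar.toAbsolutePath().toString());
                }
            }
        }
        Collections.sort(jars);
        return String.join(",", jars);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: CHUKWA_HOME is exported in the environment.
        String home = System.getenv("CHUKWA_HOME");
        if (home == null) {
            System.err.println("CHUKWA_HOME is not set");
            return;
        }
        System.out.println(libJarsValue(Paths.get(home, "lib", "demux")));
    }
}
```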

        jboulon Jerome Boulon added a comment -

        -1 - Again, this should not be done by manually adding jars.
        There's a generic way to do this in the Hadoop world using the -libjars flag; there's no point in adding Chukwa-specific code here, since the problem has already been solved in Hadoop.
        So instead of adding Chukwa-specific code to add the jar, we should enhance the demux script to look for additional jars and pass them with the -libjars flag.

        If -libjars is not working for demux, that's another issue, but since we implement Tool it should already be supported.

        eyang Eric Yang added a comment -

        As I recall, demux has been changed from command-line submission to a Java daemon process, and the Demux job is submitted through ToolRunner. Hence, modifying the DemuxManager code will most likely be required. I am not 100% sure; interested parties should try both.

        tanjiaqi Jiaqi Tan added a comment -

        @Jerome: If you use -libjars, does that automatically copy the extra libs to every node? I don't believe it does, because I had problems with that the last time I tried, and, like Bill, I got it to work only by putting everything in the main Chukwa package.

        jboulon Jerome Boulon added a comment -

        So if it's not working for Chukwa, we need to fix it. I want to add the flag at the command-line level, since I'm no longer using DemuxManager because of other issues on S3; if we fix it inside DemuxManager, everyone who uses only Demux will still have the issue.

        I'll do some testing, but I don't have time right now, so if someone can test the -libjars flag with a standard Hadoop job, that would be a good start.

        eyang Eric Yang added a comment -

        Converted demux to use the mapred.used.genericoptionsparser flag to enable -libjars abc.jar,xyz.jar.

        This change enables passing -libjars both to standalone demux and to demux running through DemuxManager.
        Place the parser jar file in CHUKWA_HOME/lib/demux.

        Yahoo is shut down over this weekend, so I can't test this code on my cluster. Any help testing this code would be highly appreciated.

        eyang Eric Yang added a comment -

        Tested on my cluster; the current patch is incomplete.

        eyang Eric Yang added a comment -

        Using DistributedCache to load additional jar files. The idea is to put add-on jar files in HDFS under /chukwa/demux, and they will get loaded automatically. However, I can't get it to work. Jiaqi, could you review what is missing?

        eyang Eric Yang added a comment -

        Revised patch. This is the same as the previous patch; it was working, except that I wasn't packaging Chukwa correctly during my test run. Additional parsers can be placed in hdfs://hostname:port/chukwa/demux/*.jar, and demux will pick them up.
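        The rule in the comment above (jars placed under /chukwa/demux in HDFS are picked up by demux) can be sketched as a pure-Java filter. This is not the committed patch: the class name and the `namenode:9000` address are hypothetical, the listing is a plain list of strings rather than a real HDFS directory listing, and treating only direct children of the directory as parsers is an assumption:

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class DemuxJarSelector {
    // Directory that, per the final patch, demux checks for add-on parsers.
    static final String DEMUX_DIR = "/chukwa/demux";

    // Keep only *.jar entries sitting directly in /chukwa/demux (not in
    // subdirectories). The input stands in for an HDFS directory listing;
    // this sketch never contacts HDFS itself.
    public static List<String> selectParserJars(List<String> listing) {
        List<String> jars = new ArrayList<>();
        for (String entry : listing) {
            // Accepts both "hdfs://host:port/..." URIs and bare paths.
            String path = URI.create(entry).getPath();
            if (path == null || !path.endsWith(".jar")) {
                continue;
            }
            int slash = path.lastIndexOf('/');
            String parent = slash > 0 ? path.substring(0, slash) : "/";
            if (parent.equals(DEMUX_DIR)) {
                // In a real job, each selected jar would then be added to the
                // task classpath, e.g. through Hadoop's DistributedCache.
                jars.add(path);
            }
        }
        return jars;
    }

    public static void main(String[] args) {
        List<String> listing = List.of(
                "hdfs://namenode:9000/chukwa/demux/myparser.jar",
                "hdfs://namenode:9000/chukwa/demux/README.txt",
                "hdfs://namenode:9000/chukwa/demux/old/legacy.jar");
        System.out.println(selectParserJars(listing));
    }
}
```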

        asrabkin Ari Rabkin added a comment -

        +1 to latest patch.

        eyang Eric Yang added a comment -

        I just committed this. Thanks, Ari.

        hudson Hudson added a comment -

        Integrated in Chukwa-trunk #330 (See http://hudson.zones.apache.org/hudson/job/Chukwa-trunk/330/ )

          People

          • Assignee:
            eyang Eric Yang
            Reporter:
            billgraham Bill Graham
          • Votes:
            0
            Watchers:
            3

            Dates

            • Created:
              Updated:
              Resolved:
