Chukwa
  1. Chukwa
  2. CHUKWA-473

Make default processor configurable

    Details

    • Type: New Feature New Feature
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Release Note:
      Default map and reduce processors are configurable.

      Description

      The default processor is hard coded to DefaultProcessor in the Demux class. This should be configurable.

      1. CHUKWA-473.4.patch
        15 kB
        Bill Graham
      2. CHUKWA-473.3.patch
        15 kB
        Bill Graham
      3. ASF.LICENSE.NOT.GRANTED--CHUKWA-473.2.patch
        12 kB
        Bill Graham
      4. ASF.LICENSE.NOT.GRANTED--CHUKWA-473.1.patch
        16 kB
        Bill Graham

        Activity

        Hide
        Jerome Boulon added a comment -

        Bill,
        At Netflix I've changed a couple things and I have a separate config for mapper and reducer.
        So could we do something like:

        <property>
         <name>chukwa.demux.processor.defaultMapper</name>
         <value>org.apache.hadoop.chukwa.extraction.demux.processor.mapper.DefaultProcessor</value>
        </property>

        <property>
         <name>chukwa.demux.processor.defaultReducer</name>
         <value>org.apache.hadoop.chukwa.extraction.demux.processor.reducer.DefaultReducer</value>
        </property>

        Let me know if you need some guideline on how todo that.

        Show
        Jerome Boulon added a comment - Bill, At Netflix I've changed a couple things and I have a separate config for mapper and reducer. So could we do something like: <property>  <name>chukwa.demux.processor.defaultMapper</name>  <value>org.apache.hadoop.chukwa.extraction.demux.processor.mapper.DefaultProcessor</value> </property> <property>  <name>chukwa.demux.processor.defaultReducer</name>  <value>org.apache.hadoop.chukwa.extraction.demux.processor.reducer.DefaultReducer</value> </property> Let me know if you need some guideline on how todo that.
        Hide
        Bill Graham added a comment -

        This seems like a different use case to me. The existing Demux MapClass and ReduceClass work fine for me, but I just want to use a different processor in the mapper. I don't need to swap out the entire mapper/reducer class. In particular I want to swap out the following section in Demux.MapClass.map(..):

        String processorClass = jobConf.get(chunk.getDataType(),
              "org.apache.hadoop.chukwa.extraction.demux.processor.mapper.DefaultProcessor");
        

        with something like this:

        String defaultProcessor = jobConf.get("chukwa.demux.default.map.processor",
            "org.apache.hadoop.chukwa.extraction.demux.processor.mapper.DefaultProcessor");
        
        String processorClass = jobConf.get(chunk.getDataType(), defaultProcessor);
        
        Show
        Bill Graham added a comment - This seems like a different use case to me. The existing Demux MapClass and ReduceClass work fine for me, but I just want to use a different processor in the mapper. I don't need to swap out the entire mapper/reducer class. In particular I want to swap out the following section in Demux.MapClass.map(..): String processorClass = jobConf.get(chunk.getDataType(), "org.apache.hadoop.chukwa.extraction.demux.processor.mapper.DefaultProcessor"); with something like this: String defaultProcessor = jobConf.get("chukwa.demux.default.map.processor", "org.apache.hadoop.chukwa.extraction.demux.processor.mapper.DefaultProcessor"); String processorClass = jobConf.get(chunk.getDataType(), defaultProcessor);
        Hide
        Jerome Boulon added a comment -

        I'm not talking about changing the Mapper, I want the same thing as you want, aka running another processor on the mapper side and I want to be able to do the same thing on the reducer side.
        So instead of:
        String defaultProcessor = jobConf.get("chukwa.demux.default.map.processor",
        "org.apache.hadoop.chukwa.extraction.demux.processor.mapper.DefaultProcessor");

        I was thinking of something like this (it's just another name for the config key):
        String defaultProcessor = jobConf.get("chukwa.demux.processor.defaultMapper",
        "org.apache.hadoop.chukwa.extraction.demux.processor.mapper.DefaultProcessor");

        Show
        Jerome Boulon added a comment - I'm not talking about changing the Mapper, I want the same thing as you want, aka running another processor on the mapper side and I want to be able to do the same thing on the reducer side. So instead of: String defaultProcessor = jobConf.get("chukwa.demux.default.map.processor", "org.apache.hadoop.chukwa.extraction.demux.processor.mapper.DefaultProcessor"); I was thinking of something like this (it's just another name for the config key): String defaultProcessor = jobConf.get("chukwa.demux.processor.defaultMapper", "org.apache.hadoop.chukwa.extraction.demux.processor.mapper.DefaultProcessor");
        Hide
        Bill Graham added a comment -

        OK cool, we're talking about the same functionality. Regarding the naming then, what was causing me confusion that your config name ends with 'defaultMapper'. It makes it look like the config takes a Mapper class. I think it would be more clear if we used language that refers to Processors or MapProcessors, not Mappers. This is because the thing that we're configuring is actually an instance of org.apache.hadoop.chukwa.extraction.demux.processor.mapper.MapProcessor, not

        {Mapper}

        .

        What about chukwa.demux.mapper.default.processor?

        Show
        Bill Graham added a comment - OK cool, we're talking about the same functionality. Regarding the naming then, what was causing me confusion that your config name ends with 'defaultMapper'. It makes it look like the config takes a Mapper class. I think it would be more clear if we used language that refers to Processors or MapProcessors, not Mappers. This is because the thing that we're configuring is actually an instance of org.apache.hadoop.chukwa.extraction.demux.processor.mapper.MapProcessor , not {Mapper} . What about chukwa.demux.mapper.default.processor ?
        Hide
        Eric Yang added a comment -

        +1 on Jerome's approach. It may be overly complex having a default package name.

        Show
        Eric Yang added a comment - +1 on Jerome's approach. It may be overly complex having a default package name.
        Hide
        Bill Graham added a comment -

        Eric, I'm confused by your comment about a default package name and want to make sure we're all on the same page. Jerome and I are discussing the same approach and trying to decide on a config variable name. What are you referring to when you say "default package naming"? Just want to make sure we're all on the same page before I implement.

        Show
        Bill Graham added a comment - Eric, I'm confused by your comment about a default package name and want to make sure we're all on the same page. Jerome and I are discussing the same approach and trying to decide on a config variable name. What are you referring to when you say "default package naming"? Just want to make sure we're all on the same page before I implement.
        Hide
        Eric Yang added a comment -

        My apology, I thought you were referring to the package name for "chukwa.demux.mapper.default.processor". Your naming convention is more concise for the config key name.

        Show
        Eric Yang added a comment - My apology, I thought you were referring to the package name for "chukwa.demux.mapper.default.processor". Your naming convention is more concise for the config key name.
        Hide
        Bill Graham added a comment -

        Jerome and I touched base offline and he suggested the following configs, which work for me. I'll go with these unless there's a better suggestion:

        chukwa.demux.mapper.defaultMapProcessor
        chukwa.demux.reducer.defaultReduceProcessor
        
        Show
        Bill Graham added a comment - Jerome and I touched base offline and he suggested the following configs, which work for me. I'll go with these unless there's a better suggestion: chukwa.demux.mapper.defaultMapProcessor chukwa.demux.reducer.defaultReduceProcessor
        Hide
        Eric Yang added a comment -

        Without camel case is more user friendly in my opinion, but Chukwa configurations are currently mixed of dot notation with camel case. Hence, my opinion doesn't matter.

        Show
        Eric Yang added a comment - Without camel case is more user friendly in my opinion, but Chukwa configurations are currently mixed of dot notation with camel case. Hence, my opinion doesn't matter.
        Hide
        Bill Graham added a comment -

        +1 for no camelcase actually. I think something like this reads better:

        chukwa.demux.mapper.default.processor
        chukwa.demux.reducer.default.processor
        
        Show
        Bill Graham added a comment - +1 for no camelcase actually. I think something like this reads better: chukwa.demux.mapper.default.processor chukwa.demux.reducer.default.processor
        Hide
        Eric Yang added a comment -

        +1

        chukwa.demux.mapper.default.processor
        chukwa.demux.reducer.default.processor
        
        Show
        Eric Yang added a comment - +1 chukwa.demux.mapper.default.processor chukwa.demux.reducer.default.processor
        Hide
        Bill Graham added a comment -

        I've got a patch ready that makes the default map processor configurable, but just wanted to clarify how we want to do the same for the reducer processor. It looks for a class based on key.getReduceType() and falls back on IdentityReducer in ReduceProcessorFactory. Assuming that's what we want to be able to set a default for.

        If so I need to change the signature of ReduceProcessorFactory.getProcessor(String reduceType) to either take a JobConf object or a default processor string, since I don't think that method should return null and delegate the default logic to the caller. I'm leaning slightly towards the later. Any opinions either way?

        Show
        Bill Graham added a comment - I've got a patch ready that makes the default map processor configurable, but just wanted to clarify how we want to do the same for the reducer processor. It looks for a class based on key.getReduceType() and falls back on IdentityReducer in ReduceProcessorFactory. Assuming that's what we want to be able to set a default for. If so I need to change the signature of ReduceProcessorFactory.getProcessor(String reduceType) to either take a JobConf object or a default processor string, since I don't think that method should return null and delegate the default logic to the caller. I'm leaning slightly towards the later. Any opinions either way?
        Hide
        Bill Graham added a comment -

        Attaching CHUKWA-473.1.patch, which implements default map and reduce processors. For the later the behavior is to look for a reducer class with name of reducerType, followed by the default setting, and finally the IdentityReducer.

        Reduce processors are still bound by the constraint that they have to live in org.apache.hadoop.chukwa.extraction.demux.processor.reducer. If that's to be fixed, we should do so in a separate bug.

        Show
        Bill Graham added a comment - Attaching CHUKWA-473 .1.patch, which implements default map and reduce processors. For the later the behavior is to look for a reducer class with name of reducerType, followed by the default setting, and finally the IdentityReducer. Reduce processors are still bound by the constraint that they have to live in org.apache.hadoop.chukwa.extraction.demux.processor.reducer. If that's to be fixed, we should do so in a separate bug.
        Hide
        Jerome Boulon added a comment -

        Since we are changing the reduce part, can we add aliases on the reducer side similar to what we have on the mapper side?

        Show
        Jerome Boulon added a comment - Since we are changing the reduce part, can we add aliases on the reducer side similar to what we have on the mapper side?
        Hide
        Bill Graham added a comment -

        There's plenty of room for refactoring the reducer code. I think we should limit the scope of this JIRA to making the default processors configurable though, and open another one for supporting aliases.

        Show
        Bill Graham added a comment - There's plenty of room for refactoring the reducer code. I think we should limit the scope of this JIRA to making the default processors configurable though, and open another one for supporting aliases.
        Hide
        Bill Graham added a comment -

        Attaching a second patch with cleaned up unit tests.

        Show
        Bill Graham added a comment - Attaching a second patch with cleaned up unit tests.
        Hide
        Ari Rabkin added a comment -

        Unit tests fail for me. -1 to commit just yet.

        Show
        Ari Rabkin added a comment - Unit tests fail for me. -1 to commit just yet.
        Hide
        Bill Graham added a comment -

        My bad, I missed a file in that last patch. Here's CHUKWA-473.3.patch.

        Show
        Bill Graham added a comment - My bad, I missed a file in that last patch. Here's CHUKWA-473 .3.patch.
        Hide
        Ari Rabkin added a comment -

        I may have accidentally committed the previous version of this. Can you regenerate patch from current trunk if something isn't working?
        Trunk unit tests pass, so I'm not quite sure what happened...

        Show
        Ari Rabkin added a comment - I may have accidentally committed the previous version of this. Can you regenerate patch from current trunk if something isn't working? Trunk unit tests pass, so I'm not quite sure what happened...
        Hide
        Bill Graham added a comment -

        Hmm. I'm not seeing any of these files committed. I saw commits for 479 and 480. Either way, here's an updated patch 4 built from the current trunk.

        Show
        Bill Graham added a comment - Hmm. I'm not seeing any of these files committed. I saw commits for 479 and 480. Either way, here's an updated patch 4 built from the current trunk.
        Hide
        Ari Rabkin added a comment -

        Just committed latest patch.

        Show
        Ari Rabkin added a comment - Just committed latest patch.
        Hide
        Ari Rabkin added a comment -

        [and thanks for putting up with random chaos from my end. ;)]

        Show
        Ari Rabkin added a comment - [and thanks for putting up with random chaos from my end. ;)]

          People

          • Assignee:
            Bill Graham
            Reporter:
            Bill Graham
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development