  1. Chukwa
  2. CHUKWA-647

Spread out intermediate data with the same ReduceType into different Reduce Tasks

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.4.0, 0.6.0
    • Fix Version/s: 0.6.0
    • Component/s: Data Processors
    • Labels:
      None

      Description

      We have found that when the map output is partitioned by ReduceType alone, the data becomes skewed in some HiTune cases. One or two Reduce Tasks then slow down the whole Demux job, because they have to process more input data than the others.

        Activity

        grace.huang Jie Huang added a comment -

        The current ChukwaRecordPartitioner dispatches the records to different Reduce Tasks based on ReduceType.

        return (key.getReduceType().hashCode() & Integer.MAX_VALUE)
        

        I wonder if it is possible to include the key or part of the key content into the ChukwaRecordPartitioner, so that we can spread out all those map output data into different Reduce Tasks even for the same Reduce Type.
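
The skew and the proposed remedy can be sketched without Hadoop on the classpath. In the sketch below, plain strings stand in for the fields of ChukwaRecordKey. The modulo by the number of reduce tasks, the class and method names, and the synthetic keys are assumptions made for illustration, not the actual Chukwa code.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative only: plain strings stand in for the fields of ChukwaRecordKey.
class ReduceTypeSkewSketch {

    // Partition on ReduceType alone. The modulo by the task count is an assumption.
    static int byReduceType(String reduceType, int numReduceTasks) {
        return (reduceType.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Partition on ReduceType combined with the record key.
    static int byReduceTypeAndKey(String reduceType, String key, int numReduceTasks) {
        return ((reduceType + key).hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Number of distinct partitions hit by numKeys synthetic keys of one ReduceType.
    static int distinctPartitions(String reduceType, int numKeys,
                                  int numReduceTasks, boolean includeKey) {
        Set<Integer> seen = new HashSet<>();
        for (int i = 0; i < numKeys; i++) {
            String key = "key" + i; // synthetic key, hypothetical
            seen.add(includeKey
                    ? byReduceTypeAndKey(reduceType, key, numReduceTasks)
                    : byReduceType(reduceType, numReduceTasks));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println("ReduceType only:  "
                + distinctPartitions("HiTune", 1000, 8, false) + " partition(s)");
        System.out.println("ReduceType + key: "
                + distinctPartitions("HiTune", 1000, 8, true) + " partition(s)");
    }
}
```

With ReduceType alone, every record of one type lands in a single partition regardless of how many reduce tasks are configured. Mixing the key into the hash lets records of the same type reach several tasks.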

        grace.huang Jie Huang added a comment -

        Attached is a simple workaround. If the key contains a specific mark, the partitioner includes the key in the hash as well. Another option is to include only part of the key content. Any other ideas?
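
A minimal sketch of the mark-based variant follows, again using plain strings instead of ChukwaRecordKey. The mark value `#PARTITION#`, the class name, and the modulo by the task count are hypothetical. The attached patch may use a different constant and a different test on the key.

```java
// Illustrative only: plain strings stand in for the fields of ChukwaRecordKey.
class MarkedKeyPartitionSketch {

    // Hypothetical mark; the constant used in the actual patch may differ.
    static final String KEY_MARK = "#PARTITION#";

    static int getPartition(String reduceType, String key, int numReduceTasks) {
        String hashInput = reduceType;
        if (key.contains(KEY_MARK)) {
            // Opt-in: only marked keys contribute to the partition hash.
            hashInput = reduceType + key;
        }
        return (hashInput.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // Unmarked keys keep the old behaviour: one partition per ReduceType.
        System.out.println(getPartition("HiTune", "a", 8));
        System.out.println(getPartition("HiTune", "b", 8));
        // Marked keys are spread by key content.
        System.out.println(getPartition("HiTune", KEY_MARK + "0", 8));
        System.out.println(getPartition("HiTune", KEY_MARK + "1", 8));
    }
}
```

Making the behaviour opt-in keeps unmarked records of a ReduceType together in one reduce task, so only data that is explicitly marked changes its partitioning.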

        asrabkin Ari Rabkin added a comment -

        Looks good to me. Will commit to Trunk barring objections.

        (My sense is that we aren't going to be doing minor-version releases, so it doesn't make sense to apply this to the 0.4 or 0.5 branches.)

        asrabkin Ari Rabkin added a comment -

        I just committed this to Trunk. Thanks!

        NOTE: I made some slight changes to the patch so that it applies correctly to Trunk.

        hudson Hudson added a comment -

        Integrated in Chukwa-trunk #453 (See https://builds.apache.org/job/Chukwa-trunk/453/)
        CHUKWA-647. Spread out intermediate data with the same ReduceType into different Reduce Tasks. Contributed by Jie Huang. (Revision 1362318)

        Result = FAILURE
        asrabkin : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1362318
        Files :

        • /incubator/chukwa/trunk/src/main/java/org/apache/hadoop/chukwa/extraction/CHUKWA_CONSTANT.java
        • /incubator/chukwa/trunk/src/main/java/org/apache/hadoop/chukwa/extraction/demux/ChukwaRecordPartitioner.java

          People

          • Assignee:
            asrabkin Ari Rabkin
            Reporter:
            grace.huang Jie Huang
          • Votes:
            0
            Watchers:
            4
