Hadoop Map/Reduce
MAPREDUCE-303

refactor the mapred package into small pieces

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      The mapred package has grown too big, so I propose splitting it into parts.

      I propose the following splits:

      org.apache.hadoop.mapred = client API
      org.apache.hadoop.mapred.task = code for task tracker
      org.apache.hadoop.mapred.job = code for job tracker
      org.apache.hadoop.mapred.utils = non-public code that is shared between the servers

      Does anyone have any other divisions that would help?

      I would make the classes sent through RPC public classes in the server's package.

      Thoughts?

        Activity

        Harsh J added a comment -

        Not a problem after we mavenized the trunk.

        Hemanth Yamijala added a comment -

        Perhaps move the DistributedCache while you're at it?

        +1, +2 if allowed :)

        Philip Zeyliger added a comment -

        +1

        Perhaps move the DistributedCache while you're at it?

        Owen O'Malley added a comment - edited

        We should actually do this in our lifetime. How about:

        mapreduce.server.{common,jobtracker,tasktracker}
        mapreduce.child.{common,map,reduce}
        mapreduce.client
        mapreduce.common
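
Read as shell-style brace shorthand, the proposal above names eight packages in total. A minimal sketch of that expansion, assuming the braces mean simple alternation over the comma-separated names; the class `PackageExpander` is hypothetical and not part of Hadoop:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; it is not a Hadoop class.
public class PackageExpander {

    /**
     * Expands a single "prefix.{a,b,c}" group into one name per alternative.
     * A pattern without braces is returned unchanged as a one-element list.
     */
    public static List<String> expand(String pattern) {
        List<String> result = new ArrayList<>();
        int open = pattern.indexOf('{');
        int close = pattern.indexOf('}');
        if (open < 0 || close < open) {
            result.add(pattern);
            return result;
        }
        String prefix = pattern.substring(0, open);
        String suffix = pattern.substring(close + 1);
        for (String alternative : pattern.substring(open + 1, close).split(",")) {
            result.add(prefix + alternative.trim() + suffix);
        }
        return result;
    }

    public static void main(String[] args) {
        // The four lines of the proposal, copied from the comment.
        String[] proposal = {
            "mapreduce.server.{common,jobtracker,tasktracker}",
            "mapreduce.child.{common,map,reduce}",
            "mapreduce.client",
            "mapreduce.common"
        };
        for (String pattern : proposal) {
            for (String pkg : expand(pattern)) {
                System.out.println(pkg);
            }
        }
    }
}
```

Running `main` prints one package name per line, which makes it easier to compare this layout with the four-way split in the description.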

        brian added a comment - edited

        In addition to the splits Owen proposed above, I would also suggest the following package restructuring:

        package org.apache.hadoop.mapred.sequence;

        SequenceFileAsBinaryInputFormat
        SequenceFileAsBinaryOutputFormat
        SequenceFileAsTextInputFormat
        SequenceFileAsTextRecordReader
        SequenceFileInputFilter
        SequenceFileInputFormat
        SequenceFileOutputFormat
        SequenceFileRecordReader

        package org.apache.hadoop.mapred.file;

        FileAlreadyExistsException
        FileInputFormat
        FileOutputCommitter
        FileOutputFormat
        FileSplit
        IFile
        IFileInputStream
        IFileOutputStream
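
Both lists above happen to follow the class-name prefix, so the grouping can be sketched as a prefix rule. This is only an illustration under two assumptions that the comment does not state: that the prefix is the intended criterion, and that every class not listed stays in `org.apache.hadoop.mapred`. The class `ProposedPackages` and its method are hypothetical:

```java
// Hypothetical helper for illustration only; it is not a Hadoop class.
public class ProposedPackages {
    static final String SEQUENCE = "org.apache.hadoop.mapred.sequence";
    static final String FILE = "org.apache.hadoop.mapred.file";
    // Assumption: classes the comment does not mention keep their current package.
    static final String UNCHANGED = "org.apache.hadoop.mapred";

    /** Returns the package a class would land in under the prefix rule. */
    public static String proposedPackage(String simpleName) {
        if (simpleName.startsWith("SequenceFile")) {
            return SEQUENCE;
        }
        if (simpleName.startsWith("File") || simpleName.startsWith("IFile")) {
            return FILE;
        }
        return UNCHANGED;
    }

    public static void main(String[] args) {
        String[] samples = {
            "SequenceFileInputFormat", "FileSplit", "IFileInputStream", "JobConf"
        };
        for (String name : samples) {
            System.out.println(name + " -> " + proposedPackage(name));
        }
    }
}
```

A prefix rule is broader than the explicit lists: it would also move any other class whose name starts with `File`, `IFile`, or `SequenceFile`, so a real refactoring would more likely enumerate the classes one by one.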

        Doug Cutting added a comment -

        +1

        I would name the "utils" package "common" or something instead, since "util" might imply it's for users.

        Should we also move the input & output format implementations to an "io" package, or to the "lib" package?

          People

          • Assignee: Owen O'Malley
          • Reporter: Owen O'Malley
          • Votes: 0
          • Watchers: 7
