Hadoop Common / HADOOP-1290

Move Hadoop Abacus to hadoop.mapred.lib

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels:
      None

      Description

      Owen and I discussed this issue and we both felt that it is appropriate to move Hadoop Abacus to the hadoop main framework.
      Any comments/thoughts/concerns/objections?

      1. patch_1290.txt
        81 kB
        Runping Qi

          Activity

          Doug Cutting added a comment -

          Why? I'd like to hear more of this discussion.

          Runping Qi added a comment -

          Mainly, I feel the Abacus package has proved useful and fits into mapred.lib nicely. If moved to mapred.lib, it will be easier for other contrib modules, such as streaming, to use it.

          Doug Cutting added a comment -

          Looking at:

          http://lucene.apache.org/hadoop/api/org/apache/hadoop/abacus/package-summary.html

          I agree. These look to be of general utility. +1

          Should we put some of these in lib.aggregate, or all just in lib?

          Runping Qi added a comment -

          I'd put them all in lib.abacus (currently they are all in contrib.abacus).

          Doug Cutting added a comment -

          I'd put them all in lib.abacus [ ... ]

          I'd encourage a more descriptive name, like 'aggregate'. The convention is that only projects use meaningless names; all names within a project should attempt to be descriptive.

          Runping Qi added a comment -

          Then, lib.aggregate is fine with me.

          Nigel Daley added a comment -

          If abacus is going into the main framework, I think we should require some unit tests for these classes.

          Runping Qi added a comment -

          Sure.

          Runping Qi added a comment -

          This patch implements the proposed protocol.

          With this patch, the streaming user can specify a field separator for the mapper's output and/or a field separator
          for the reducer's output. The default is the tab character.

          The user can also specify how many fields in the output constitute the key. The default is 1.
          The rest of the line is the value.

          A partitioner class, KeyFieldBasedPartitioner in mapred.lib, is also implemented.
          The user can specify how many fields of the map output key
          are used for partitioning.

          Also, a utility class, FieldSelectionMapReduce in mapred.lib, is added. This class allows the
          user to create map/reduce jobs that manipulate text data like the Unix cut utility.
          The user can specify the field separator (the delimiter for cut), which fields to select, and
          by which fields to partition/sort.

          Two unit tests are introduced.
          All the unit tests passed.
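
          The key/value split and key-field partitioning described in this comment can be illustrated with a small sketch. This is not the code from the patch: the function names split_key_value and partition_for, the handling of lines with too few fields, and the use of a CRC32 hash are all assumptions made for illustration only.

```python
import zlib


def split_key_value(line, separator="\t", num_key_fields=1):
    """Split one output line into (key, value).

    The first num_key_fields fields form the key; the rest of the line
    is the value. Hypothetical helper, not part of Hadoop.
    """
    fields = line.rstrip("\n").split(separator)
    if len(fields) <= num_key_fields:
        # Assumption: with too few fields, the whole line is the key
        # and the value is empty.
        return separator.join(fields), ""
    key = separator.join(fields[:num_key_fields])
    value = separator.join(fields[num_key_fields:])
    return key, value


def partition_for(key, separator, num_partition_fields, num_reducers):
    """Pick a reducer index from the leading fields of the key.

    Assumption: CRC32 stands in for whatever hash the real
    partitioner uses; only the idea of hashing a key prefix is shown.
    """
    prefix = separator.join(key.split(separator)[:num_partition_fields])
    return zlib.crc32(prefix.encode("utf-8")) % num_reducers


if __name__ == "__main__":
    # Two key fields, "." as the separator.
    key, value = split_key_value("11.12.1.2\n", separator=".", num_key_fields=2)
    print(key, value)
    # Keys sharing the first field land in the same partition.
    print(partition_for("11.12", ".", 1, 4) == partition_for("11.99", ".", 1, 4))
```

          Partitioning on fewer fields than make up the key keeps related records on the same reducer while still letting the full key drive the sort order.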

          Runping Qi added a comment -

          Oops, wrong JIRA.

          Runping Qi added a comment -

          This patch adds abacus code to mapred.lib.aggregate package.

          It includes one unit test for the new code.

          After a release with this patch, users should be guided to use this package instead of
          contrib/abacus. Sometime down the road, contrib/abacus should be removed from future
          releases.

          Hadoop QA added a comment -

          -1, new javadoc warnings

          The javadoc tool appears to have generated warning messages when testing the latest attachment http://issues.apache.org/jira/secure/attachment/12356366/patch_1290.txt against trunk revision r532871.

          Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/85/testReport/
          Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/85/console

          Please note that this message is automatically generated and may represent a problem with the automation system and not the patch.

          Doug Cutting added a comment -

          After a release with this patch, the user should be guided to use this package instead of using contrib/abacus

          Shouldn't we then deprecate all the classes in contrib/abacus?

          Runping Qi added a comment -

          Sure. I was just too lazy to deprecate those classes one by one.

          Runping Qi added a comment -

          Deprecated the classes in contrib/abacus.

          Fixed a few javadoc warnings.

          Hadoop QA added a comment -

          +1

          http://issues.apache.org/jira/secure/attachment/12356376/patch_1290.txt applied and successfully tested against trunk revision r532878.

          Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/86/testReport/
          Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/86/console

          Doug Cutting added a comment -

          I committed this, with a few modifications. First, I used 'svn cp' before patching, so that existing file history is retained. Second, I removed Abacus from the javadoc. The Abacus classes are still built and available for back-compatibility, but no longer duplicated in the javadoc tree. Finally, I removed references to "Abacus" from the package.html file. Thanks, Runping!

          Runping Qi added a comment -


          Thanks Doug for the final touches!

          Hadoop QA added a comment -

          Integrated in Hadoop-Nightly #72 (See http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/72/ )

            People

            • Assignee: Unassigned
            • Reporter: Runping Qi
            • Votes: 0
            • Watchers: 0
