Hadoop Map/Reduce / MAPREDUCE-1644

Remove Sqoop from Apache Hadoop (moving to github)

    Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      Sqoop is moving to GitHub! All code for Sqoop is already live at http://github.com/cloudera/sqoop. This issue removes the duplicate code from the Apache Hadoop repository before the 0.21 release.

      1. MAPREDUCE-1644.patch (715 kB, Aaron Kimball)

        Activity

        Aaron Kimball added a comment -

        Original message sent to common-user, mapreduce-user, hive-user, and general@hadoop.apache.org:

        Hi Hadoop, Hive, and Sqoop users,

        For the past year, the Apache Hadoop MapReduce project has played host to Sqoop, a command-line tool that performs parallel imports and exports between relational databases and HDFS. We've developed a lot of features and gotten a lot of great feedback from users. As a contrib project in Hadoop, Sqoop has steadily improved and grown.

        But the contrib directory is a home for new or small projects incubating underneath Hadoop's umbrella. Sqoop is starting to look less like a small project these days. In particular, a feature that has been growing in importance for Sqoop is its ability to integrate with Hive. In order to facilitate this integration from a compilation and testing standpoint, we've pulled Sqoop out of contrib and into its own repository hosted on github.

        You can download all the relevant bits here: http://www.github.com/cloudera/sqoop

        The code there will run in conjunction with the Apache Hadoop trunk source. (Compatibility with other distributions/versions is forthcoming.)

        While we've changed hosts, Sqoop will keep the same license – future improvements will continue to remain Apache 2.0-licensed. We welcome the contributions of all in the open source community; there's a lot of exciting work still to be done! If you'd like to help out but aren't sure where to start, send me an email and I can recommend a few areas where improvements would be appreciated.

        Want some more information about Sqoop? An introduction is available here: http://www.cloudera.com/sqoop
        A ready-to-run release of Sqoop is included with Cloudera's Distribution for Hadoop: http://archive.cloudera.com
        And its reference manual is available for browsing at http://archive.cloudera.com/docs/sqoop

        If you have any questions about this move process, please ask me.

        Regards,

        Aaron Kimball
        Cloudera, Inc.
        Aaron Kimball added a comment -

        Attaching a patch which removes Sqoop from the build.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12440129/MAPREDUCE-1644.patch
        against trunk revision 928104.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 108 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/69/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/69/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/69/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/69/console

        This message is automatically generated.

        Allen Wittenauer added a comment -

        "But the contrib directory is a home for new or small projects incubating underneath Hadoop's umbrella. Sqoop is starting to look less like a small project these days. In particular, a feature that has been growing in importance for Sqoop is its ability to integrate with Hive. In order to facilitate this integration from a compilation and testing standpoint, we've pulled Sqoop out of contrib and into its own repository hosted on github."

        Shouldn't it be a sub project or moved to the incubator to be a new TLP then? Pulling it out of Apache completely makes it seem like a different goal is really intended.

        Aaron Kimball added a comment -

        Allen,

        The Incubator is most appropriate for projects which are building strong development communities in an attempt to grow to TLP status. Sqoop has been under active development for a year and while it is gaining a user base, a diverse developer base does not seem to be materializing of its own accord. Thus I don't think that going through the process hurdles of incubation is necessarily appropriate at this time.

        My goal is to perform the necessary technical transitions, then get Sqoop toward a state where it can produce releases rapidly and regularly, independent of the heavier-weight Hadoop release process.

        Within this framework I intend to improve Sqoop's integration with other systems (namely Hive) which could not be done within the context of the MapReduce project itself.

        According to http://incubator.apache.org/incubation/Incubation_Policy.html the requirements for graduating from incubator to TLP status include:

        "The project is not highly dependent on any single contributor (there are at least 3 legally independent committers and there is no single company or entity that is vital to the success of the project)"

        I don't think it's appropriate to enter the incubator at this time, when I don't have a foreseeable plan to overcome this hurdle. Hadoop is also moving away from subprojects, so it doesn't seem reasonable to add yet another one there.

        Nothing prevents Sqoop from being brought into the Incubator at a later time, if a diverse set of developers emerges to improve Sqoop. But an incubator project that has a single committer seems overwrought.

        Allen Wittenauer added a comment -

        "My goal is to perform the necessary technical transitions, then get Sqoop toward a state where it can produce releases rapidly and regularly, independent of the heavier-weight Hadoop release process."

        I think this says it all.

        Let the fragmentation begin.

        Chris Douglas added a comment -

        +1

        I committed this. Thanks Aaron!

        Good luck with your project.

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #280 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/280/)
        MAPREDUCE-1644. Remove Sqoop contrib module. Contributed by Aaron Kimball

        Tom White added a comment -

        Since Sqoop has never been in a Hadoop release, it should not have any entries in CHANGES.txt. I'll remove the relevant entries from trunk and the 0.21 branch.


          People

          • Assignee: Aaron Kimball
          • Reporter: Aaron Kimball
          • Votes: 1
          • Watchers: 15
