Hadoop Common / HADOOP-6767

Patch for running Hadoop on Windows without Cygwin

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.22.0
    • Fix Version/s: None
    • Component/s: build, conf, scripts
    • Environment:

      Windows XP, 2003, 7, 2008

    • Release Note:
      Batch scripts for running Hadoop on Windows, scripts for setting up Hadoop as a Windows service using Apache Commons Daemon, and build fixes.
    • Tags:
      windows cygwin patch

      Description

      The proposed patch from Codeminders makes it possible to run Hadoop on Windows without Cygwin.

      1. hadoop-6767-r1.tar.gz
        12 kB
        Volodymyr Orlov
      2. HADOOP-6767.patch
        301 kB
        Volodymyr Orlov
      3. Hadoop-0.20.2-patched.zip
        3.98 MB
        Volodymyr Orlov

        Activity

        Volodymyr Orlov added a comment -

        Patch for Hadoop-0.20.2

        Volodymyr Orlov added a comment -

        Added patched Hadoop 0.20.2 with configuration files, needed to run Hadoop as a Windows service

        Steve Loughran added a comment -
        1. JNA is LGPL, and the ASF is -1 to anything with this license that you must have to work, so it will take some care to get this to co-exist. Having an Ivy dependency that says JNA must be on the classpath is not going to work.
        2. I don't like the .conf files, which seem to build up classpaths from hard-coded paths, assume you are running from the build directories, and could not be used in production systems for this reason. The conf files are there because you've chosen one way to run Hadoop as a service, but as there are other options, that's a different patch, independent of everything else.
        3. Moving from UnixUserGroupInformation to UserGroupInformation is going to break a lot of downstream code.
        4. The usual space vs tab rules and other changes to javadocs complicate the patch.
        5. A patch to 0.20.x will only break everything that uses Cygwin.
        6. I expect that for performance reasons not using the commons-compress libs on Unix makes sense, but that codepath should still be tested on Unix, so that Hudson catches problems.
        7. I don't see any explicit tests of the new stuff, so the only way this stuff would get tested now is if someone sets up Hudson on one or more Windows versions and runs this as a service.

        What you have done is very impressive, but as it stands, I'm -1 to this. I may be more amenable if:

        1. There is a strategy that the Hadoop dev teams and the ASF are happy with for using JNA. Building against it when present but not redistributing it is one option, in which case Hadoop on Windows has to fall back to Cygwin where JNA isn't on the path.
        2. Changes are against SVN_TRUNK, with no attempt to support existing releases.
        3. The wrapper stuff is separated completely from the JNA stuff. How you run Hadoop as a service on Windows is independent from making Hadoop work without Cygwin; merging the two just makes it harder to use.
        4. TaskRunner.normalizeClassPath() gets a complete rewrite following Apache coding best practices, with some test cases that are designed to work cross-platform (hint: split the Windows and Unix options, test both).
        5. Minor: the patch shouldn't make changes to javadocs or line endings that aren't needed.
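
Editor's sketch for item 4 above: one way to make classpath normalization testable on any host is to pass the separators in explicitly instead of reading them from the running platform. The class `ClassPathNormalizer` and its methods are hypothetical names chosen for illustration; this is not the `TaskRunner.normalizeClassPath()` code from the patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not the code from the patch. */
final class ClassPathNormalizer {

  private ClassPathNormalizer() {}

  /**
   * Rewrites each classpath entry to use the given file separator and
   * joins the entries with the given path separator. Null and empty
   * entries are dropped.
   */
  static String normalize(List<String> entries, char fileSep, char pathSep) {
    List<String> out = new ArrayList<>();
    for (String entry : entries) {
      if (entry == null || entry.isEmpty()) {
        continue;
      }
      // Accept either slash style on input, emit only the target style.
      out.add(entry.replace('\\', fileSep).replace('/', fileSep));
    }
    return String.join(String.valueOf(pathSep), out);
  }

  /** Windows flavour: backslash inside entries, semicolon between them. */
  static String forWindows(List<String> entries) {
    return normalize(entries, '\\', ';');
  }

  /** Unix flavour: forward slash inside entries, colon between them. */
  static String forUnix(List<String> entries) {
    return normalize(entries, '/', ':');
  }
}
```

Because neither method consults `File.separatorChar` or `File.pathSeparatorChar`, both flavours can be exercised in the same test run on Linux or Windows. One simplification to be aware of: a backslash is a legal file-name character on Unix, so the Unix flavour would need a stricter rule in real code.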

        The big concern I have is the split between Windows and everything else. I know you've eliminated Cygwin, but you've added requirements to understand the Win32 APIs and win32 vs win64 issues, plus a separate Windows codebase that will need to be tested on 32-bit and 64-bit versions of Windows Vista, Win7 and Windows Server: 6 different OS combinations we currently avoid.
        That scares me.
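
Editor's sketch of the "build against JNA when present, fall back to Cygwin otherwise" option described in the comment above. It probes for JNA's `com.sun.jna.Native` class at runtime without initializing it. `NativeAccessStrategy` and its `Mode` enum are hypothetical names, and how the result would be wired into Hadoop is left open.

```java
/** Hypothetical runtime probe for an optional dependency; illustration only. */
final class NativeAccessStrategy {

  enum Mode { JNA, CYGWIN }

  private NativeAccessStrategy() {}

  /** Returns true if the named class can be located, without initializing it. */
  static boolean isClassAvailable(String className) {
    try {
      Class.forName(className, false, NativeAccessStrategy.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      // Missing or unloadable class: treat the dependency as absent.
      return false;
    }
  }

  /** Pure decision step, kept separate so it can be unit-tested on any host. */
  static Mode choose(boolean jnaAvailable) {
    return jnaAvailable ? Mode.JNA : Mode.CYGWIN;
  }

  /** Uses JNA only if it happens to be on the classpath. */
  static Mode detect() {
    return choose(isClassAvailable("com.sun.jna.Native"));
  }
}
```

Code that actually calls JNA would have to live in a separate class that is only loaded when `detect()` returns `Mode.JNA`, so that a missing jar never triggers a `NoClassDefFoundError` on the fallback path.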

        Owen O'Malley added a comment -

        I agree with Steve. Apache doesn't allow LGPL. That said, I wish it would work for other platforms too...

        Steve Loughran added a comment -

        I actually think the ASF's interpretation of the LGPL is the issue here. There's been some coverage of the Hibernate Clause http://www.mail-archive.com/legal-discuss@apache.org/msg00009.html that could get things to co-exist; someone needs to talk to the JNA people about licensing here.

        If we could get the licenses compatible then we could drop all Cygwin requirements and say "use JNA for Windows". That would be a step towards making Windows a supported platform, rather than something for people to play with Hadoop on before going into production. A step: you still need to test at real or virtual production scale with the target OS, which means some variant of Windows Server. I don't want to get involved in that; dealing with inconsistencies between Linuxes is painful enough. MS might be able to offer cluster time.

        First step: someone has to approach JNA and say "what do you mean by LGPL"?

        Volodymyr Orlov added a comment -

        Thank you, Steve, for your quick and substantive criticism. I have done some investigation and would like to suggest changes that will make the patch more appropriate:
        1) I will replace JNA with JNI. This should make the code faster and remove the need to work around the ASF + LGPL problem. Windows native code will be built only if the release is done on Windows or with Wine. Hadoop will try to locate the DLL the same way it now does for libhadoop.so and will fall back to Cygwin if the native library is not found.
        2) I will use Apache Commons Daemon instead of Java Service Wrapper.
        Also, all changes will be made against SVN_TRUNK, and the patch will not change the current javadocs and will use spaces instead of tabs.
        Nevertheless, I would like to keep the service wrapper configuration files/code and the code that enables Hadoop to run on Windows without Cygwin in the same patch, because there is currently no way to start Hadoop daemons on Windows without Cygwin. If I separate it into two patches, Hadoop users will still need to apply both.
        Meanwhile, I would like to add a description of the use case that prompted us to write the patch, and a guide on how we are running Hadoop on Windows without Cygwin:
        http://vorlsblog.blogspot.com/2010/05/running-hadoop-on-windows-without.html
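
Editor's sketch of the fallback described in point 1 of the comment above: try to load a native library through JNI and report a Cygwin fallback if it cannot be found. `System.loadLibrary("hadoop")` resolves to `hadoop.dll` on Windows and `libhadoop.so` on Linux. The class `WindowsNativeLoader` and the backend labels are hypothetical; this is not the patch's code.

```java
/** Hypothetical loader illustrating "native if present, else Cygwin". */
final class WindowsNativeLoader {

  // Attempted once, when the class is first initialized.
  private static final boolean LOADED = tryLoad("hadoop");

  private WindowsNativeLoader() {}

  /** Attempts to load a native library from java.library.path. */
  static boolean tryLoad(String libraryName) {
    try {
      System.loadLibrary(libraryName);
      return true;
    } catch (UnsatisfiedLinkError | SecurityException e) {
      // Library missing or not loadable: the caller should fall back.
      return false;
    }
  }

  static boolean isNativeCodeLoaded() {
    return LOADED;
  }

  /** Names the code path a caller would take; the labels are illustrative. */
  static String describeBackend() {
    return LOADED ? "native" : "cygwin";
  }
}
```

Callers would branch on `isNativeCodeLoaded()` before invoking any `native` method, so a machine without the DLL keeps working through the existing Cygwin code path.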

        Volodymyr Orlov added a comment -

        Hi,

        After a pause I am continuing my work on porting Hadoop to Windows. In the attached archive I've added batch scripts for running Hadoop on Windows, made small changes to the build.xml file so that it builds on Windows, and added batch files and Apache Commons Daemon configuration files to run Hadoop as Windows services. All changes are against trunk.
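
Editor's sketch of the kind of entry point a Windows service manager such as Apache Commons Daemon's procrun could call. It assumes procrun's `jvm` start mode, in which static start and stop methods taking `String[]` are invoked and the start method is expected to block until stop is called. `HadoopServiceLauncher` is a hypothetical name, and the actual daemon startup is deliberately omitted; the attached archive may be structured differently.

```java
import java.util.concurrent.CountDownLatch;

/** Hypothetical service entry point; the daemon startup itself is omitted. */
final class HadoopServiceLauncher {

  // Released once when the service manager asks the service to stop.
  private static final CountDownLatch STOP = new CountDownLatch(1);

  private HadoopServiceLauncher() {}

  /** Called on service start; blocks until stop() has been called. */
  public static void start(String[] args) {
    // A real launcher would start the Hadoop daemon here (omitted).
    try {
      STOP.await();
    } catch (InterruptedException e) {
      // Preserve the interrupt status and let the service exit.
      Thread.currentThread().interrupt();
    }
  }

  /** Called on service stop; lets start() return. */
  public static void stop(String[] args) {
    STOP.countDown();
  }

  static boolean stopRequested() {
    return STOP.getCount() == 0;
  }
}
```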

        Best Regards,
        Vladimir

        Volodymyr Orlov added a comment -

        Please see the README.txt file in the archive I've attached to the issue.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12456682/hadoop-6767-r1.tar.gz
        against trunk revision 1031422.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 patch. The patch command could not apply the patch.

        Console output: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/33//console

        This message is automatically generated.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12456682/hadoop-6767-r1.tar.gz
        against trunk revision 1071364.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 patch. The patch command could not apply the patch.

        Console output: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/271//console

        This message is automatically generated.

        Kevin Wang added a comment -

        Hi Guys,

        I am pretty new to this site. I am wondering when this patch can be accepted?

        Thanks,

        Kevin

        Kevin Wang added a comment -

        Hi Volodymyr,

        I am wondering whether you plan to migrate the change to JNI and Apache Commons Daemon?

        Thanks,

        Kevin

        Volodymyr Orlov added a comment -

        Hi Kevin,

        Although I haven't done anything on this patch recently, I still plan to continue working on it. The problem I face is that I won't be able to test the patch in a real deployment scenario once I finish it. At the company where I work now, we no longer deploy Hadoop on Windows machines, so I can't see any real application for my possible contribution in this area.
        If you are interested in Hadoop on Windows, have Windows machines, and intend to deploy a Hadoop cluster on them, we could combine our efforts and finish this patch. Here is my email address - vorl@codeminders.com. Please feel free to contact me if you are interested in my proposal.

        Vladimir

        Robert Joseph Evans added a comment -

        Canceling the patch, as it is over 8 months old and no longer applies to trunk.


          People

          • Assignee:
            Unassigned
            Reporter:
            Volodymyr Orlov
          • Votes:
            8
            Watchers:
            22
