Uploaded image for project: 'Hadoop Common'
  1. Hadoop Common
  2. HADOOP-9008

Building hadoop tarball fails on Windows

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • trunk-win
    • trunk-win
    • None
    • None

    Description

      Trying to build Hadoop trunk tarball via mvn package -Pdist -DskipTests -Dtar fails on Windows.

      Build system generates sh scripts that execute build tasks what does not work on Windows without Cygwin. It might make sense to apply the same pattern as in HADOOP-8924, and use python instead of sh.

      Attachments

        1. HADOOP-9008-branch-trunk-win.patch
          11 kB
          Chris Nauroth
        2. HADOOP-9008-branch-trunk-win.patch
          11 kB
          Chris Nauroth
        3. HADOOP-9008-branch-trunk-win.patch
          19 kB
          Chris Nauroth
        4. HADOOP-9008-branch-trunk-win.patch
          11 kB
          Chris Nauroth
        5. HADOOP-9008-branch-trunk-win.patch
          11 kB
          Chris Nauroth

        Issue Links

          Activity

            bikassaha Bikas Saha added a comment -

            or maven plugins like the ones used to compile protobufs without sh scripts.

            bikassaha Bikas Saha added a comment - or maven plugins like the ones used to compile protobufs without sh scripts.
            cnauroth Chris Nauroth added a comment -

            It looks like this is going to have to be Python. Trying to do this purely with Maven plugins, I see that it drops the symlinks that were created as part of the Linux native builds. It's also cumbersome to set file permissions correctly with the Maven plugin.

            cnauroth Chris Nauroth added a comment - It looks like this is going to have to be Python. Trying to do this purely with Maven plugins, I see that it drops the symlinks that were created as part of the Linux native builds. It's also cumbersome to set file permissions correctly with the Maven plugin.
            cnauroth Chris Nauroth added a comment -

            The attached patch ports the sh scripting in the distribution build to Python. It wasn't possible to use only Maven plugins (like maven-antrun-plugin with a <tar> task), because they mishandled permissions and symlinks in the built tarballs.

            I tested all of the following build variations:

            Windows: mvn -Pnative-win -Pdist -Dtar -DskipTests clean package
            Mac: mvn -Pdist -Dtar -DskipTests clean package
            Ubuntu: mvn -Pnative -Pdist -Dtar -DskipTests clean package
            Ubuntu: mvn -Pnative -Pdist -Dtar -Drequire.snappy -Dbundle.snappy -Dsnappy.lib=/usr/local/lib -DskipTests clean package

            This works on Windows. Additionally, on Mac and Ubuntu, I compared the built tarballs from before and after my changes. I confirmed that the resulting tarballs have exactly the same contents, including permissions and symlinks.

            cnauroth Chris Nauroth added a comment - The attached patch ports the sh scripting in the distribution build to Python. It wasn't possible to use only Maven plugins (like maven-antrun-plugin with a <tar> task), because they mishandled permissions and symlinks in the built tarballs. I tested all of the following build variations: Windows: mvn -Pnative-win -Pdist -Dtar -DskipTests clean package Mac: mvn -Pdist -Dtar -DskipTests clean package Ubuntu: mvn -Pnative -Pdist -Dtar -DskipTests clean package Ubuntu: mvn -Pnative -Pdist -Dtar -Drequire.snappy -Dbundle.snappy -Dsnappy.lib=/usr/local/lib -DskipTests clean package This works on Windows. Additionally, on Mac and Ubuntu, I compared the built tarballs from before and after my changes. I confirmed that the resulting tarballs have exactly the same contents, including permissions and symlinks.
            cnauroth Chris Nauroth added a comment -

            I made one additional change to guarantee that winutils goes into the bin directory for the common and distribution tarballs.

            cnauroth Chris Nauroth added a comment - I made one additional change to guarantee that winutils goes into the bin directory for the common and distribution tarballs.
            raja@cmbasics.com Raja Aluri added a comment -

            +1 for the change. A minor nit, if you may choose to change,
            I was wondering if the following code can eliminate some of the 'else' loops.

              def filter_func(tar_info):
                if tar_info.name == root:
                  return tar_info
                elif tar_info.isfile() or tar_info.issym():
                  if file_name_filter(basename(tar_info.name)):
                    return tar_info
                  else:
                    return None
                else:
                  return None
              return filter_func
            

            I just thought the code will be more concise

              def filter_func(tar_info):
                if tar_info.name == root:
                  return tar_info
                if tar_info.isfile() or tar_info.issym():
                  if file_name_filter(basename(tar_info.name)):
                    return tar_info
                 return None
              return filter_func
            
            raja@cmbasics.com Raja Aluri added a comment - +1 for the change. A minor nit, if you may choose to change, I was wondering if the following code can eliminate some of the 'else' loops. def filter_func(tar_info): if tar_info.name == root: return tar_info elif tar_info.isfile() or tar_info.issym(): if file_name_filter(basename(tar_info.name)): return tar_info else : return None else : return None return filter_func I just thought the code will be more concise def filter_func(tar_info): if tar_info.name == root: return tar_info if tar_info.isfile() or tar_info.issym(): if file_name_filter(basename(tar_info.name)): return tar_info return None return filter_func
            cnauroth Chris Nauroth added a comment -

            Thank you, Raja. I think you're right. I have attached a new patch that simplifies the make_file_filter function and adds some comments to help prevent confusion around the Python tarfile.add API.

            Also, I realized that my earlier patch did not include all of the files that I changed. Can you please take another look? There are smaller changes in a few other pom.xml files.

            cnauroth Chris Nauroth added a comment - Thank you, Raja. I think you're right. I have attached a new patch that simplifies the make_file_filter function and adds some comments to help prevent confusion around the Python tarfile.add API. Also, I realized that my earlier patch did not include all of the files that I changed. Can you please take another look? There are smaller changes in a few other pom.xml files.
            raja@cmbasics.com Raja Aluri added a comment -

            Chris.
            It is just for code readability sake. I will leave it to you.
            Sorry for not pointing this out earlier.
            Please consider using os.path.join() or os.path.abspath(os.path.join()) instead of normpath for cross platform sake.

            Also for building directory structure like the one below without platform specific '/'s

             arc_name = base_name + "/lib/native"
            

            can be

            os.path.join(base_name, 'lib' , 'native')
            
            >>> os.path.abspath(".")
            '/Users/raja/work/repos/'
            >>> os.path.abspath("../")
            '/Users/raja/work'
            >>> os.path.normpath("../../")
            '../..'
            >>> os.path.normpath("..")
            '..'
            >>> os.path.normpath("..")
            
            
            
            
            raja@cmbasics.com Raja Aluri added a comment - Chris. It is just for code readability sake. I will leave it to you. Sorry for not pointing this out earlier. Please consider using os.path.join() or os.path.abspath(os.path.join()) instead of normpath for cross platform sake. Also for building directory structure like the one below without platform specific '/'s arc_name = base_name + "/lib/ native " can be os.path.join(base_name, 'lib' , ' native ' ) >>> os.path.abspath( "." ) '/Users/raja/work/repos/' >>> os.path.abspath( "../" ) '/Users/raja/work' >>> os.path.normpath( "../../" ) '../..' >>> os.path.normpath( ".." ) '..' >>> os.path.normpath( ".." )
            cnauroth Chris Nauroth added a comment -

            Sorry, uploading the patch one more time, back to including just hadoop-dist/pom.xml and hadoop-project-dist/pom.xml. I forgot that the other module-specific ones are tracked in separate jiras.

            cnauroth Chris Nauroth added a comment - Sorry, uploading the patch one more time, back to including just hadoop-dist/pom.xml and hadoop-project-dist/pom.xml. I forgot that the other module-specific ones are tracked in separate jiras.
            cnauroth Chris Nauroth added a comment -

            Raja, I am attaching an updated patch that incorporates your advice to use abspath and / where applicable to improve readability. I verified that this works cross-platform by retesting all build variations.

            cnauroth Chris Nauroth added a comment - Raja, I am attaching an updated patch that incorporates your advice to use abspath and / where applicable to improve readability. I verified that this works cross-platform by retesting all build variations.
            raja@cmbasics.com Raja Aluri added a comment -

            +1. LGTM.
            Chris,
            Thanks for incorporating the feedback.

            raja@cmbasics.com Raja Aluri added a comment - +1. LGTM. Chris, Thanks for incorporating the feedback.

            +1. I committed the patch to branch-trunk-win.

            Thank you Chris. Thank you Raja for the review.

            sureshms Suresh Srinivas added a comment - +1. I committed the patch to branch-trunk-win. Thank you Chris. Thank you Raja for the review.

            People

              cnauroth Chris Nauroth
              ivanmi Ivan Mitic
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: