Pig: PIG-2793

Make Pig Work on Windows without Cygwin

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.12.0
    • Component/s: None
    • Labels:
      None
    • Environment:

      Windows without Cygwin as a whole, but with some key binaries such as perl, diff, gawk, gzip, sed.

    • Hadoop Flags:
      Reviewed

      Description

      For Pig to work well on Windows, it needs Hadoop core changes. Those changes are currently in progress in branch-1-win. For this work, I am running Pig on Windows against branch-1-win and removing Cygwin dependencies as the required capabilities become available. Branch-1-win is now fairly stable and exposes enough functionality to reveal the few changes Pig needs to run its E2E tests on top of a cross-platform Hadoop core without Cygwin. This umbrella JIRA tracks all of the work needed to get Pig running well on Windows without Cygwin.

      There are a few types of work that I think are needed right now (I will break out sub-JIRAs to track them):

      TEST:
      --------
      1.) Tests that generate Pig script strings containing paths (e.g. dynamically built load/store commands) need to escape backslash ("\") characters, because backslashes can now occur in both Hadoop and local paths.

      2.) Tests that generate local temporary files with createTempFile, and then try to use those as HDFS paths need to remove ":" from the generated file name to create valid HDFS paths.

      3.) Tests that hand-generate URIs via string concatenation (e.g. "file:" + strFileName) need to use Util.generateURI instead to get a valid URI for the target platform.

      4.) Tests that assume the first line of a script (e.g. #!/bin/sh) auto-resolves the interpreter need to call the interpreter explicitly (e.g. instead of calling "perlscript.pl", they should call "perl perlscript.pl").

      5.) Differences in quoting and command syntax between shells (e.g. " vs. ', or dir vs. ls) require minor adjustments in a few places.
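      For TEST item 1, a minimal sketch, assuming a test assembles its LOAD statement by string concatenation: every backslash in the path is doubled before the path is spliced into the script, so Pig's string-literal parser does not treat a segment such as \n in C:\data\new.txt as an escape sequence. PigScriptPaths, pigEscapePath and loadStatement are hypothetical names used only for illustration.

```java
// Sketch only: hypothetical helpers, not part of Pig's API.
final class PigScriptPaths {

    // Double every backslash so a Windows path such as C:\data\new.txt is
    // not parsed by Pig as containing escape sequences like "\n".
    static String pigEscapePath(String path) {
        return path.replace("\\", "\\\\");
    }

    // Build a LOAD statement the way many tests do with "load " + pathStr,
    // but escape the path first. Unix paths without backslashes are unchanged.
    static String loadStatement(String alias, String path) {
        return alias + " = load '" + pigEscapePath(path) + "';";
    }

    public static void main(String[] args) {
        System.out.println(loadStatement("a", "C:\\data\\new.txt"));
    }
}
```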
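      For TEST item 2, a sketch of one way to derive an HDFS-safe name from a createTempFile result, assuming the test only needs a unique, valid HDFS path rather than the exact local name. It drops ":" (such as the drive-letter colon) and turns "\" into "/". HdfsSafeNames and toHdfsSafePath are hypothetical, and the fix actually made in the sub-task may differ.

```java
import java.io.File;
import java.io.IOException;

// Sketch only: hypothetical helper, not part of Pig or Hadoop.
final class HdfsSafeNames {

    // Remove ':' and normalize Windows separators. A drive-letter path such
    // as C:\Users\me\tmp1.txt becomes the relative path C/Users/me/tmp1.txt,
    // which resolves under the HDFS working directory.
    static String toHdfsSafePath(String localPath) {
        return localPath.replace(":", "").replace('\\', '/');
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("pig-test", ".txt");
        tmp.deleteOnExit();
        System.out.println(toHdfsSafePath(tmp.getAbsolutePath()));
    }
}
```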
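      For TEST item 3, the sketch below contrasts a hand-concatenated URI with java.io.File.toURI(), which absolutizes the path, converts separators to "/" and percent-encodes characters such as spaces. It only illustrates why concatenation breaks on Windows; Pig's tests should still use Util.generateURI, whose exact signature is not reproduced here. FileUris is a hypothetical class.

```java
import java.io.File;

// Sketch only: contrasts two ways of turning a file name into a URI string.
final class FileUris {

    // Fragile: on Windows this yields "file:C:\data\in.txt", which keeps
    // backslashes and is not a well-formed hierarchical file URI.
    static String concatenated(String fileName) {
        return "file:" + fileName;
    }

    // Portable: the JDK builds an absolute URI with '/' separators and
    // percent-encoded spaces; on Windows it looks like "file:/C:/data/in.txt".
    static String portable(String fileName) {
        return new File(fileName).toURI().toString();
    }

    public static void main(String[] args) {
        System.out.println(concatenated("my data.txt"));
        System.out.println(portable("my data.txt"));
    }
}
```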
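      For TEST item 4, a sketch that builds the argument list with the interpreter named explicitly, so the command no longer depends on the script's #! line, which Windows does not honor. InterpreterCommands.withInterpreter is a hypothetical helper; the resulting list is the kind of argument a test would hand to java.lang.ProcessBuilder.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: assembles a command line, does not run it.
final class InterpreterCommands {

    static List<String> withInterpreter(String interpreter, String script, String... args) {
        List<String> cmd = new ArrayList<>();
        cmd.add(interpreter);   // e.g. "perl", resolved through PATH
        cmd.add(script);        // the script becomes an argument, not the program
        for (String a : args) {
            cmd.add(a);
        }
        return cmd;
    }

    public static void main(String[] args) {
        // In a real test this list would go to new ProcessBuilder(cmd).start().
        List<String> cmd = withInterpreter("perl", "perlscript.pl");
        System.out.println(String.join(" ", cmd));
    }
}
```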

      PROD:
      --------

      1.) The streaming interface needs to be fixed to run without a Cygwin dependency.

      2.) The pig.additional.jars separator is currently hard-coded to ":"; it should be File.pathSeparator instead (":" on Linux, ";" on Windows) so that it can accept Windows paths such as C:\file.jar.

      3.) The Grunt "sh" command exposes the behavior of the underlying exec API directly: if you use a shell built-in, it fails with a file-not-found error. This exposes many differences between shell implementations (e.g. ls is an executable, but dir is a built-in), and many of the cases in TestGrunt end up running (sh bash -c "command"). For portability and ease of use, sh should instead exec "sh -c <command>" on Linux and "cmd /C <command>" on Windows. This improves usability and lets end users rely on aliases and batch files on either platform, making the interface more platform-independent.

      4.) (eventual) Update Pig's dependencies to pick up a stable Hadoop core that runs on Windows from a release branch.
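      For PROD item 2, a sketch of splitting a pig.additional.jars value on File.pathSeparator instead of a hard-coded ":". The separator goes through Pattern.quote because String.split takes a regular expression. AdditionalJars.split is a hypothetical name, not Pig's actual parsing code.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Sketch only: hypothetical parsing of a jar-list property value.
final class AdditionalJars {

    static List<String> split(String value, String separator) {
        List<String> jars = new ArrayList<>();
        // Pattern.quote treats the separator literally rather than as a regex.
        for (String jar : value.split(Pattern.quote(separator))) {
            if (!jar.isEmpty()) {
                jars.add(jar);
            }
        }
        return jars;
    }

    // Platform default: ":" on Linux, ";" on Windows.
    static List<String> split(String value) {
        return split(value, File.pathSeparator);
    }

    public static void main(String[] args) {
        String windowsValue = "C:\\file.jar;D:\\lib\\udf.jar";
        System.out.println(split(windowsValue, ";"));
        // Splitting on ":" breaks each entry apart at its drive letter.
        System.out.println(split(windowsValue, ":"));
    }
}
```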
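      For PROD item 3, a sketch of the proposed Grunt "sh" behavior: the entire command string is handed to "cmd /C" on Windows or "sh -c" elsewhere, instead of exec'ing the first token directly, so built-ins such as dir are run by the shell. ShellCommand and its methods are hypothetical, and detecting Windows from the os.name system property is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Sketch only: hypothetical wrapper, not Grunt's implementation.
final class ShellCommand {

    static boolean isWindows(String osName) {
        return osName.toLowerCase(Locale.ROOT).startsWith("windows");
    }

    // Wrap the whole command so the platform shell parses it; shell
    // built-ins then run instead of failing with file not found.
    static List<String> wrap(String command, String osName) {
        return isWindows(osName)
                ? Arrays.asList("cmd", "/C", command)
                : Arrays.asList("sh", "-c", command);
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = wrap("echo hello", System.getProperty("os.name"));
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        System.out.println("exit code: " + p.waitFor());
    }
}
```

Passing the OS name as a parameter keeps the choice testable on any platform.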

        Issue Links

        1. Pig test: add utils to simplify testing on Windows (Sub-task, Closed, John Gordon)
        2. Fix test cases that generate pig scripts with "load " + pathStr to encode "\" in the path (Sub-task, Closed, John Gordon)
        3. Local temporary paths are not always valid HDFS path names (Sub-task, Closed, John Gordon)
        4. Tests should not create their own file URIs through string concatenation, should use Util.generateURI instead (Sub-task, Closed, John Gordon)
        5. pig streaming tests assume interpreters are auto-resolved (Sub-task, Closed, John Gordon)
        6. Update pig streaming interface to run correctly on Windows without Cygwin (Sub-task, Resolved, John Gordon)
        7. pig.additional.jars path separator should align with File.pathSeparator instead of being hard-coded to ":" (Sub-task, Closed, John Gordon)
        8. grunt "sh" command should invoke the shell implicitly instead of calling exec directly with the command tokens (Sub-task, Closed, John Gordon)
        9. DevTests, Refactor Windows checks to use new Util.WINDOWS method for code health (Sub-task, Closed, John Gordon)
        10. DevTests, TestLoad has a false failure on Windows (Sub-task, Closed, John Gordon)
        11. "which" utility does not exist on Windows (Sub-task, Closed, Daniel Dai)
        12. TestParamSubPreproc still depends on "bash" to run (Sub-task, Resolved, Daniel Dai)
        13. Fix bunch of Pig e2e tests on Windows (Sub-task, Closed, Daniel Dai)
        14. Invalid cache specification for some streaming statement (Sub-task, Closed, Daniel Dai)
        15. TestScriptUDF fail due to volume prefix in jar (Sub-task, Resolved, Daniel Dai)
        16. Pig tests do not appear to have a logger attached (Sub-task, Closed, Daniel Dai)
        17. Add a pig.cmd for Pig to run under Windows (Sub-task, Closed, Daniel Dai)
        18. Increase the timeout for unit test (Sub-task, Closed, Daniel Dai)
        19. TestEmptyInputDir unit test - hadoop version detection logic is brittle (Sub-task, Resolved, John Gordon)
        20. TestPruneColumn unit test - SimpleEchoStreamingCommand perl inline script needs simplification (Sub-task, Closed, David Wannemacher)
        21. Pig checked-in baseline comparisons need a pre-filter to address OS-specific newline differences (Sub-task, Closed, David Wannemacher)
        22. pigTest unit test needs a newline filter for comparisons of golden multi-line (Sub-task, Closed, John Gordon)
        23. testGrunt dev test needs some command filters to run correctly without cygwin (Sub-task, Closed, John Gordon)
        24. TestTypeCheckingValidatorNewLP has some path reference issues for cross-platform execution (Sub-task, Closed, John Gordon)
        25. TestBZip fail on Windows due to new line character (Sub-task, Resolved, Daniel Dai)
        26. TestMacroExpansion has some invocations of perl scripts without explicitly invoking perl as the interpreter (Sub-task, Resolved, Daniel Dai)
        27. TestRegisteredJarVisibility tries to copy a local file to HDFS with the full path name including ":" (Sub-task, Resolved, Daniel Dai)
        28. Native Windows Compatibility for Pig E2E Tests and Harness (Sub-task, Closed, Tony Murphy)
        29. TestExampleGenerator fails on Windows because of lack of file name escaping (Sub-task, Closed, David Wannemacher)
        30. Fix remaining Windows core unit test failures (Sub-task, Closed, Daniel Dai)
        31. Fix Windows piggybank unit test failures (Sub-task, Closed, Daniel Dai)
        32. Fix remaining Windows e2e tests (Sub-task, Closed, Daniel Dai)


            People

            • Assignee: Daniel Dai
            • Reporter: John Gordon
            • Votes: 0
            • Watchers: 3
