Pig / PIG-2692

Make the Pig unit facilities more generalizable and update javadocs

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.15.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Added the ability to mock multiple aliases when running a job.
      Added assertOutputAnyOrder to be order independent.
      Added user docs for mocking.

      Description

      This ticket has two goals for Pig unit:

      1) PigUnit has a really nice method, assertOutput(String inputAlias, String[] inputValues, String outputAlias, String[] expectedOutputValues). It lets you override an input alias with a hardcoded list of values, so the script doesn't actually have to read that input from HDFS or Cassandra. It then runs the script and checks the specified output alias against the expected set of values. It's a really nice way to test your entire Pig script with a single method call, but only if your script has exactly one input and one output. To test more complicated scripts, you have to jump through some hoops to override more input variables. It would be fairly easy to change PigUnit so that it can override any number of inputs and check any number of outputs. That's basically the change I put into the base testing class I wrote, but it would be better to push it into PigUnit itself, and it could easily be done in an afternoon.

      2) Update javadocs for the pig unit test classes to make them more readable.
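
      The multi-input idea in goal 1 can be illustrated without PigUnit itself. What follows is a minimal, hypothetical sketch in plain Java, not the code from the attached patch: the names MockAliases and overrideAll are made up for illustration, the LOAD paths are invented, and it assumes every statement sits on its own line in the form "alias = ...;".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of PigUnit: overrides several aliases in one pass. */
class MockAliases {

    /**
     * Returns a copy of the script in which every statement that defines an
     * overridden alias is replaced by the supplied mock statement.
     * Assumption: one statement per line, written as "alias = ...;".
     */
    static List<String> overrideAll(List<String> script, Map<String, String> overrides) {
        List<String> result = new ArrayList<>();
        for (String line : script) {
            String trimmed = line.trim();
            int eq = trimmed.indexOf('=');
            // The alias is whatever precedes the first '=' on the line.
            String alias = eq > 0 ? trimmed.substring(0, eq).trim() : "";
            // Lines whose alias has no override are kept unchanged.
            result.add(overrides.getOrDefault(alias, line));
        }
        return result;
    }

    public static void main(String[] args) {
        // Invented example script with two inputs and one derived alias.
        List<String> script = Arrays.asList(
            "users = LOAD 'hdfs:///users' AS (id:int, name:chararray);",
            "clicks = LOAD 'hdfs:///clicks' AS (id:int, url:chararray);",
            "joined = JOIN users BY id, clicks BY id;");

        Map<String, String> overrides = new LinkedHashMap<>();
        overrides.put("users", "users = LOAD 'mock_users.txt' AS (id:int, name:chararray);");
        overrides.put("clicks", "clicks = LOAD 'mock_clicks.txt' AS (id:int, url:chararray);");

        for (String line : overrideAll(script, overrides)) {
            System.out.println(line);
        }
    }
}
```

      In this sketch both LOAD statements are swapped for mock sources while the JOIN is left alone, which is the behavior the ticket asks for: any number of inputs can be replaced in a single call.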

      1. pig2692.patch
        17 kB
        Richard So

        Activity

        Daniel Dai added a comment -

        That's great. Adding my two cents:
        1. We also need methods to sort the result before comparing.
        2. Please give some examples in the documentation (javadoc/wiki) of how to write tests in different scenarios.

        Jonathan Coveney added a comment -

        Anyone who adds documentation to Pig will get a round on me, should we ever be in the same bar

        Juan Gentile added a comment -

        +1 for being able to override multiple inputs. Regarding the ordering of results, I did my own assert:

        void assertUnsortedOutput(String[] expected, String alias) throws IOException, ParseException, AssertionError {
            List<String> expectedResults = new LinkedList<>(Arrays.asList(expected));
            Iterator<Tuple> resultsIterator = getAlias(alias);

            int size = 0;
            while (resultsIterator.hasNext()) {
                String result = resultsIterator.next().toString();
                assertTrue(expectedResults.contains(result));
                expectedResults.remove(result);
                size++;
            }

            Assert.assertEquals(expected.length, size);
        }

        Juan Gentile added a comment -

        I kept extending PigTest... just in case someone needs this:

        public void overrideInput(String aliasInput, String[] input) throws IOException, ParseException {
            super.registerScript();
            StringBuilder sb = new StringBuilder();
            Schema.stringifySchema(sb, getPigServer().dumpSchema(aliasInput), DataType.TUPLE);
            final String destination = FileLocalizer.getTemporaryPath(getPigServer().getPigContext()).toString();
            PigTest.getCluster().copyFromLocalFile(input, destination, true);
            override(aliasInput, String.format("%s = LOAD '%s' USING PigStorage('%s') AS %s;", aliasInput, destination, "\\t", sb.toString()));
        }
        Richard So added a comment -

        Patch to handle mocking and order independent assertOutput

        Richard So added a comment -

        I haven't submitted a patch before. I was having issues with the documentation stuff and I'm hoping to be able to revisit it and submit updated documentation. I built the patch using git. I was able to apply it to a fresh copy of pig trunk and the test-commit and the TestPigTest tests pass.

        This patch addresses

        • Mocking multiple aliases
        • Making assertOutput order independent (So if your results are A, B and you pass B, A they will still match)
        Daniel Dai added a comment -

        Can you add some documentation to http://pig.apache.org/docs/r0.13.0/test.html#pigunit? The source code for documentation is in src/docs/src/documentation/content/xdocs/test.xml

        Richard So added a comment -

        I'm going to cancel this patch for now. After further evaluation I realize that always ignoring order in assertOutput could conflict with anyone explicitly testing the ORDER clauses in their script. Therefore, I'll leave assertOutput as is, so that order still matters, and add assertOutputAnyOrder to ignore order when comparing values. This way the existing order-dependent behavior stays backwards compatible with this patch. I'll also work on the documentation and submit a patch with everything combined.
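
        The difference between the two comparison modes described above can be sketched in plain Java. This is a hypothetical stand-in rather than the PigUnit implementation from the patch: the class OutputComparison and its methods sameOrder and anyOrder are names invented here, and rows are assumed to be compared by their string form.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the two comparison modes; not PigUnit code. */
class OutputComparison {

    /** Order matters: row i of the actual output must equal row i of the expected output. */
    static boolean sameOrder(List<String> expected, List<String> actual) {
        return expected.equals(actual);
    }

    /** Order is ignored, but each distinct row must occur the same number of times. */
    static boolean anyOrder(List<String> expected, List<String> actual) {
        return counts(expected).equals(counts(actual));
    }

    /** Counts how often each row occurs, so duplicates are not lost. */
    private static Map<String, Integer> counts(List<String> rows) {
        Map<String, Integer> counts = new HashMap<>();
        for (String row : rows) {
            counts.merge(row, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList("(A)", "(B)");
        List<String> actual = Arrays.asList("(B)", "(A)");
        System.out.println(sameOrder(expected, actual)); // prints "false"
        System.out.println(anyOrder(expected, actual));  // prints "true"
    }
}
```

        Counting occurrences instead of comparing sets keeps duplicate rows significant, so an output of (A), (A) would not match an expectation of just (A).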

        Richard So added a comment -

        I updated my patch. As per my previous comment, I didn't want to break existing functionality. I added a couple of assertOutputAnyOrder methods that can be used instead of assertOutput. If the order of your output matters, use assertOutput. I managed to fix my issue with the documentation and added a section on mocking to the user docs.

        Richard So added a comment -

        Daniel Dai, is there anything special I need to do with the workflow of this issue to move it along? I added the documentation and updated the code in the currently attached patch.

        Daniel Dai added a comment -

        Patch looks good. Sorry for missing this for a long time.

        Patch committed to trunk. Thanks Richard, that will be useful!


          People

          • Assignee: Richard So
          • Reporter: Jeremy Hanna
          • Votes: 1
          • Watchers: 5
