Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Hi. I have a few suggestions to improve PigUnit:
      1. Add built-in support for feeding several inputs to one script. I didn't find a way to do this with the existing API and had to extend it.
      2. Allow the use of "native" loaders. Plenty of bugs show up when you start running a script in production with AvroStorage or any other complicated storage; many of these schema/type-related bugs could be caught at the unit-test level.
      3. Do the same for storage (STORE) functions.
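All three points can be served by one approach: keep the production LOAD and STORE statements (native loaders such as AvroStorage included) and make only the paths Pig parameters, so a test binds each `$name` to a local fixture. Pig's preprocessor already performs `$name` substitution for parameters passed to a script; the Java sketch below only illustrates that mapping under simplifying assumptions (no `%default`, `%declare`, or escaping). `ScriptParams` and the fixture paths are hypothetical and not part of PigUnit.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not PigUnit API): replaces $name placeholders in a Pig
// script with values from a map, e.g. local test fixture paths.
final class ScriptParams {
    // A parameter name starts with a letter or underscore, so positional
    // field references like $0 are not treated as parameters.
    private static final Pattern PARAM = Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");

    static String substitute(String script, Map<String, String> params) {
        Matcher m = PARAM.matcher(script);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = params.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("No value for parameter $" + m.group(1));
            }
            // quoteReplacement keeps '$' or '\' in paths from being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The LOAD/STORE functions stay exactly as in production; only the paths vary.
        String script =
                "lol = LOAD '$lol' USING AvroStorage();\n"
              + "STORE lol INTO '$out' USING AvroStorage();\n";
        Map<String, String> params = new LinkedHashMap<>();
        params.put("lol", "test-data/test01/lol.avro");
        params.put("out", "target/pig-out/test01/out");
        System.out.print(substitute(script, params));
    }
}
```

A missing binding fails fast with an exception rather than leaving a literal `$name` path in the script, which would otherwise surface later as a confusing "file not found" at load time.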

        Activity

        Prashant Kommireddi added a comment -

        Thanks for opening this JIRA, Sergey. Do you have a patch you would like to contribute?

        We have also made a few changes to PigUnit that we can contribute as a sub-JIRA of this one.

        Sergey added a comment -

        I suspect our approach won't fit the current PigUnit idiom.
        We used Groovy and wrapped

            def pigServer = new PigServer(ExecType.LOCAL)

        in our PigScriptTest class, which feeds the script to PigServer:

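            // executeBatch() requires batch mode; otherwise STORE statements run during registration
            pigServer.setBatchOn()

            // withInputStream closes the stream once the script has been registered
            scriptFile.withInputStream { stream ->
                pigServer.registerScript(stream, params, null)
            }

            for (ExecJob job : pigServer.executeBatch())
            {
                while (!job.hasCompleted())
                {
                    TimeUnit.SECONDS.sleep(1)
                }

                if (job.status != ExecJob.JOB_STATUS.COMPLETED)
                {
                    return PigExecutionResult.failed()
                }
            }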
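The polling loop above has no upper bound, so a hung job stalls the whole test run. A minimal sketch of the same poll-then-check pattern with an overall deadline follows. `Job` and `Status` are hypothetical stand-ins for Pig's `ExecJob` and `ExecJob.JOB_STATUS` (only `hasCompleted()` and the status check are mirrored), so the code compiles without Pig on the classpath.

```java
import java.util.List;
import java.util.concurrent.TimeUnit;

// Sketch of waiting for batch jobs with an overall deadline instead of polling forever.
final class BoundedWait {
    // Hypothetical stand-in for ExecJob.JOB_STATUS (the real enum has other states too).
    enum Status { RUNNING, COMPLETED, FAILED }

    // Hypothetical stand-in for ExecJob: only the calls the original loop uses.
    interface Job {
        boolean hasCompleted();
        Status status();
    }

    /** Returns true only if every job finishes with COMPLETED before the deadline. */
    static boolean allSucceeded(List<? extends Job> jobs, long timeoutMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        for (Job job : jobs) {
            while (!job.hasCompleted()) {
                if (System.nanoTime() - deadline >= 0) {
                    return false; // timed out: report failure instead of hanging the build
                }
                TimeUnit.MILLISECONDS.sleep(pollMillis);
            }
            if (job.status() != Status.COMPLETED) {
                return false; // finished, but not successfully
            }
        }
        return true;
    }
}
```

Comparing `System.nanoTime() - deadline` against zero, rather than comparing the two timestamps directly, stays correct even if the nanosecond counter overflows.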
        

        It's more of a data-driven test than a unit test. The major advantage is that the script can use any storage/loader utilities and can go to production without any modification.

        A typical Pig test looks like this:

            class FilterEnrichXvlrEventsTest
            {
                @Test(groups = ['integration'])
                public void test01()
                {
                    def test =
                        pigScriptTest("filter_enrich_xvlr_events.pig", "test01")
                            // Special converter from CSV to SequenceFile: test data is easier to
                            // manage as CSV than as a binary SequenceFile. The script itself
                            // uses Twitter's SequenceFile readers.
                            .withInput("xvlr_data", [format: new FormatMetadata(inputFormatType:  FormatType.CSV,
                                                                               outputFormatType: FormatType.SEQ,
                                                                               keyClass:         NullWritable.class,
                                                                               valueClass:       Text)])
                            .withInput("lol", "lol.avro")  // Avro input for AvroStorage
                            .withOutput("out_lte")         // the script has several STORE statements
                            .withOutput("out")

                    def result = test.run()

                    assertThat(result, is(completed()))

                    assertThat(result, hasOutput("out").notContains("xxx"))
                    assertThat(result, hasOutput("out").contains("yyy"))
                }
            }
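The DSL in this test is Sergey's own and its implementation is not shown. As a rough sketch of how `withInput`/`withOutput` could map onto plain Pig parameters, leaving the script under test untouched, here is a hypothetical Java builder. `ScriptTestSpec`, its path layout (`test-data/<testName>/<file>` and `target/pig-out/<testName>/<alias>`), and `outputContains` are all assumptions, not part of PigUnit or the original code. It does not run Pig or perform the CSV-to-SequenceFile conversion; it only builds the kind of params map the wrapper above passes to `registerScript`, plus a line-level output check.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical builder mirroring the shape of pigScriptTest(...).withInput(...).withOutput(...).
final class ScriptTestSpec {
    private final String script;
    private final String testName;
    private final Map<String, String> params = new LinkedHashMap<>();
    private final List<String> outputs = new ArrayList<>();

    private ScriptTestSpec(String script, String testName) {
        this.script = script;
        this.testName = testName;
    }

    static ScriptTestSpec pigScriptTest(String script, String testName) {
        return new ScriptTestSpec(script, testName);
    }

    // Binds a script parameter (used in a LOAD path) to a per-test fixture file.
    ScriptTestSpec withInput(String alias, String fileName) {
        params.put(alias, "test-data/" + testName + "/" + fileName);
        return this;
    }

    // Binds a script parameter (used in a STORE path) to a per-test output directory.
    ScriptTestSpec withOutput(String alias) {
        outputs.add(alias);
        params.put(alias, "target/pig-out/" + testName + "/" + alias);
        return this;
    }

    String script() { return script; }
    Map<String, String> params() { return Collections.unmodifiableMap(params); }
    List<String> outputs() { return Collections.unmodifiableList(outputs); }

    // Assertion helper in the spirit of hasOutput("out").contains("yyy"):
    // true if any stored line contains the given text.
    static boolean outputContains(List<String> lines, String text) {
        for (String line : lines) {
            if (line.contains(text)) {
                return true;
            }
        }
        return false;
    }
}
```

Giving every test its own output directory keeps STORE from failing on a leftover directory from a previous run and lets several outputs (`out_lte`, `out`) be checked independently.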
        

          People

          • Assignee: Unassigned
          • Reporter: Sergey
          • Votes: 1
          • Watchers: 3

            Dates

            • Created:
              Updated:
