  1. Pig
  2. PIG-1404

PigUnit - Pig script testing simplified.


Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Component/s: None
    • Labels: None

    Description

      The goal is to provide a simple xUnit framework that enables our Pig scripts to be easily:

      • unit tested
      • regression tested
      • quickly prototyped

      No cluster set up is required.

      For example:

      TestCase

        @Test
        public void testTop3Queries() throws Exception {
          // Script parameters: n is substituted for $n in top_queries.pig.
          String[] args = {
              "n=3",
              };
          PigTest test = new PigTest("top_queries.pig", args);
      
          String[] input = {
              "yahoo\t10",
              "twitter\t7",
              "facebook\t10",
              "yahoo\t15",
              "facebook\t5",
              // .... (further input rows elided)
          };
      
          String[] output = {
              "(yahoo,25L)",
              "(facebook,15L)",
              "(twitter,7L)",
          };
      
          test.assertOutput("data", input, "queries_limit", output);
        }
      

      top_queries.pig

      data =
          LOAD '$input'
          AS (query:CHARARRAY, count:INT);
           
          ... 
          
      queries_sum = 
          FOREACH queries_group 
          GENERATE 
              group AS query, 
              SUM(queries.count) AS count;
              
          ...
                  
      queries_limit = LIMIT queries_ordered $n;
      
      STORE queries_limit INTO '$output';
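
To see where the expected tuples in the test case come from, the script's steps can be mimicked in plain Java. This is only an illustrative sketch, not part of PigUnit: the class and method names are hypothetical, and it assumes the elided parts of top_queries.pig group by query and order by the summed count in descending order. It uses only the five input rows shown above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: reproduces group -> sum -> order -> limit in memory.
class TopQueriesSketch {
    static List<String> topQueries(String[] rows, int n) {
        // GROUP BY query and SUM(count); Pig's SUM over INT yields a long.
        Map<String, Long> sums = new LinkedHashMap<>();
        for (String row : rows) {
            String[] fields = row.split("\t");
            sums.merge(fields[0], Long.parseLong(fields[1]), Long::sum);
        }
        // ORDER BY count DESC (assumed; the ordering step is elided in the script).
        List<Map.Entry<String, Long>> ordered = new ArrayList<>(sums.entrySet());
        ordered.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
        // LIMIT n, formatted the way the test case writes its expected tuples.
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Long> e : ordered.subList(0, Math.min(n, ordered.size()))) {
            out.add("(" + e.getKey() + "," + e.getValue() + "L)");
        }
        return out;
    }

    public static void main(String[] args) {
        String[] input = {
            "yahoo\t10", "twitter\t7", "facebook\t10", "yahoo\t15", "facebook\t5",
        };
        // prints [(yahoo,25L), (facebook,15L), (twitter,7L)]
        System.out.println(topQueries(input, 3));
    }
}
```

With n=3 and these five rows, the result matches the output array asserted on the "queries_limit" alias in the test case.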
      

      There are 3 modes:

      • LOCAL (if the "pigunit.exectype.local" property is present)
      • MAPREDUCE (uses the cluster specified in the classpath, same as HADOOP_CONF_DIR)
        • automatic mini cluster (the default; the HADOOP_CONF_DIR to have in the classpath will be ~/pigtest/conf)
        • pointing to an existing cluster (if the "pigunit.exectype.cluster" property is present)
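
The mode selection described above could be sketched roughly as follows. This is not PigUnit's actual code: the class, enum, and method names are hypothetical, and the precedence when both properties are set (LOCAL wins here) is an assumption. Only the two property names come from the description.

```java
import java.util.Properties;

// Hypothetical sketch of choosing one of the three modes from properties.
class ExecModeSketch {
    enum Mode { LOCAL, MINI_CLUSTER, EXISTING_CLUSTER }

    static final String LOCAL_KEY = "pigunit.exectype.local";
    static final String CLUSTER_KEY = "pigunit.exectype.cluster";

    static Mode resolve(Properties props) {
        if (props.containsKey(LOCAL_KEY)) {
            return Mode.LOCAL;              // assumed to take precedence
        }
        if (props.containsKey(CLUSTER_KEY)) {
            return Mode.EXISTING_CLUSTER;   // MAPREDUCE on an existing cluster
        }
        return Mode.MINI_CLUSTER;           // default: automatic mini cluster
    }

    public static void main(String[] args) {
        // Reads the JVM system properties, e.g. set with -D on the command line.
        System.out.println(resolve(System.getProperties()));
    }
}
```

Running the sketch with `-Dpigunit.exectype.local=true` would select LOCAL; with neither property set it falls back to the mini cluster default.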

      For now, it would be nice to see how this idea could be integrated into Piggybank, and whether PigParser/PigServer could improve their interfaces to keep PigUnit simple.

      Other components based on PigUnit could be built later:

      • standalone MiniCluster
      • notion of workspaces for each test
      • standalone utility that reads test configuration and generates a test report...

      This is a first prototype; it is open to suggestions and would definitely benefit from feedback.

      How to test, in pig_trunk:

      Apply patch
      $pig_trunk ant compile-test
      $pig_trunk ant
      $pig_trunk/contrib/piggybank/java ant test -Dtest.timeout=999999
      

      (This takes 15 min with the MAPREDUCE mini cluster; the tests will need to be split in the future between 'unit' and 'integration'.)

      Many examples are in:

      contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/pigunit/TestPigTest.java
      

      When used standalone, remember to add commons-lang-2.4.jar and the HADOOP_CONF_DIR of your cluster to your CLASSPATH.
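
A quick way to confirm the classpath before running standalone is a small check like the one below. It is a sketch with hypothetical class and method names. `org.apache.commons.lang.StringUtils` is a class shipped in commons-lang 2.x; the configuration file name `core-site.xml` is an assumption about what the HADOOP_CONF_DIR contains and may differ by Hadoop version.

```java
// Hypothetical classpath sanity check; not part of PigUnit.
class ClasspathCheckSketch {
    // True if the named class can be loaded from the current classpath.
    static boolean hasClass(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // True if the named resource (e.g. a config file) is visible on the classpath.
    static boolean hasResource(String name) {
        return ClasspathCheckSketch.class.getClassLoader().getResource(name) != null;
    }

    public static void main(String[] args) {
        System.out.println("commons-lang on classpath: "
            + hasClass("org.apache.commons.lang.StringUtils"));
        // Assumed file name; adjust to whatever your HADOOP_CONF_DIR provides.
        System.out.println("Hadoop conf on classpath: " + hasResource("core-site.xml"));
    }
}
```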

      Attachments

        1. commons-lang-2.4.jar
          256 kB
          Romain Rigaux
        2. PIG-1404.patch
          36 kB
          Romain Rigaux
        3. PIG-1404-2.patch
          42 kB
          Alan Gates
        4. PIG-1404-3.patch
          43 kB
          Romain Rigaux
        5. PIG-1404-3-doc.patch
          9 kB
          Romain Rigaux
        6. PIG-1404-4.patch
          45 kB
          Romain Rigaux
        7. PIG-1404-4-doc.patch
          9 kB
          Romain Rigaux
        8. PIG-1404-5.patch
          49 kB
          Alan Gates

          People

            romainr Romain Rigaux
            romainr Romain Rigaux