PIG-1211: Pig script runs halfway, then reports a syntax error


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.6.0
    • Fix Version/s: 0.8.0
    • Component/s: impl
    • Labels: None
    • Hadoop Flags: Incompatible change, Reviewed
    • Release Note:
      -c (-cluster) was previously documented as the option for supplying cluster information, but the Pig code never used it. With PIG-1211, "-c" is reused as the option to check the syntax of a Pig script.

    Description

      I have a Pig script structured as follows:

      register cp.jar;
      
      dataset = load '/data/dataset/' using PigStorage('\u0001') as (col1, col2, col3, col4, col5);
      
      filtered_dataset = filter dataset by (col1 == 1);
      
      proj_filtered_dataset = foreach filtered_dataset generate col2, col3;
      
      rmf $output1;
      
      store proj_filtered_dataset into '$output1' using PigStorage();
      
      second_stream = foreach filtered_dataset generate col2, col4, col5;
      
      group_second_stream = group second_stream by col4;
      
      output2 = foreach group_second_stream {
       a = second_stream.col2;
       b = distinct second_stream.col5;
       c = order b by $0;
       generate 1 as key, group as keyword, MYUDF(c, 100) as finalcalc;
      }
      
      rmf $output2;
      
      --syntax error here: STORE expects "into", not "to"
      store output2 to '$output2' using PigStorage();
      
      

      I run this script with the multi-query option. It runs successfully through the first store, but then fails with a syntax error.

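      Using the HDFS command "rmf" causes the first store to execute: multi-query mode normally batches stores until the end of the script, but an rmf in the middle makes Pig run everything queued so far, so the first store completes before the faulty second store is ever parsed.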

      The only workarounds I have are to run explain on the script before running it:

      grunt> explain -script myscript.pig -out explain.out

      or to move the rmf statements to the top of the script.
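      For illustration, this is the second workaround applied to the script above. It is a sketch rather than a tested script, and it assumes the same $output1/$output2 parameters and that MYUDF is defined in cp.jar. Both rmf commands are moved to the top, and the final store uses "into", which is what Pig's STORE statement expects. With no rmf between the two stores, multi-query mode can batch both of them, so a parse error anywhere in the script should be reported before any job starts.

      ```pig
      -- Clear both output directories first, before any store is queued
      rmf $output1;
      rmf $output2;

      register cp.jar;

      dataset = load '/data/dataset/' using PigStorage('\u0001') as (col1, col2, col3, col4, col5);
      filtered_dataset = filter dataset by (col1 == 1);

      -- First output: projection of the filtered rows
      proj_filtered_dataset = foreach filtered_dataset generate col2, col3;
      store proj_filtered_dataset into '$output1' using PigStorage();

      -- Second output: grouped, de-duplicated, ordered values passed to MYUDF
      second_stream = foreach filtered_dataset generate col2, col4, col5;
      group_second_stream = group second_stream by col4;
      output2 = foreach group_second_stream {
       a = second_stream.col2;
       b = distinct second_stream.col5;
       c = order b by $0;
       generate 1 as key, group as keyword, MYUDF(c, 100) as finalcalc;
      }
      -- "into", not "to": this was the syntax error in the original script
      store output2 into '$output2' using PigStorage();
      ```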

      Here are some questions:

      a) Could we have an option, something like "checkscript", that reports the same syntax error without having to run explain? That way I would not run for 3-4 hours before hitting a syntax error.
      b) Since all the store directories are given as variables, could Pig reorder the rmf statements on its own?

      Thanks
      Viraj

      Attachments

        1. PIG-1211.patch (23 kB, Pradeep Kamath)


          People

            Assignee: Unassigned
            Reporter: Viraj Bhat
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated:
              Resolved: