Pig / PIG-3630

Macros that work in Pig 0.11 fail in Pig 0.12 :(

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.12.0
    • Fix Version/s: None
    • Component/s: parser
    • Labels: None

      Description

      http://my.safaribooksonline.com/book/databases/9781449326890/7dot-exploring-data-with-reports/i_sect13_id196600_html

      The ntf-idf macro listed there works under 0.11. Under 0.12, it results in this:

      13/12/16 22:09:19 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
      2013-12-16 22:09:19,159 [main] INFO org.apache.pig.Main - Apache Pig version 0.13.0-SNAPSHOT (rUnversioned directory) compiled Dec 09 2013, 14:37:29
      2013-12-16 22:09:19,159 [main] INFO org.apache.pig.Main - Logging error messages to: /private/tmp/pig_1387260559120.log
      2013-12-16 22:09:19.268 java[38060:1903] Unable to load realm info from SCDynamicStore
      2013-12-16 22:09:19,528 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
      2013-12-16 22:09:20,189 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025:
      at expanding macro 'tf_idf' (per_business.pig:9)
      <file per_business.pig, line 35, column 17> Invalid field projection. Projected field [tf_idf] does not exist in schema: business_id:chararray,token:chararray,term_freq:double,num_docs_with_token:long.
      2013-12-16 22:09:20,189 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.plan.PlanValidationException: ERROR 1025:
      at expanding macro 'tf_idf' (per_business.pig:9)
      <file per_business.pig, line 35, column 17> Invalid field projection. Projected field [tf_idf] does not exist in schema: business_id:chararray,token:chararray,term_freq:double,num_docs_with_token:long.
      at org.apache.pig.newplan.logical.expression.ProjectExpression.findColNum(ProjectExpression.java:191)
      at org.apache.pig.newplan.logical.expression.ProjectExpression.setColumnNumberFromAlias(ProjectExpression.java:174)
      at org.apache.pig.newplan.logical.visitor.ColumnAliasConversionVisitor$1.visit(ColumnAliasConversionVisitor.java:53)
      at org.apache.pig.newplan.logical.expression.ProjectExpression.accept(ProjectExpression.java:215)
      at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
      at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
      at org.apache.pig.newplan.logical.optimizer.AllExpressionVisitor.visit(AllExpressionVisitor.java:142)
      at org.apache.pig.newplan.logical.relational.LOInnerLoad.accept(LOInnerLoad.java:128)
      at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
      at org.apache.pig.newplan.logical.optimizer.AllExpressionVisitor.visit(AllExpressionVisitor.java:124)
      at org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:76)
      at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
      at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
      at org.apache.pig.PigServer$Graph.compile(PigServer.java:1694)
      at org.apache.pig.PigServer$Graph.compile(PigServer.java:1686)
      at org.apache.pig.PigServer$Graph.access$200(PigServer.java:1387)
      at org.apache.pig.PigServer.execute(PigServer.java:1302)
      at org.apache.pig.PigServer.executeBatch(PigServer.java:391)
      at org.apache.pig.PigServer.executeBatch(PigServer.java:369)
      at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:133)
      at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:195)
      at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
      at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
      at org.apache.pig.Main.run(Main.java:600)
      at org.apache.pig.Main.main(Main.java:156)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:606)
      at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

        Activity

        Dmitriy V. Ryaboy added a comment -

        Could you link to the code directly, rather than the book? The Safari website is giving me interstitials and other unpleasant things.

        Have you investigated the schemas of relations referred to in the error message, and checked if your field references make sense?

        2013-12-16 22:09:20,189 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025:
        at expanding macro 'tf_idf' (per_business.pig:9)
        <file per_business.pig, line 35, column 17> Invalid field projection. Projected field [tf_idf] does not exist in schema: business_id:chararray,token:chararray,term_freq:double,num_docs_with_token:long.

        Russell Jurney added a comment -

        Thanks for taking a look, the macro is at https://github.com/rjurney/Agile_Data_Code/blob/master/ch07/pig/ntfidf.macro

        Dmitriy V. Ryaboy added a comment -

        That macro does not refer to a field called tf_idf. Could you post a fully reproducible test case?

        Russell Jurney added a comment -

        That one fails too. But yes: http://hortonworks.com/blog/pig-macro-for-tf-idf-makes-topic-summarization-2-lines-of-pig/

        Dmitriy V. Ryaboy added a comment -

        Is this an AvroStorage or data issue?

        grunt> import '/Users/dmitriy/tmp/tf_idf.macro';
        grunt> register build/ivy/lib/Pig/avro-1.7.4.jar
        grunt> register build/ivy/lib/Pig/json-simple-1.1.jar
        grunt> register contrib/piggybank/java/piggybank.jar
        grunt> define AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();
        grunt> emails = load '/Users/dmitriy/Downloads/enron.avro';
        grunt> describe emails
        Schema for emails unknown.

        (This is the same in both Pig 0.11 and Pig 0.12.)

        Can you provide a simple reproducible use case that doesn't involve Avro, etc?

        Can you share what debugging you've done so far?

        Russell Jurney added a comment -

        You forgot to put 'USING AvroStorage();' at the end of your load.
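
        For reference, applied to the grunt session quoted above, the corrected statement would presumably look like the sketch below. It assumes the register and define lines from that session have already run, and it reuses the same local file path.

        ```pig
        -- Load the Avro file through the AvroStorage alias defined earlier in the session,
        -- instead of Pig's default loader.
        emails = load '/Users/dmitriy/Downloads/enron.avro' using AvroStorage();

        -- Check whether a schema is now reported for the relation.
        describe emails;
        ```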

        Dmitriy V. Ryaboy added a comment -

        Sure enough.

        Once I add that, everything works in 0.12 and now I can't reproduce the bug you are reporting.

        My pig is
        [tw-mbp13-dryaboy-2 pig-0.12]$ ./bin/pig -version
        Apache Pig version 0.12.0-SNAPSHOT (r1526044)
        compiled Dec 18 2013, 12:15:04

        Same with a more recent build:
        [tw-mbp13-dryaboy-2 pig-0.12]$ ./bin/pig -version
        Apache Pig version 0.12.1-SNAPSHOT (r1552124)
        compiled Dec 18 2013, 14:00:21

        Back to you to get a reproducible test case.

        Russell Jurney added a comment -

        https://github.com/rjurney/PIG-3630

        There is a reproducible test case there. You will have to change the path to Piggybank.jar; $HOME doesn't work in interactive mode, so I don't use it.

        Dmitriy V. Ryaboy added a comment -

        That one fails in both 0.11 and 0.12.

        Do you have something that works in 11 but fails in 12?

        Dmitriy V. Ryaboy added a comment -

        Actually that failed in 11 due to missing register statements. It does work in 11 if you work around the Avro stuff. Ok, now we have something to look at...

        Dmitriy V. Ryaboy added a comment -

        Now that registers are in place, it works in 12 as well:

        Input(s):
        Successfully read records from: "/Users/dmitriy/Downloads/trimmed_reviews.avro"
        
        Output(s):
        Successfully stored records in: "file:///Users/dmitriy/src/pig-0.12/tmp/pig_12_ntf_idf_scores"
        
        Job DAG:
        job_local_0001	->	job_local_0003,job_local_0002,
        job_local_0003	->	job_local_0005,
        job_local_0002	->	job_local_0004,
        job_local_0004	->	job_local_0005,
        job_local_0005
        
        
        2013-12-18 15:22:02,012 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
        

        Back to you...

        Russell Jurney added a comment -

        Damn it. That exact code fails for me on latest trunk.


          People

          • Assignee: Unassigned
          • Reporter: Russell Jurney
          • Votes: 0
          • Watchers: 3