Pig
  1. Pig
  2. PIG-2248

Pig parser does not detect when a macro name masks a UDF name

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Minor Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.9.0
    • Fix Version/s: None
    • Component/s: parser
    • Labels:
      None

      Description

      Pig accepts a macro like:

      define COUNT(in_relation, min_gpa) returns c {
                                 b = filter $in_relation by gpa >= $min_gpa;
                                 $c = foreach b generate age, name;
                             }
      

      This should produce a warning that it is masking a UDF.

      1. PIG-2248.patch.txt
        4 kB
        Johnny Zhang
      2. PIG-2248.patch.txt
        5 kB
        Johnny Zhang
      3. PIG-2248.patch.txt
        5 kB
        Johnny Zhang
      4. PIG-2248.patch.txt
        5 kB
        Johnny Zhang

        Issue Links

          Activity

          Hide
          Johnny Zhang added a comment -

          with this patch, the Pig will throw warning message when macro has the same name as builtin UDF (all classes under package org.apache.pig.builtin)

          I tested it with two macro defined in one Pig script, there is only one warning message for each macro. The message looks like

          [main] WARN  org.apache.pig.parser.PigMacro - macro name 'COUNT' masks a builtin UDF org.apache.pig.builtin.COUNT
          [main] WARN  org.apache.pig.parser.PigMacro - macro name 'ABS' masks a builtin UDF org.apache.pig.builtin.ABS
          
          Show
          Johnny Zhang added a comment - with this patch, the Pig will throw warning message when macro has the same name as builtin UDF (all classes under package org.apache.pig.builtin) I tested it with two macro defined in one Pig script, there is only one warning message for each macro. The message looks like [main] WARN org.apache.pig.parser.PigMacro - macro name 'COUNT' masks a builtin UDF org.apache.pig.builtin.COUNT [main] WARN org.apache.pig.parser.PigMacro - macro name 'ABS' masks a builtin UDF org.apache.pig.builtin.ABS
          Hide
          Prashant Kommireddi added a comment -

          Thanks Johnny Zhang. I took a quick look and have a couple comments:

          1. You can check for UDFs in the import list in addition to builtins.
            private static String BUILTINUDF_PACKAGE = "org.apache.pig.builtin";
            

          You could rather use PigContext.getPackageImportList() for the comprehensive list.

          1. Use commons logging instead of printing stacktrace
                    } catch (ClassNotFoundException e) {
                        e.printStackTrace();
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
            

          I haven't dived deeper into your patch, but these couple things caught my eye at first.

          Show
          Prashant Kommireddi added a comment - Thanks Johnny Zhang . I took a quick look and have a couple comments: You can check for UDFs in the import list in addition to builtins. private static String BUILTINUDF_PACKAGE = "org.apache.pig.builtin" ; You could rather use PigContext.getPackageImportList() for the comprehensive list. Use commons logging instead of printing stacktrace } catch (ClassNotFoundException e) { e.printStackTrace(); } catch (IOException e) { e.printStackTrace(); } I haven't dived deeper into your patch, but these couple things caught my eye at first.
          Hide
          Johnny Zhang added a comment -

          Prashant Kommireddi, thanks a lot for review my patch. I will post a polished soon.

          Show
          Johnny Zhang added a comment - Prashant Kommireddi , thanks a lot for review my patch. I will post a polished soon.
          Hide
          Johnny Zhang added a comment -

          Prashant Kommireddi, here is the polished patch address the comments. Please let me know if you want to set up review board for it. Thanks for your support!

          Show
          Johnny Zhang added a comment - Prashant Kommireddi , here is the polished patch address the comments. Please let me know if you want to set up review board for it. Thanks for your support!
          Hide
          Johnny Zhang added a comment -

          new patch fix the indention

          Show
          Johnny Zhang added a comment - new patch fix the indention
          Hide
          Daniel Dai added a comment -

          I tried a simple query with the patch:

          define ACOUNT(in_relation, min_gpa) returns c {
                                     b = filter $in_relation by gpa >= $min_gpa;
                                     $c = foreach b generate age, name;
                                 }
          
          a = load '1.txt' as (name, age, gpa);
          x = ACOUNT(a, 3);
          dump x;
          

          I did a dry-run and hit the exception:
          ERROR 2999: Unexpected internal error. Index: 0, Size: 0

          java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
          at java.util.ArrayList.RangeCheck(ArrayList.java:547)
          at java.util.ArrayList.get(ArrayList.java:322)
          at org.apache.pig.parser.PigMacro.checkMacroNameMasking(PigMacro.java:440)
          at org.apache.pig.parser.PigMacro.macroInline(PigMacro.java:509)
          at org.apache.pig.parser.QueryParserDriver.inlineMacro(QueryParserDriver.java:290)
          at org.apache.pig.parser.QueryParserDriver.expandMacro(QueryParserDriver.java:279)
          at org.apache.pig.parser.DryRunGruntParser.processPig(DryRunGruntParser.java:292)
          at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:499)
          at org.apache.pig.parser.DryRunGruntParser.parseStopOnError(DryRunGruntParser.java:67)
          at org.apache.pig.Main.dryrun(Main.java:680)
          at org.apache.pig.Main.run(Main.java:575)
          at org.apache.pig.Main.main(Main.java:157)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          at java.lang.reflect.Method.invoke(Method.java:597)
          at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

          Am I missing something here?

          Show
          Daniel Dai added a comment - I tried a simple query with the patch: define ACOUNT(in_relation, min_gpa) returns c { b = filter $in_relation by gpa >= $min_gpa; $c = foreach b generate age, name; } a = load '1.txt' as (name, age, gpa); x = ACOUNT(a, 3); dump x; I did a dry-run and hit the exception: ERROR 2999: Unexpected internal error. Index: 0, Size: 0 java.lang.IndexOutOfBoundsException: Index: 0, Size: 0 at java.util.ArrayList.RangeCheck(ArrayList.java:547) at java.util.ArrayList.get(ArrayList.java:322) at org.apache.pig.parser.PigMacro.checkMacroNameMasking(PigMacro.java:440) at org.apache.pig.parser.PigMacro.macroInline(PigMacro.java:509) at org.apache.pig.parser.QueryParserDriver.inlineMacro(QueryParserDriver.java:290) at org.apache.pig.parser.QueryParserDriver.expandMacro(QueryParserDriver.java:279) at org.apache.pig.parser.DryRunGruntParser.processPig(DryRunGruntParser.java:292) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:499) at org.apache.pig.parser.DryRunGruntParser.parseStopOnError(DryRunGruntParser.java:67) at org.apache.pig.Main.dryrun(Main.java:680) at org.apache.pig.Main.run(Main.java:575) at org.apache.pig.Main.main(Main.java:157) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Am I missing something here?
          Hide
          Johnny Zhang added a comment -

          Daniel, appreciate your time to try out the patch. My apologize miss this case. Not sure why test pass at the moment when I submit the patch.

          I just upload the revised patch and did test for following cases:
          1. in one pig script, one macro definition, which mask a builtin UDF COUNT
          2. in one pig script, two macro definition, which mask a builtin UDF COUNT and ABS
          3. in one pig script, one macro definition, which doesn't mask any builtin UDF
          4. in one pig script, two macro definition, both do't mask any builtin UDF
          5. in one pig script, two macro definition, one mask builtin UDF COUNT, another macro doesn't mask any UDF.

          They all pass and make sure only one warning message thrown once per one macro if condition applied.

          Show
          Johnny Zhang added a comment - Daniel, appreciate your time to try out the patch. My apologize miss this case. Not sure why test pass at the moment when I submit the patch. I just upload the revised patch and did test for following cases: 1. in one pig script, one macro definition, which mask a builtin UDF COUNT 2. in one pig script, two macro definition, which mask a builtin UDF COUNT and ABS 3. in one pig script, one macro definition, which doesn't mask any builtin UDF 4. in one pig script, two macro definition, both do't mask any builtin UDF 5. in one pig script, two macro definition, one mask builtin UDF COUNT, another macro doesn't mask any UDF. They all pass and make sure only one warning message thrown once per one macro if condition applied.
          Hide
          Daniel Dai added a comment -

          Actually, I feel the check is very involved in the patch. Now we check UDF (after going through all the classloader trick), how about default/define? How about keyword? We need to exhaustively check everything and make sure there is no conflict. On the other side, how about defining an order Pig resolve a symbol, and clearly documented.

          At this moment I would prefer documenting the symbol resolving order, unless a cleaner solution is proposed.

          Show
          Daniel Dai added a comment - Actually, I feel the check is very involved in the patch. Now we check UDF (after going through all the classloader trick), how about default/define? How about keyword? We need to exhaustively check everything and make sure there is no conflict. On the other side, how about defining an order Pig resolve a symbol, and clearly documented. At this moment I would prefer documenting the symbol resolving order, unless a cleaner solution is proposed.
          Hide
          Johnny Zhang added a comment -

          Daniel, can you explain the term "symbol resolving order". Does that mean we set different level of resolving order for different kind of symbol, and low level symbol respect the higher level symbol (respect means cannot mask it) ?

          Show
          Johnny Zhang added a comment - Daniel, can you explain the term "symbol resolving order". Does that mean we set different level of resolving order for different kind of symbol, and low level symbol respect the higher level symbol (respect means cannot mask it) ?
          Hide
          Daniel Dai added a comment -

          What I mean is something like (for illustration only, may not true)
          keywords > default > define > MACRO > UDF

          If we have a name conflicts, the higher side wins. eg. Macro "COUNT" will mask UDF "COUNT"

          Show
          Daniel Dai added a comment - What I mean is something like (for illustration only, may not true) keywords > default > define > MACRO > UDF If we have a name conflicts, the higher side wins. eg. Macro "COUNT" will mask UDF "COUNT"
          Hide
          Johnny Zhang added a comment -

          Daniel, this is a good idea, and in bigger picture. 3 steps I am think of:
          1. decide the symbol resolving order
          2. check the masking is working for designed order
          3. document the usage.
          what do you think? Maybe we should put logic of this patch into some common place, extend it for all symbol resolving check.

          Show
          Johnny Zhang added a comment - Daniel, this is a good idea, and in bigger picture. 3 steps I am think of: 1. decide the symbol resolving order 2. check the masking is working for designed order 3. document the usage. what do you think? Maybe we should put logic of this patch into some common place, extend it for all symbol resolving check.
          Hide
          Daniel Dai added a comment -

          That will be great! Thanks

          Show
          Daniel Dai added a comment - That will be great! Thanks
          Hide
          Alan Gates added a comment -

          Canceling patch as discussion is still on-going as to best approach

          Show
          Alan Gates added a comment - Canceling patch as discussion is still on-going as to best approach

            People

            • Assignee:
              Johnny Zhang
              Reporter:
              Alan Gates
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:

                Development