Pig
  1. Pig
  2. PIG-154

Move DEFINE and STORE parsing into QueryParser from GruntParser

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.1.0
    • Component/s: grunt
    • Labels:
      None

      Description

      Currently GruntParser handles the parsing of DEFINE and STORE commands, it would be better to parse these in the main QueryParser since it has better infrastructure for expressions etc. which are really crucial for parsing UDFs etc.

      1. PIG-154_1_20080319.patch
        12 kB
        Arun C Murthy
      2. PIG-154_0_20080319.patch
        12 kB
        Arun C Murthy

        Activity

        Hide
        Arun C Murthy added a comment -

        Sorry I don't quite understand what you tried to explain. Is having LODefine in the logical plan due to the fact that we shortcut our processing step?

        Pi, Alan is referring to the fact that the newly introduced LODefine is essentially a dummy LogicalOperator which is never used at all... it was necessary since QueryParser.parse() currently returns a LogicalPlan (which contains a LogicalOperator). Alan is pointing out that future work on the Pig parsers should incorporate DEFINE like operators. Hope that helps ...

        Show
        Arun C Murthy added a comment - Sorry I don't quite understand what you tried to explain. Is having LODefine in the logical plan due to the fact that we shortcut our processing step? Pi, Alan is referring to the fact that the newly introduced LODefine is essentially a dummy LogicalOperator which is never used at all... it was necessary since QueryParser.parse() currently returns a LogicalPlan (which contains a LogicalOperator). Alan is pointing out that future work on the Pig parsers should incorporate DEFINE like operators. Hope that helps ...
        Hide
        Pi Song added a comment -

        Alan,

        Sorry I don't quite understand what you tried to explain. Is having LODefine in the logical plan due to the fact that we shortcut our processing step?

        Normal implementation:-

        [Code] ==parser==> [AST] ==code_generator==> [Intermediate Instructions] ==native_code_generator==> [native_executable]
        

        Pig implementation:-

        Compile Time
        [Code] ==parser==>  [Intermediate Instructions(Logical Plan)] ==native_code_generator(Physical Plan Compiler)==> [native_executable]
        
        Show
        Pi Song added a comment - Alan, Sorry I don't quite understand what you tried to explain. Is having LODefine in the logical plan due to the fact that we shortcut our processing step? Normal implementation:- [Code] ==parser==> [AST] ==code_generator==> [Intermediate Instructions] ==native_code_generator==> [native_executable] Pig implementation:- Compile Time [Code] ==parser==> [Intermediate Instructions(Logical Plan)] ==native_code_generator(Physical Plan Compiler)==> [native_executable]
        Hide
        Olga Natkovich added a comment -

        patch committed

        Show
        Olga Natkovich added a comment - patch committed
        Hide
        Alan Gates added a comment -

        I'm unsure about this thing of adding LODefine. I know that the way things are set up now, it's hard to get around. In general, in the future the pig parser will need to be able to handle lines that don't generate a real logical plan (define, register, set, etc.). For now this is ok, but we need to remember that this isn't the direction we want to go.

        Everything else looks good.

        Show
        Alan Gates added a comment - I'm unsure about this thing of adding LODefine. I know that the way things are set up now, it's hard to get around. In general, in the future the pig parser will need to be able to handle lines that don't generate a real logical plan (define, register, set, etc.). For now this is ok, but we need to remember that this isn't the direction we want to go. Everything else looks good.
        Hide
        Arun C Murthy added a comment -

        Updated patch which fixed a minor regression in STORE parsing...

        STORE accepts 'PigStorage' and 'PigStorage()' for the UDF while LOAD accepts only 'PigStorage()' and hence the regression. Fixed since.

        Show
        Arun C Murthy added a comment - Updated patch which fixed a minor regression in STORE parsing... STORE accepts 'PigStorage' and 'PigStorage()' for the UDF while LOAD accepts only 'PigStorage()' and hence the regression. Fixed since.
        Hide
        Arun C Murthy added a comment -

        Minor change to earlier patch ...

        Forgot to add this note previously: this patch uses current infrastructure for parsing UDFs (used in LOAD/GROUP) in QueryParser. This regresses on some of the additional parsing done in PigScriptParser... I've opened PIG-163 to fix this once and for all.

        Show
        Arun C Murthy added a comment - Minor change to earlier patch ... Forgot to add this note previously: this patch uses current infrastructure for parsing UDFs (used in LOAD/GROUP) in QueryParser. This regresses on some of the additional parsing done in PigScriptParser... I've opened PIG-163 to fix this once and for all.
        Hide
        Arun C Murthy added a comment -

        Here is a patch for review while I continue doing some large-scale tests... I've moved parsing of DEFINE and STORE to QueryParser from GruntParser and ensured it passes all unit tests.

        Show
        Arun C Murthy added a comment - Here is a patch for review while I continue doing some large-scale tests... I've moved parsing of DEFINE and STORE to QueryParser from GruntParser and ensured it passes all unit tests.

          People

          • Assignee:
            Arun C Murthy
            Reporter:
            Arun C Murthy
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development