Pig / PIG-1097

Pig does not support GROUP BY on a boolean type

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: 0.10.0
    • Component/s: impl
    • Labels:
      None

      Description

      My script is as follows; TestUDF returns a boolean type.


      DEFINE testUDF org.apache.pig.piggybank.util.TestUDF();

      raw = LOAD 'data/input';
      raw = FOREACH raw GENERATE testUDF();
      raw = GROUP raw BY $0;
      DUMP raw;

      The above script throws the following exception:

      Exception in thread "main" org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias raw
      at org.apache.pig.PigServer.openIterator(PigServer.java:481)
      at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:539)
      at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
      at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
      at org.apache.pig.PigServer.registerScript(PigServer.java:409)
      at PigExample.main(PigExample.java:13)
      Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias raw
      at org.apache.pig.PigServer.store(PigServer.java:536)
      at org.apache.pig.PigServer.openIterator(PigServer.java:464)
      ... 5 more
      Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2043: Unexpected error during execution.
      at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:269)
      at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:780)
      at org.apache.pig.PigServer.store(PigServer.java:528)
      ... 6 more
      Caused by: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2036: Unhandled key type boolean
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.selectComparator(JobControlCompiler.java:856)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:561)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:251)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:128)
      at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:249)
      ... 8 more
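
      The innermost cause (ERROR 2036, raised from selectComparator) suggests that comparator selection for the group key had no case for boolean. The following is a minimal sketch of that idea only, not Pig's actual code: SketchKeyType, SketchComparators, and select are hypothetical names, and the ordering (nulls first, false before true) is an assumption made for illustration.

```java
import java.util.Comparator;

// Hypothetical key types for this sketch; Pig's real type codes differ.
enum SketchKeyType { INTEGER, BOOLEAN, MAP }

final class SketchComparators {
    private SketchComparators() {}

    // Picks an ordering for the group key based on its type.
    // A type with no case falls through to the error, which mirrors
    // the "Unhandled key type" failure in the stack trace above.
    static Comparator<Object> select(SketchKeyType type) {
        switch (type) {
            case INTEGER:
                return Comparator.<Object>nullsFirst(
                        (a, b) -> Integer.compare((Integer) a, (Integer) b));
            case BOOLEAN:
                // Assumed ordering: null < false < true.
                return Comparator.<Object>nullsFirst(
                        (a, b) -> Boolean.compare((Boolean) a, (Boolean) b));
            default:
                throw new IllegalArgumentException("Unhandled key type " + type);
        }
    }
}
```

      With a BOOLEAN case present, grouping keys can be sorted; removing that case reproduces the kind of failure reported here.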

        Issue Links

          Activity

          Jeff Zhang created issue -
          Pradeep Kamath added a comment -

          There is a bigger issue here: Pig does not support the boolean type anywhere except in conditions, such as those in a filter or split. For example, boolean is not allowed in a foreach projection either. We would need to decide whether we want to support a boolean type, and the implications extend well beyond group by.

          David Ciemiewicz added a comment -

          I think one could argue that Filter functions are really just Eval<Boolean> functions in disguise.

          Filter functions were a way of adding a return type to Pig for Boolean cases when Pig had no types.

          Further, I'd argue that now that Pig does have data types, Filter should be deprecated and all Filter functions should become Eval<Boolean>.

          In other words, I believe it was an oversight in the types migration not to migrate Filter to Eval<Boolean>.
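
          The equivalence argued here can be sketched in plain Java. This is a simplified illustration, not Pig's API: SketchEvalFunc, SketchFilterFunc, and IsNonEmpty are hypothetical stand-ins, a List<Object> stands in for a tuple, and returning null for missing input is an assumption of the sketch.

```java
import java.util.List;

// Hypothetical, simplified stand-in for an eval function with return type T.
abstract class SketchEvalFunc<T> {
    abstract T exec(List<Object> input);
}

// Under this view, a filter function is only a name for
// "an eval function whose return type is Boolean".
abstract class SketchFilterFunc extends SketchEvalFunc<Boolean> {
}

// Example predicate: true when the first field is a non-empty string.
class IsNonEmpty extends SketchFilterFunc {
    @Override
    Boolean exec(List<Object> input) {
        if (input == null || input.isEmpty() || input.get(0) == null) {
            return null; // assumption: missing input yields null rather than false
        }
        return !input.get(0).toString().isEmpty();
    }
}
```

          Anything written against SketchEvalFunc<Boolean> accepts IsNonEmpty unchanged, which is the sense in which a separate filter type adds nothing once typed return values exist.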

          Jeff Zhang added a comment -

          Agreed; FilterFunc is equivalent to EvalFunc<Boolean> in my opinion. I do not know the history of FilterFunc: did it predate Pig's support for types? Either way, I think it should now be deprecated.

          And why does Pig not support the boolean type in foreach projections and group by? Is there a performance consideration?

          Alan Gates added a comment -

          FilterFunc did come before types. Before we can deprecate it, we need to make Boolean a full-fledged type.

          I think making Boolean a full type is fine; we just didn't do it when we added types. There's a fair amount of work to make it happen. The parser needs to change to support boolean (or bool, whichever we use) and true and false as keywords. It also needs to change to allow expressions in foreach to be of type boolean. The LoadFunc interface needs to gain a byteToBoolean method. DataReaderWriter can already handle booleans, so no changes are needed there. Physical operators such as =, !=, and is null need to change to support it.
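
          As an illustration of the byteToBoolean item in the list above, here is a minimal sketch of what such a conversion might look like for text-encoded fields. The class name BooleanBytes, the method name bytesToBoolean, the UTF-8 encoding, and the choice to return null for unrecognized input are all assumptions of this sketch, not a description of Pig's interface.

```java
import java.nio.charset.StandardCharsets;

final class BooleanBytes {
    private BooleanBytes() {}

    // Hypothetical converter from raw field bytes to a Boolean.
    // Accepts "true" or "false" in any letter case, ignoring surrounding
    // whitespace; any other content (or null input) becomes null.
    static Boolean bytesToBoolean(byte[] raw) {
        if (raw == null) {
            return null;
        }
        String text = new String(raw, StandardCharsets.UTF_8).trim();
        if (text.equalsIgnoreCase("true")) {
            return Boolean.TRUE;
        }
        if (text.equalsIgnoreCase("false")) {
            return Boolean.FALSE;
        }
        return null;
    }
}
```

          Returning null instead of throwing keeps one malformed field from failing a whole load, at the cost of hiding bad data; a stricter loader could reasonably choose to raise an error instead.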

          Olga Natkovich made changes -
          Field Original Value New Value
          Fix Version/s 0.6.0 [ 12314214 ]
          Dmitriy V. Ryaboy added a comment -

          Jeff, I see this ticket is assigned to you – any progress so far? Need a hand?

          Russell Jurney made changes -
          Link This issue is blocked by PIG-1429 [ PIG-1429 ]
          Thejas M Nair added a comment -

          Fixed in PIG-1429, as part of changes to introduce boolean types.

          Thejas M Nair made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Fix Version/s 0.9.1 [ 12317343 ]
          Resolution Duplicate [ 3 ]
          Thejas M Nair made changes -
          Fix Version/s 0.10 [ 12316246 ]
          Fix Version/s 0.9.1 [ 12317343 ]
          Daniel Dai made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Transition          Time In Source Status  Execution Times  Last Executer  Last Execution Date
          Open -> Resolved    642d 20h 49m           1                Thejas M Nair  23/Aug/11 00:51
          Resolved -> Closed  247d 20h 41m           1                Daniel Dai     26/Apr/12 21:32

            People

            • Assignee:
              Jeff Zhang
              Reporter:
              Jeff Zhang
            • Votes:
              1
              Watchers:
              2
