Pig / PIG-1097

Pig does not support group by on the boolean type

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: 0.10.0
    • Component/s: impl
    • Labels:
      None

      Description

      My script is as follows; TestUDF returns a boolean.


      DEFINE testUDF org.apache.pig.piggybank.util.TestUDF();

      raw = LOAD 'data/input';
      raw = FOREACH raw GENERATE testUDF();
      raw = GROUP raw BY $0;
      DUMP raw;

      The script above throws the following exception:

      Exception in thread "main" org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias raw
      at org.apache.pig.PigServer.openIterator(PigServer.java:481)
      at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:539)
      at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
      at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
      at org.apache.pig.PigServer.registerScript(PigServer.java:409)
      at PigExample.main(PigExample.java:13)
      Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias raw
      at org.apache.pig.PigServer.store(PigServer.java:536)
      at org.apache.pig.PigServer.openIterator(PigServer.java:464)
      ... 5 more
      Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2043: Unexpected error during execution.
      at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:269)
      at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:780)
      at org.apache.pig.PigServer.store(PigServer.java:528)
      ... 6 more
      Caused by: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2036: Unhandled key type boolean
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.selectComparator(JobControlCompiler.java:856)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:561)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:251)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:128)
      at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:249)
      ... 8 more
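The innermost cause shows that JobControlCompiler.selectComparator has no comparator for a boolean key (ERROR 2036), even though the grouping itself is well defined. As a point of reference only, here is a plain-Java sketch (no Pig dependency; the class and method names are hypothetical) of the result the script asks for: one bag per distinct key, ordered by an assumed key comparator that puts null first and false before true.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/**
 * Illustrative sketch only, not Pig code: models what GROUP raw BY $0 would
 * produce if field 0 of each row holds a Boolean. Rows are plain Object[].
 */
class BooleanGroupBySketch {

    // Assumed key order for the sort/shuffle phase: null, then false, then true
    // (Boolean's natural ordering puts false before true).
    static final Comparator<Boolean> KEY_ORDER =
            Comparator.nullsFirst(Comparator.<Boolean>naturalOrder());

    // Collect rows into one "bag" per distinct Boolean key, kept in KEY_ORDER.
    static SortedMap<Boolean, List<Object[]>> groupByFirstField(List<Object[]> rows) {
        SortedMap<Boolean, List<Object[]>> groups = new TreeMap<>(KEY_ORDER);
        for (Object[] row : rows) {
            Boolean key = (Boolean) row[0];
            List<Object[]> bag = groups.get(key);
            if (bag == null) {
                bag = new ArrayList<>();
                groups.put(key, bag);
            }
            bag.add(row);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {true, "a"});
        rows.add(new Object[] {false, "b"});
        rows.add(new Object[] {true, "c"});
        rows.add(new Object[] {null, "d"});
        for (Map.Entry<Boolean, List<Object[]>> e : groupByFirstField(rows).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().size() + " row(s)");
        }
    }
}
```

Running main prints one line per group in key order: null (1 row), false (1 row), true (2 rows). Whether Pig would place null keys first is an assumption of this sketch, not something the issue specifies.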


          Activity

          Thejas M Nair added a comment -

          Fixed in PIG-1429, as part of changes to introduce boolean types.

          Dmitriy V. Ryaboy added a comment -

          Jeff, I see this ticket is assigned to you – any progress so far? Need a hand?

          Alan Gates added a comment -

          FilterFunc did come before types. Before we can deprecate it, we need to make Boolean a full-fledged type.

          I think making Boolean a full type is fine; we just didn't do it when we added types. There's a fair amount of work to do to make it happen. The parser needs to change to support boolean (or bool, whichever we use) as a type and true and false as keywords. It also needs to change to allow expressions in foreach to be of type boolean. The LoadFunc interface needs to change to have a byteToBoolean method. DataReaderWriter can already handle booleans, so no changes are needed there. Physical operators such as =, !=, and is null need to change to support it.
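The LoadFunc change is the most self-contained item in that list. Below is a minimal sketch of what a byteToBoolean conversion might do for text input, in plain Java with no Pig dependency. The class name ByteToBooleanSketch and the parsing rules (UTF-8 text, surrounding whitespace trimmed, case-insensitive "true"/"false", anything else null) are assumptions for illustration, not Pig's actual implementation.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch of the byteToBoolean conversion proposed above.
 * The parsing rules are assumptions, not Pig's LoadFunc behavior.
 */
class ByteToBooleanSketch {

    static Boolean byteToBoolean(byte[] b) {
        if (b == null) {
            return null; // a missing field stays null
        }
        String s = new String(b, StandardCharsets.UTF_8).trim();
        if (s.equalsIgnoreCase("true")) {
            return Boolean.TRUE;
        }
        if (s.equalsIgnoreCase("false")) {
            return Boolean.FALSE;
        }
        return null; // unrecognized input becomes null instead of throwing
    }

    public static void main(String[] args) {
        String[] samples = {"true", " FALSE ", "yes"};
        for (String s : samples) {
            byte[] raw = s.getBytes(StandardCharsets.UTF_8);
            System.out.println("'" + s + "' -> " + byteToBoolean(raw));
        }
    }
}
```

Returning null for unrecognized input, rather than calling Boolean.parseBoolean (which maps every string other than "true" to false), keeps a malformed value distinguishable from a genuine false.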

          Jeff Zhang added a comment -

          Agreed. In my opinion, FilterFunc is equivalent to EvalFunc<Boolean>. I do not know the history of FilterFunc; did it come before Pig supported types? Either way, I now think it should be deprecated.

          Also, why does Pig not support the boolean type in foreach projections and group by? Is there a performance consideration?

          David Ciemiewicz added a comment -

          I think that one could argue that Filter functions are REALLY just Eval<Boolean> functions in disguise.

          Filter functions were a way of adding a return type to Pig for Boolean cases when Pig had no types.

          Further, I'd argue that, now that Pig does have data types, Filter should be deprecated and all Filter functions should become Eval<Boolean>.

          In other words, I believe it was an oversight in the types migration not to migrate Filter to Eval<Boolean>.
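The argument that a filter function is just a boolean-valued eval function can be modeled in a few lines. This is a plain-Java illustration, not Pig's FilterFunc or EvalFunc classes: java.util.function.Function stands in for EvalFunc<Boolean>, and the class and method names are hypothetical. The same function serves both as a filter condition and as a projected column, the second being the use the thread says Pig did not yet allow. Dropping rows whose condition is null, as well as false, is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/**
 * Illustrative model only (not Pig's API): one boolean-valued function used
 * both as a FILTER condition and as a FOREACH ... GENERATE column.
 */
class FilterAsEvalSketch {

    // Stand-in for an EvalFunc<Boolean>: TRUE, FALSE, or null for a null input.
    static Boolean isPositive(Object[] row) {
        if (row[0] == null) {
            return null;
        }
        return (Integer) row[0] > 0;
    }

    // Filter semantics: keep a row only when the predicate yields TRUE.
    static List<Object[]> filter(List<Object[]> rows, Function<Object[], Boolean> pred) {
        List<Object[]> out = new ArrayList<>();
        for (Object[] row : rows) {
            if (Boolean.TRUE.equals(pred.apply(row))) {
                out.add(row);
            }
        }
        return out;
    }

    // Projection semantics: emit the predicate's value as a column.
    static List<Boolean> project(List<Object[]> rows, Function<Object[], Boolean> pred) {
        List<Boolean> out = new ArrayList<>();
        for (Object[] row : rows) {
            out.add(pred.apply(row));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {5});
        rows.add(new Object[] {-3});
        rows.add(new Object[] {null});
        Function<Object[], Boolean> pred = FilterAsEvalSketch::isPositive;
        System.out.println(filter(rows, pred).size());  // prints 1
        System.out.println(project(rows, pred));        // prints [true, false, null]
    }
}
```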

          Pradeep Kamath added a comment -

          There is a bigger issue here: Pig does not support the boolean type anywhere except in conditions, such as in a filter or split. For example, boolean is not allowed in a foreach projection either. We would need to decide whether we want to support a boolean type, and the implications extend well beyond group by.


            People

            • Assignee:
              Jeff Zhang
              Reporter:
              Jeff Zhang
            • Votes:
              1
              Watchers:
              2
