Pig / PIG-2315

Make as clause work in generate

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels:
      None

      Description

      Currently, the following syntax is accepted but silently ignored, which confuses users:

      A1 = foreach A1 generate a as a:chararray ;

      After this statement, a simply retains its previous type.
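
      For comparison, here is a minimal sketch of the explicit "()" cast that the comments below discuss as the working alternative. It is not part of the original report: the file name 'input.txt' and the int schema are assumptions chosen for illustration, and describe is used only to inspect the schema Pig reports.

      -- hypothetical input with a single int field
      A = load 'input.txt' as (a:int);
      -- explicit cast: converts a to chararray, then names the result a
      B = foreach A generate (chararray)a as a;
      describe B;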

      1. PIG-2315-1.patch
        23 kB
        Daniel Dai
      2. PIG-2315-1.patch
        23 kB
        Daniel Dai

          Activity

          Gianmarco De Francisci Morales added a comment -

          What is the desired solution?
          To make the aforementioned syntax work as a cast or to allow only explicit casting?
          E.g.

          A1 = foreach A1 generate (chararray) a;
          
          Daniel Dai added a comment -

          I discussed this with Thejas earlier; we want to perform the "as" cast after evaluating the generated item:

          A1 = foreach A1 generate (int)a as a:chararray;
          => A1 = foreach A1 generate (chararray)((int)a) as a;

          Prashant Kommireddi added a comment -

          How would this work when a tuple is being mapped using "AS"?

          vLogFields = FOREACH vLogs GENERATE FLATTEN(LFV(TOTUPLE(*), ('timestamp', 'runTime', 'cpuTime'))) as 
                      (ts, runTime, cpuTime);
          

          Here I call a function LFV, which returns a tuple that is mapped to {ts, runTime, cpuTime}.

          Gianmarco De Francisci Morales added a comment -

          I assume that, as long as you don't cast the fields to specific types, this should have no effect on your code.
          In your example you are just renaming fields, not casting them.

          Prashant Kommireddi added a comment -

          Apologies, it should have read:

          vLogFields = FOREACH vLogs GENERATE FLATTEN(LFV(TOTUPLE(*), ('timestamp', 'runTime', 'cpuTime'))) as 
                      (ts:chararray, runTime:double, cpuTime:double);
          
          Prashant Kommireddi added a comment -

          The return type of the UDF "LFV" is bytearray. The issue here is that when I pass the field "ts" to a MIN function, or "runTime" to a SUM (after a group by), the script errors out with a ClassCastException (Cannot convert bytearray to String/Double).

          Ruslan Al-Fakikh added a comment -

          As a user, I would prefer the approach suggested here:
          https://issues.apache.org/jira/browse/PIG-2216
          which is to allow just one syntax for casting (and forbid or deprecate the other).

          Daniel, don't you think that:

          A1 = foreach A1 generate (int)a as a:chararray;
          => A1 = foreach A1 generate (chararray)((int)a) as a;

          will make things more complicated?

          Daniel Dai added a comment -

          Agreed. However, we cannot break backward compatibility by disallowing the "()" style cast. We definitely don't encourage using different styles of cast. We can mark "()" as deprecated, though.

          Ruslan Al-Fakikh added a comment -

          Daniel, I think that deprecating/removing the "cast in the as clause" is easier, because it is not working anyway. I guess the "()" should stay.

          I also suggest making this issue a duplicate of PIG-2216 instead of PIG-2216 being a duplicate of this issue. The description of PIG-2216 seems to explain everything and does not cause confusion.

          Koji Noguchi added a comment -

          > because it is not working anyway.

          There's at least one case where it works for our users.

          a = load 'input.txt' as (nb:bag{});
          b = foreach a generate flatten(nb) as (year, name:bytearray);
          c = filter b by name == 'user1';
          dump c;
          

          The above case works. But without the ':bytearray' in relation b, it fails:

          a = load 'input.txt' as (nb:bag{});
          b = foreach a generate flatten(nb) as (year, name);
          c = filter b by name == 'user1';
          dump c;
          

          "Front End: ERROR 1052: Cannot cast bytearray to chararray"

          Please keep the first case valid. (Thanks Fu Ding for this example.)
          The error message in the second case is misleading, since it is actually trying to typecast NULL to chararray.

          Daniel Dai added a comment -

          Attached a patch that fixes the issue by adding a cast-only foreach after the original statement:

          A1 = foreach A1 generate a as a:chararray;
          =>
          A1 = foreach A1 generate a;
          A1 = foreach A1 generate (chararray)a;

          b = foreach a generate flatten(nb) as (year, name:chararray);
          =>
          b = foreach a generate flatten(nb) as (year, name);
          b = foreach b generate year, (chararray)name;

          vLogFields = FOREACH vLogs GENERATE FLATTEN(LFV(TOTUPLE(*), ('timestamp', 'runTime', 'cpuTime'))) as (ts:chararray, runTime:double, cpuTime:double);
          =>
          vLogFields = FOREACH vLogs GENERATE FLATTEN(LFV(TOTUPLE(*), ('timestamp', 'runTime', 'cpuTime'))) as (ts, runTime, cpuTime);
          vLogFields = FOREACH vLogFields GENERATE (chararray)ts, (double)runTime, (double)cpuTime;

          Daniel Dai added a comment -

          Fix unit test failures.


            People

            • Assignee:
              Gianmarco De Francisci Morales
              Reporter:
              Olga Natkovich
            • Votes:
              3
              Watchers:
              8
