Pig / PIG-3379

Alias reuse in nested foreach causes PIG script to fail

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.11.1
    • Fix Version/s: 0.12.0
    • Component/s: impl
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      The following script fails:

      temp.pig
      Events = LOAD 'x' AS (eventTime:long, deviceId:chararray, eventName:chararray);
      Events = FOREACH Events GENERATE eventTime, deviceId, eventName;
      EventsPerMinute = GROUP Events BY (eventTime / 60000);
      EventsPerMinute = FOREACH EventsPerMinute {
        DistinctDevices = DISTINCT Events.deviceId;
        nbDevices = SIZE(DistinctDevices);
      
        DistinctDevices = FILTER Events BY eventName == 'xuaHeartBeat';
        nbDevicesWatching = SIZE(DistinctDevices);
      
        GENERATE $0*60000 as timeStamp, nbDevices as nbDevices, nbDevicesWatching as nbDevicesWatching;
      }
      EventsPerMinute = FILTER EventsPerMinute BY timeStamp >= 0  AND timeStamp < 100000;
      A = FOREACH EventsPerMinute GENERATE timeStamp;
      describe A;
      

      With the error:

      2013-07-16 11:31:20,450 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025: 
      <file /home/xzhang/Documents/temp.pig, line 14, column 37> Invalid field projection. Projected field [timeStamp] does not exist in schema: deviceId:chararray.
      

      Giving the second "DistinctDevices" a different alias name fixes the problem. As an observation, removing the last FILTER statement also fixes the problem.
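
      The renaming workaround can be sketched as follows. This is the script from the report with only the second nested alias renamed; the name WatchingEvents is a hypothetical choice made for this illustration and does not come from the report or the attached patches.

      ```pig
      Events = LOAD 'x' AS (eventTime:long, deviceId:chararray, eventName:chararray);
      Events = FOREACH Events GENERATE eventTime, deviceId, eventName;
      EventsPerMinute = GROUP Events BY (eventTime / 60000);
      EventsPerMinute = FOREACH EventsPerMinute {
        DistinctDevices = DISTINCT Events.deviceId;
        nbDevices = SIZE(DistinctDevices);

        -- Hypothetical alias name: the only change is that DistinctDevices is not reused here.
        WatchingEvents = FILTER Events BY eventName == 'xuaHeartBeat';
        nbDevicesWatching = SIZE(WatchingEvents);

        GENERATE $0*60000 as timeStamp, nbDevices as nbDevices, nbDevicesWatching as nbDevicesWatching;
      }
      EventsPerMinute = FILTER EventsPerMinute BY timeStamp >= 0 AND timeStamp < 100000;
      A = FOREACH EventsPerMinute GENERATE timeStamp;
      describe A;
      ```

      Because no alias is assigned twice inside the nested block, the script no longer triggers the alias reuse case described in this report on the affected version.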

      1. PIG-3379-draft.patch
        1 kB
        Daniel Dai
      2. PIG-3379.patch
        13 kB
        Xuefu Zhang

        Activity

        Xuefu Zhang created issue -
        Xuefu Zhang made changes -
        Field Original Value New Value
        Description (edited three times; each edit changed only the wiki markup around the script and the error message, and the text matches the Description above)
        Xuefu Zhang made changes -
        Attachment: added PIG-3379.patch [ 12596204 ]
        Xuefu Zhang made changes -
        Status: Open [ 1 ] -> Patch Available [ 10002 ]
        Daniel Dai made changes -
        Attachment: added PIG-3379-draft.patch [ 12596916 ]
        Daniel Dai made changes -
        Status: Patch Available [ 10002 ] -> Resolved [ 5 ]
        Hadoop Flags: set to Reviewed [ 10343 ]
        Fix Version/s: set to 0.12 [ 12323380 ]
        Resolution: set to Fixed [ 1 ]
        Daniel Dai made changes -
        Status: Resolved [ 5 ] -> Closed [ 6 ]

          People

          • Assignee: Xuefu Zhang
          • Reporter: Xuefu Zhang
          • Votes: 0
          • Watchers: 4
