Uploaded image for project: 'Pig'
  1. Pig
  2. PIG-665

Map key type not correctly set (for use when key is null) when map plan does not have localrearrange

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.2.0
    • Fix Version/s: 0.2.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      KeyTypeDiscoveryVisitor visits the map plan to figure out the datatype of the map key. This is required so that when the map key is null, we can still construct a valid NullableXXXWritable object to pass on to hadoop in the collect() call (hadoop needs a valid object even for null objects). Currently the KeyTypeDiscoveryVisitor only looks at POPackage and POLocalRearrange to figure out the key type. In a pig script which results in multiple Map reduce jobs, one of the jobs could have a map plan with only POLoads in it. In such a case, the map key type is not discovered and this results in a null being returned from HDataType.getWritableComparableTypes() method. This in turn will result in a NullPointerException in the collect().

      Here is a script which can prompt this behavior:

      a = load 'a.txt' as (x:int, y:int, z:int);
      b = load 'b.txt' as (x:int, y:int);
      b_group = group b by x;
      b_sum = foreach b_group generate flatten(group) as x, SUM(b.y) as clicks;
      a_group = group a by (x, y);
      a_aggs = foreach a_group {
                  generate 
                      flatten(group) as (x, y),
                      SUM(a.z) as zs;
                      };
      join_a_b = join b_sum by x, a_aggs by x; --> the map plan for this join will only have two POLoads which will result in the NullPointerException at runtime in collect()
      dump join_a_b;
      
      

      Contents of a.txt (columns are tab separated):
      The first column of the first two rows is null (represented by an empty column)

              7       8
              8       9
      1       20      30
      1       20      40
      

      Contents of b.txt (columns are tab separated):

      7       2
      1       5
      1       10
      

        Attachments

        1. PIG-665.patch
          11 kB
          Pradeep Kamath

          Activity

            People

            • Assignee:
              pkamath Pradeep Kamath
              Reporter:
              pkamath Pradeep Kamath
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: