Pig / PIG-665

Map key type not correctly set (for use when key is null) when map plan does not have localrearrange


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.2.0
    • Fix Version/s: 0.2.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      KeyTypeDiscoveryVisitor visits the map plan to determine the data type of the map key. This is required so that, when the map key is null, we can still construct a valid NullableXXXWritable object to pass to Hadoop in the collect() call (Hadoop needs a valid object even for null keys). Currently KeyTypeDiscoveryVisitor only looks at POPackage and POLocalRearrange to determine the key type. In a Pig script that compiles to multiple MapReduce jobs, one of the jobs can have a map plan containing only POLoads. In that case the map key type is never discovered, so HDataType.getWritableComparableTypes() returns null, which in turn causes a NullPointerException in collect().
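To make the failure mode concrete, here is a heavily simplified Java model of the lookup described above. Everything in it is hypothetical: OpKind, Op, the type codes, discoverKeyTypeWithFallback and the assumed shape of the predecessor reduce plans are illustrative stand-ins, not Pig's actual classes. The fallback (taking the key type from a POLocalRearrange at the tail of an upstream job's reduce plan) is one plausible approach and is not necessarily what PIG-665.patch does.

```java
import java.util.List;

public class KeyTypeDiscoverySketch {
    // Hypothetical, heavily simplified stand-ins for Pig's physical operators.
    enum OpKind { LOAD, FOREACH, PACKAGE, LOCAL_REARRANGE }

    // keyType is only set on operators that know the map key type.
    record Op(OpKind kind, Byte keyType) {}

    // Hypothetical type codes (not copied from Pig's DataType class).
    static final byte INTEGER = 10;
    static final byte TUPLE = 110;

    // Pre-fix behaviour as described in the issue: only POPackage and
    // POLocalRearrange are consulted, so a plan of bare loads yields null.
    static Byte discoverKeyType(List<Op> plan) {
        for (Op op : plan) {
            if (op.kind() == OpKind.PACKAGE || op.kind() == OpKind.LOCAL_REARRANGE) {
                return op.keyType();
            }
        }
        return null;
    }

    // Hypothetical fallback: when the map plan has no key-carrying operator,
    // use the POLocalRearrange in an upstream job's reduce plan, i.e. the
    // operator assumed to have produced the keys this job's map plan loads.
    static Byte discoverKeyTypeWithFallback(List<Op> mapPlan, List<List<Op>> predecessorReducePlans) {
        Byte found = discoverKeyType(mapPlan);
        if (found != null) {
            return found;
        }
        for (List<Op> reducePlan : predecessorReducePlans) {
            for (Op op : reducePlan) {
                if (op.kind() == OpKind.LOCAL_REARRANGE) {
                    return op.keyType();
                }
            }
        }
        return null;
    }

    // Stand-in for HDataType.getWritableComparableTypes(): an unknown key
    // type yields null, which collect() would later dereference.
    static String writableForKeyType(Byte keyType) {
        if (keyType == null) {
            return null;
        }
        if (keyType == INTEGER) {
            return "NullableIntWritable";
        }
        if (keyType == TUPLE) {
            return "NullableTuple";
        }
        return null;
    }

    // The join job from the issue's script: its map plan is just two loads.
    static List<Op> joinMapPlan() {
        return List.of(new Op(OpKind.LOAD, null), new Op(OpKind.LOAD, null));
    }

    // Assumed shape of the two upstream group jobs' reduce plans: a_group's key
    // is a tuple, but the rearrange feeding the join emits x (an int) in both.
    static List<List<Op>> predecessorReducePlans() {
        return List.of(
            List.of(new Op(OpKind.PACKAGE, INTEGER), new Op(OpKind.FOREACH, null),
                    new Op(OpKind.LOCAL_REARRANGE, INTEGER)),
            List.of(new Op(OpKind.PACKAGE, TUPLE), new Op(OpKind.FOREACH, null),
                    new Op(OpKind.LOCAL_REARRANGE, INTEGER)));
    }

    public static void main(String[] args) {
        // Pre-fix: no key type, so no Writable can be built for a null key.
        System.out.println(writableForKeyType(discoverKeyType(joinMapPlan())));
        // With the hypothetical fallback, the int key type is recovered.
        System.out.println(writableForKeyType(
            discoverKeyTypeWithFallback(joinMapPlan(), predecessorReducePlans())));
    }
}
```

Running main prints null and then NullableIntWritable. The fallback deliberately skips the upstream POPackage: its key is the grouping key (a tuple for a_group), not the join key that this job's map phase emits.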

      Here is a script that triggers this behavior:

      a = load 'a.txt' as (x:int, y:int, z:int);
      b = load 'b.txt' as (x:int, y:int);
      b_group = group b by x;
      b_sum = foreach b_group generate flatten(group) as x, SUM(b.y) as clicks;
      a_group = group a by (x, y);
      a_aggs = foreach a_group {
                   generate
                       flatten(group) as (x, y),
                       SUM(a.z) as zs;
               };
      -- The map plan for this join contains only two POLoads, which results in
      -- a NullPointerException in collect() at runtime.
      join_a_b = join b_sum by x, a_aggs by x;
      dump join_a_b;
      
      

      Contents of a.txt (columns are tab-separated; the first column of the first two rows is null, represented by an empty field):

              7       8
              8       9
      1       20      30
      1       20      40
      

      Contents of b.txt (columns are tab-separated):

      7       2
      1       5
      1       10
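To see why the join job has to emit records with null keys, the following Java sketch replays the script's grouping and join steps in memory on the sample data above. It only models the intended semantics; the class and method names are hypothetical. It assumes Pig's usual rules that rows with equal (x, y) values, including a null x, fall into one group, and that null join keys never match. Under those assumptions two a_aggs records carry a null join key, and the dump would be expected to contain the single record (1,15,1,20,70).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class Pig665SampleData {
    // One row of a.txt; x is null when the first column is empty.
    record ARow(Integer x, int y, int z) {}
    // One row of b.txt.
    record BRow(int x, int y) {}

    // b_sum = foreach (group b by x) generate flatten(group) as x, SUM(b.y)
    static Map<Integer, Integer> bSum(List<BRow> b) {
        Map<Integer, Integer> sums = new LinkedHashMap<>();
        for (BRow r : b) sums.merge(r.x(), r.y(), Integer::sum);
        return sums;
    }

    // a_aggs = foreach (group a by (x, y)) generate flatten(group), SUM(a.z).
    // Arrays.asList permits a null x inside the composite group key.
    static Map<List<Integer>, Integer> aAggs(List<ARow> a) {
        Map<List<Integer>, Integer> sums = new LinkedHashMap<>();
        for (ARow r : a) sums.merge(Arrays.asList(r.x(), r.y()), r.z(), Integer::sum);
        return sums;
    }

    // a_aggs records whose join key x is null: per the issue, these reach
    // collect() in the join job with a null map key.
    static long nullJoinKeys(Map<List<Integer>, Integer> aggs) {
        return aggs.keySet().stream().filter(k -> k.get(0) == null).count();
    }

    // join b_sum by x, a_aggs by x (inner join; null keys never match)
    static List<String> join(Map<Integer, Integer> bs, Map<List<Integer>, Integer> aggs) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<List<Integer>, Integer> e : aggs.entrySet()) {
            Integer x = e.getKey().get(0);
            if (x == null || !bs.containsKey(x)) continue;
            out.add("(" + x + "," + bs.get(x) + "," + x + ","
                    + e.getKey().get(1) + "," + e.getValue() + ")");
        }
        return out;
    }

    static List<ARow> sampleA() {
        return List.of(new ARow(null, 7, 8), new ARow(null, 8, 9),
                       new ARow(1, 20, 30), new ARow(1, 20, 40));
    }

    static List<BRow> sampleB() {
        return List.of(new BRow(7, 2), new BRow(1, 5), new BRow(1, 10));
    }

    public static void main(String[] args) {
        Map<List<Integer>, Integer> aggs = aAggs(sampleA());
        System.out.println(nullJoinKeys(aggs));
        System.out.println(join(bSum(sampleB()), aggs));
    }
}
```

Running main prints 2 and then [(1,15,1,20,70)]. The b_sum row (7,2) finds no partner because the only a_aggs rows with y = 7 have a null x.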
      

      Attachments

        1. PIG-665.patch (11 kB, Pradeep Kamath)


          People

            Assignee: Pradeep Kamath (pkamath)
            Reporter: Pradeep Kamath (pkamath)
            Votes: 0
            Watchers: 0

