Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0
    • Component/s: impl
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Users can specify a typed map in place of an untyped map using the syntax:
      map[type]

      Untyped maps still work as before.

      Description

      Currently the Pig map type is untyped, which means a map value is always of type bytearray (i.e. unknown). In PIG-1277 we allowed an unknown type to be a shuffle key, which somewhat relieves the problem. However, a typed map is still beneficial in that:

      1. Users can make semantic use of the map value type. Currently, users need to explicitly cast map values, which is ugly.
      2. Though PIG-1277 allows an unknown type to be a shuffle key, performance suffers. We don't have a raw comparator for the unknown type; instead, we need to instantiate the value object and invoke its comparator.
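To illustrate the second point, here is a minimal Java sketch of the two comparison paths. It is not Pig's code: the class and method names are hypothetical, and both the 4-byte big-endian int encoding and the one-byte type tag are assumptions made for the example. With a known value type, two serialized values can be ordered directly from their bytes; with an unknown type, each value has to be turned into an object first, and that object's comparator is then invoked.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; names and encodings are assumptions, not Pig internals.
class MapValueCompareSketch {

    // Decode a 4-byte big-endian int starting at the given offset.
    static int readInt(byte[] buf, int off) {
        return ((buf[off] & 0xff) << 24) | ((buf[off + 1] & 0xff) << 16)
             | ((buf[off + 2] & 0xff) << 8) | (buf[off + 3] & 0xff);
    }

    // Typed path: the value type is known to be int, so the two values are
    // compared straight from their serialized bytes without creating objects.
    static int compareRawInts(byte[] a, byte[] b) {
        return Integer.compare(readInt(a, 0), readInt(b, 0));
    }

    // Stand-in deserializer for the untyped path. The first byte is a made-up
    // type tag: 0 means int, anything else means a UTF-8 string.
    static Comparable<?> deserialize(byte[] tagged) {
        if (tagged[0] == 0) {
            return Integer.valueOf(readInt(tagged, 1));
        }
        return new String(tagged, 1, tagged.length - 1, StandardCharsets.UTF_8);
    }

    // Untyped path: the type is unknown up front, so each value is materialized
    // as an object and the comparison is delegated to its compareTo.
    // Values of different runtime types would fail with ClassCastException here.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static int compareViaObjects(byte[] a, byte[] b) {
        Comparable left = deserialize(a);
        Comparable right = deserialize(b);
        return left.compareTo(right);
    }

    public static void main(String[] args) {
        byte[] five = {0, 0, 0, 5};
        byte[] nine = {0, 0, 0, 9};
        System.out.println(compareRawInts(five, nine) < 0);

        byte[] taggedFive = {0, 0, 0, 0, 5};
        byte[] taggedNine = {0, 0, 0, 0, 9};
        System.out.println(compareViaObjects(taggedFive, taggedNine) < 0);
    }
}
```

Both paths agree on the ordering; the difference is that the second allocates an object per value on every comparison, which is the cost the description refers to.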

      Here is the proposed syntax for a typed map:
      map[type]

      A typed map can be used anywhere an untyped map can occur. For example:
      a = load '1.txt' as(map[int]);
      b = foreach a generate (map[(i:int)])a0; -- Map value is tuple
      b = stream a through `cat` as (m:map[{(i:int,j:chararray)}]); -- Map value is bag

      A MapLookup on a typed map yields the datatype of the map value.
      a = load '1.txt' as(map[int]);
      b = foreach a generate $0#'key';

      Schema for b:
      b: {int}

      The behavior of untyped map will remain the same.

      1. PIG-1876_3.patch
        40 kB
        Richard Ding
      2. PIG-1876-1.patch
        39 kB
        Daniel Dai
      3. PIG-1876-2.patch
        39 kB
        Daniel Dai

          Activity

          Olga Natkovich added a comment -

          design looks good

          Alan Gates added a comment -

          I assume at the end, where the schema for b is listed as {chararray}, it really should be {int}, correct?

          Syntax and semantics look good.

          Are there any error conditions we need to think about? The only one I could come up with was cases where the values in the map aren't of the indicated type, but I assume we'll handle this just as if the top level type wasn't what was declared.

          This will drive changes in the LoadCaster interface. Those should be specified here as well. Do we have any plans to minimize backward compatibility issues for users on that?

          Daniel Dai added a comment -
          {chararray} is a typo; it should be {int}. I changed it.

          For LoadCaster, we need to add:
          public Map<String, Object> bytesToMap(byte[] b, ResourceFieldSchema fieldSchema) throws IOException;

          And deprecate the bytesToMap variant that does not take a field schema.
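As a rough sketch of why the extra argument matters, the following self-contained Java example converts map values according to a declared value type. Everything in it is an assumption for illustration: the `ValueType` enum is a hypothetical stand-in for the information a `ResourceFieldSchema` would carry, the `[key#value,...]` text layout is assumed, and returning null for a value that does not match the declared type is one possible policy, not confirmed behavior. It is not the committed implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not the LoadCaster implementation from the patch.
class TypedMapCasterSketch {

    // Stand-in for the value type a field schema would describe.
    enum ValueType { BYTEARRAY, INT, CHARARRAY }

    // Parse text of the assumed form [key#value,key#value] and convert each
    // value to the declared type. Keys are always strings.
    // Splitting on ',' is naive and would break on nested bag/tuple/map values.
    static Map<String, Object> bytesToMap(byte[] b, ValueType valueType) {
        String text = new String(b, StandardCharsets.UTF_8).trim();
        if (!text.startsWith("[") || !text.endsWith("]")) {
            return null; // not a map literal
        }
        Map<String, Object> result = new LinkedHashMap<>();
        String body = text.substring(1, text.length() - 1);
        if (body.isEmpty()) {
            return result;
        }
        for (String entry : body.split(",")) {
            int sep = entry.indexOf('#');
            if (sep < 0) {
                return null; // malformed entry
            }
            result.put(entry.substring(0, sep), convert(entry.substring(sep + 1), valueType));
        }
        return result;
    }

    // Convert one raw value. An untyped map keeps the value as bytes, which
    // mirrors the "bytearray, i.e. unknown" behavior described above.
    static Object convert(String raw, ValueType valueType) {
        switch (valueType) {
            case INT:
                try {
                    return Integer.valueOf(raw.trim());
                } catch (NumberFormatException e) {
                    return null; // assumed policy: mismatched value becomes null
                }
            case CHARARRAY:
                return raw;
            default:
                return raw.getBytes(StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        byte[] input = "[a#1,b#2]".getBytes(StandardCharsets.UTF_8);
        System.out.println(bytesToMap(input, ValueType.INT));
    }
}
```

Without the type argument the caster can only take the `BYTEARRAY` branch, which is why the schema-less variant cannot produce typed values.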

          Ashutosh Chauhan added a comment -

          Are there any restrictions on the types of keys and values? For example, are keys restricted to primitive types, or are all types allowed for both keys and values?

          Daniel Dai added a comment -

          The type is specified only for values. Keys are strings.

          Ashutosh Chauhan added a comment -

          So the allowed maps are Map<String, Primitive>? Or can values be either primitives or bytearrays? If the user doesn't specify a type for the value, is bytearray assumed?

          Daniel Dai added a comment -

          The value can be anything, including bag/tuple/map. If the user doesn't specify a type, we assume it is bytearray, which is backward compatible with the previous definition.

          Daniel Dai added a comment -

          PIG-1876-2.patch resync with trunk.

          Richard Ding added a comment -

          Added a few unit tests for macro.

          Richard Ding added a comment -

          +1

          Daniel Dai added a comment -

          Patch committed to trunk.

          Daniel Dai added a comment -

          Review notes: https://reviews.apache.org/r/472/

            People

            • Assignee:
              Daniel Dai
            • Reporter:
              Daniel Dai