Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0
    • Component/s: impl
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Users can specify a typed map in place of an untyped map using the syntax:
      map[type]

      Untyped maps still work as before.

      Description

      Currently the Pig map type is untyped, which means a map value is always of type bytearray (i.e. unknown). PIG-1277 allows the unknown type to be a shuffle key, which partially relieves the problem. However, a typed map is still beneficial because:

      1. Users can make semantic use of the map value type. Currently, users need to explicitly cast the map value, which is ugly.
      2. Although PIG-1277 allows the unknown type to be a shuffle key, performance suffers. There is no raw comparator for the unknown type; instead, we need to instantiate the value object and invoke its comparator.
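      To make point 1 concrete, here is a minimal sketch contrasting the two styles. It assumes an input file '1.txt' whose single field is a map with integer values; the alias m and the key 'key' are illustrative only and do not come from the original proposal.

      ```pig
      -- Untyped map: m#'key' is a bytearray, so an explicit cast is needed
      a = load '1.txt' as (m:map[]);
      b = foreach a generate (int)m#'key';

      -- Typed map (proposed syntax): m#'key' is already an int, no cast required
      c = load '1.txt' as (m:map[int]);
      d = foreach c generate m#'key';
      ```

      Both b and d end up with an int column; the typed form simply moves the type information into the schema so it does not have to be repeated at every lookup.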

      Here is the proposed syntax for a typed map:
      map[type]

      A typed map can be used wherever an untyped map can occur. For example:
      a = load '1.txt' as (map[int]);
      b = foreach a generate (map[(i:int)])a0; -- map value is a tuple
      b = stream a through `cat` as (m:map[{(i:int,j:chararray)}]); -- map value is a bag

      A map lookup on a typed map yields the datatype of the map value. For example:
      a = load '1.txt' as (map[int]);
      b = foreach a generate $0#'key';

      Schema for b:
      b: {int}

      The behavior of untyped map will remain the same.

      1. PIG-1876_3.patch
        40 kB
        Richard Ding
      2. PIG-1876-2.patch
        39 kB
        Daniel Dai
      3. PIG-1876-1.patch
        39 kB
        Daniel Dai

        Issue Links
          This issue relates to PIG-3485

          Activity

          Daniel Dai created issue
          Daniel Dai made changes
          Field: Description (the example schema for b was corrected from b: {chararray} to b: {int}; the rest of the text is unchanged)
          Daniel Dai made changes
          Attachment PIG-1876-1.patch [ 12472842 ]
          Daniel Dai made changes
          Attachment PIG-1876-2.patch [ 12473170 ]
          Richard Ding made changes
          Attachment PIG-1876_3.patch [ 12473209 ]
          Daniel Dai made changes
          Status Open [ 1 ] Resolved [ 5 ]
          Hadoop Flags [Reviewed]
          Resolution Fixed [ 1 ]
          Daniel Dai made changes
          Release Note Users can specify a typed map in place of an untyped map using the syntax:
          map[type]

          Untyped maps still work as before.
          Olga Natkovich made changes
          Status Resolved [ 5 ] Closed [ 6 ]
          Cheolsoo Park made changes
          Link This issue relates to PIG-3485 [ PIG-3485 ]

            People

            • Assignee:
              Daniel Dai
              Reporter:
              Daniel Dai
            • Votes:
              0
              Watchers:
              1

              Dates

              • Created:
                Updated:
                Resolved:
