Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11
    • Component/s: None
    • Labels:
      None
    • Patch Info:
      Patch Available
    • Release Note:
      Pig 0.11+ includes the following UDFs for operating on Maps:

      1. VALUESET
      2. VALUELIST
      3. KEYSET
      4. INVERSEMAP

      VALUESET

        This UDF takes a Map and returns a Bag containing the set of values.
        Note that this UDF returns only unique values. To get all values, use
        VALUELIST instead.

        <code>
        grunt> cat data
        [open#apache,1#2,11#2]
        [apache#hadoop,3#4,12#hadoop]
       
        grunt> a = load 'data' as (M:[]);
        grunt> b = foreach a generate VALUESET($0);
        grunt> dump b;
        ({(apache),(2)})
        ({(4),(hadoop)})
       
        </code>

      VALUELIST

       
        This UDF takes a Map and returns a Bag containing the values from the map.
        Note that the output bag contains all values, not just unique ones.
        To obtain only the unique values, use VALUESET instead.
       
        <code>
        grunt> cat data
        [open#apache,1#2,11#2]
        [apache#hadoop,3#4,12#hadoop]
       
        grunt> a = load 'data' as (M:[]);
        grunt> b = foreach a generate VALUELIST($0);
        grunt> dump b;
        ({(apache),(2),(2)})
        ({(4),(hadoop),(hadoop)})
        </code>
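
To make the difference between VALUESET and VALUELIST concrete, here is a minimal sketch in plain Python that models the two behaviors on the first sample row. It is an illustration only, not Pig's implementation: the function names value_list and value_set are hypothetical, and a Pig Bag is approximated with a Python list or set.

```python
# Plain-Python model of the documented semantics; not Pig code.

def value_list(m):
    """Model of VALUELIST: every value in the map, duplicates kept."""
    return list(m.values())


def value_set(m):
    """Model of VALUESET: each distinct value exactly once."""
    return set(m.values())


# First sample row: [open#apache,1#2,11#2]
row = {"open": "apache", "1": "2", "11": "2"}

print(sorted(value_list(row)))  # ['2', '2', 'apache']
print(sorted(value_set(row)))   # ['2', 'apache']
```

The results are sorted before printing only to make the output stable; the ordering of a real Pig Bag should not be relied on.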

      KEYSET

        This UDF takes a Map and returns a Bag containing the keyset.

        <code>
        grunt> cat data
        [open#apache,1#2,11#2]
        [apache#hadoop,3#4,12#hadoop]
       
        grunt> a = load 'data' as (M:[]);
        grunt> b = foreach a generate KEYSET($0);
        grunt> dump b;
        ({(open),(1),(11)})
        ({(3),(apache),(12)})
        </code>

      INVERSEMAP

        This UDF accepts a Map whose values are of any primitive data type.
        It swaps keys with values and returns the inverse Map.
        Because the original values need not be unique, each key of the
        resulting Map (an original value, as a String) maps to a DataBag
        containing all of the original keys that had that value.
       
        Note: 1. UDF accepts Map with Values of primitive data type
              2. UDF returns Map<String,DataBag>
        <code>
        grunt> cat data
        [open#apache,1#2,11#2]
        [apache#hadoop,3#4,12#hadoop]

        grunt> a = load 'data' as (M:[]);
        grunt> b = foreach a generate INVERSEMAP($0);
       
        grunt> dump b;
        ([2#{(1),(11)},apache#{(open)}])
        ([hadoop#{(apache),(12)},4#{(3)}])
        </code>
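
The inversion can be sketched in plain Python as a grouping of original keys by the string form of their value. This is a model of the documented behavior under stated assumptions, not Pig's implementation: inverse_map is a hypothetical name, and the DataBag is approximated with a list.

```python
from collections import defaultdict


def inverse_map(m):
    """Model of INVERSEMAP: value (as a string) -> list of original keys."""
    inverted = defaultdict(list)
    for key, value in m.items():
        # Non-unique values accumulate all of their keys in one "bag".
        inverted[str(value)].append(key)
    return dict(inverted)


# The map [open#apache,1#2,11#2]
row = {"open": "apache", "1": "2", "11": "2"}
print(inverse_map(row))  # {'apache': ['open'], '2': ['1', '11']}
```

Every value maps to a bag, even when it occurs only once, which matches the documented return type of Map<String,DataBag>.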
    • Tags:
      udf

      Description

      It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.

      • TOBAG should take a Map and output {(key, value)}
      • TOMAP should take a Bag in that same form and make a map.
      • KEYSET should return the set of keys.
      • VALUESET should return the set of values.
      • VALUELIST should return the List of values (no deduping).
      • INVERSEMAP would return a Map of values => the set of keys that map to that value
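
As a rough illustration of the shapes proposed in this list, the following plain-Python sketch models the Map-to-Bag round trip and the key set. It only models the proposal as worded here; it is not Pig code and says nothing about what was finally implemented. The names to_bag, to_map and key_set are hypothetical, and a Bag of (key, value) tuples is approximated with a list of pairs.

```python
def to_bag(m):
    """Proposed TOBAG on a Map: a bag of (key, value) tuples."""
    return [(k, v) for k, v in m.items()]


def to_map(bag):
    """Proposed TOMAP: rebuild a map from a bag of (key, value) tuples."""
    return {k: v for k, v in bag}


def key_set(m):
    """Proposed KEYSET: the set of keys."""
    return set(m)


row = {"apache": "hadoop", "3": "4", "12": "hadoop"}
pairs = to_bag(row)
print(pairs)                 # [('apache', 'hadoop'), ('3', '4'), ('12', 'hadoop')]
print(to_map(pairs) == row)  # True
print(sorted(key_set(row)))  # ['12', '3', 'apache']
```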

      This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Strings). Not sure if it is worth it.

      I'd love to hear other things that would be useful for people!

      1. PIG-2600.patch
        8 kB
        Prashant Kommireddi
      2. PIG-2600_9.patch
        22 kB
        Prashant Kommireddi
      3. PIG-2600_8.patch
        23 kB
        Prashant Kommireddi
      4. PIG-2600_7.patch
        23 kB
        Prashant Kommireddi
      5. PIG-2600_6.patch
        23 kB
        Prashant Kommireddi
      6. PIG-2600_5.patch
        23 kB
        Prashant Kommireddi
      7. PIG-2600_4.patch
        22 kB
        Prashant Kommireddi
      8. PIG-2600_3.patch
        17 kB
        Prashant Kommireddi
      9. PIG-2600_2.patch
        16 kB
        Prashant Kommireddi

          People

          • Assignee:
            Prashant Kommireddi
            Reporter:
            Jonathan Coveney
          • Votes:
            0
            Watchers:
            5
