PIG-2600: Better Map support

Details

    • Improvement
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 0.11
    • None
    • None
    • Patch Available
      Pig 0.11+ includes the following UDFs for operating on Maps:

      1. VALUESET
      2. VALUELIST
      3. KEYSET
      4. INVERSEMAP

      VALUESET

        This UDF takes a Map and returns a Bag containing the value set.
        Note that this UDF returns only unique values. For all values,
        including duplicates, use VALUELIST instead.

        <code>
        grunt> cat data
        [open#apache,1#2,11#2]
        [apache#hadoop,3#4,12#hadoop]
       
        grunt> a = load 'data' as (M:[]);
        grunt> b = foreach a generate VALUESET($0);
        grunt> dump b;
        ({(apache),(2)})
        ({(4),(hadoop)})
       
        </code>

      VALUELIST

       
        This UDF takes a Map and returns a Bag containing the values from the map.
        Note that the output bag contains all values, not just unique ones.
        To obtain only the unique values from the map, use VALUESET instead.
       
        <code>
        grunt> cat data
        [open#apache,1#2,11#2]
        [apache#hadoop,3#4,12#hadoop]
       
        grunt> a = load 'data' as (M:[]);
        grunt> b = foreach a generate VALUELIST($0);
        grunt> dump b;
        ({(apache),(2),(2)})
        ({(4),(hadoop),(hadoop)})
        </code>
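
The difference between VALUESET and VALUELIST can be modelled outside Pig. The following is a minimal sketch in plain Python, not the UDF implementation: a dict stands in for a Pig map, and the function names `value_list` and `value_set` are hypothetical.

```python
# Plain-Python model of the two UDFs; a dict stands in for a Pig map.
# value_list and value_set are illustrative names, not Pig APIs.

def value_list(m):
    """All values, duplicates kept (models VALUELIST)."""
    return list(m.values())


def value_set(m):
    """Distinct values only (models VALUESET)."""
    return set(m.values())


# First row of the sample data: [open#apache,1#2,11#2]
row = {"open": "apache", "1": "2", "11": "2"}

print(sorted(value_list(row)))  # ['2', '2', 'apache']
print(sorted(value_set(row)))   # ['2', 'apache']
```

The results are sorted here only to make the printed order stable; Pig makes no ordering promise for either bag.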

      KEYSET

        This UDF takes a Map and returns a Bag containing the keyset.

        <code>
        grunt> cat data
        [open#apache,1#2,11#2]
        [apache#hadoop,3#4,12#hadoop]
       
        grunt> a = load 'data' as (M:[]);
        grunt> b = foreach a generate KEYSET($0);
        grunt> dump b;
        ({(open),(1),(11)})
        ({(3),(apache),(12)})
        </code>

      INVERSEMAP

        This UDF accepts a Map as input with values of any primitive data type.
        It swaps keys with values and returns the new inverse Map.
        Because the original values may be non-unique, the resulting Map maps
        each String key to a DataBag of values. The bag is composed of the
        original keys that had the same value.

        Note:
          1. The UDF accepts a Map with values of primitive data types.
          2. The UDF returns Map<String,DataBag>.
        <code>
        grunt> cat data
        [open#1,1#2,11#2]
        [apache#2,3#4,12#24]
       
        
        grunt> a = load 'data' as (M:[int]);
        grunt> b = foreach a generate INVERSEMAP($0);
       
        grunt> dump b;
        ([1#{(open)},2#{(1),(11)}])
        ([2#{(apache)},4#{(3)},24#{(12)}])
        </code>
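
The inversion described for INVERSEMAP can be sketched in plain Python. This is an illustration under assumptions, not the UDF code: a dict stands in for the Pig map, a list stands in for the DataBag, and `inverse_map` is a hypothetical name. Original keys are grouped under the string form of each value.

```python
from collections import defaultdict

# Plain-Python model of INVERSEMAP; inverse_map is an illustrative name.
def inverse_map(m):
    """Swap keys and values, collecting original keys that share a value."""
    inverted = defaultdict(list)
    for key, value in m.items():
        # The result is keyed by String, so the value is stringified.
        inverted[str(value)].append(key)
    return dict(inverted)


# A map with a repeated value: keys "1" and "11" both map to 2.
row = {"open": 1, "1": 2, "11": 2}

print(inverse_map(row))  # {'1': ['open'], '2': ['1', '11']}
```

Every value gets a bag in this sketch, even when only one key maps to it, which matches the stated Map<String,DataBag> return type.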
    • udf

    Description

      It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.

      • TOBAG should take a Map and output {(key, value)}
      • TOMAP should take a Bag in that same form and make a map.
      • KEYSET should return the set of keys.
      • VALUESET should return the set of values.
      • VALUELIST should return the List of values (no deduping).
      • INVERSEMAP would return a Map of values => the set of keys that refer to that value
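
The TOBAG and TOMAP behaviour proposed in the list above amounts to a round trip between a map and a bag of (key, value) tuples. A minimal plain-Python sketch of that idea follows; `to_bag` and `to_map` are hypothetical names, and the handling of duplicate keys (last one wins) is an assumption, since the proposal does not specify it.

```python
# Plain-Python model of the proposed Map <-> Bag conversion.
# to_bag and to_map are illustrative names, not Pig builtins.

def to_bag(m):
    """Map -> bag of (key, value) tuples."""
    return [(key, value) for key, value in m.items()]


def to_map(bag):
    """Bag of (key, value) tuples -> map.

    Assumption: if a key appears more than once, the last tuple wins.
    """
    return {key: value for key, value in bag}


m = {"open": "apache", "1": "2"}
bag = to_bag(m)

print(bag)               # [('open', 'apache'), ('1', '2')]
print(to_map(bag) == m)  # True
```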

      This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that all keys are Strings). Not sure if it is worth it.

      I'd love to hear other things that would be useful for people!

      Attachments

        1. PIG-2600.patch
          8 kB
          Prashant Kommireddi
        2. PIG-2600_2.patch
          16 kB
          Prashant Kommireddi
        3. PIG-2600_3.patch
          17 kB
          Prashant Kommireddi
        4. PIG-2600_4.patch
          22 kB
          Prashant Kommireddi
        5. PIG-2600_5.patch
          23 kB
          Prashant Kommireddi
        6. PIG-2600_6.patch
          23 kB
          Prashant Kommireddi
        7. PIG-2600_7.patch
          23 kB
          Prashant Kommireddi
        8. PIG-2600_8.patch
          23 kB
          Prashant Kommireddi
        9. PIG-2600_9.patch
          22 kB
          Prashant Kommireddi

          People

            prkommireddi Prashant Kommireddi
            jcoveney Jonathan Coveney
            Votes: 0
            Watchers: 5
