DataFu / DATAFU-38

BagGroup merges rows


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • None
    • 1.3.0
    • None

    Description

      Load the following data

      1,a,A,1
      1,b,A,2
      1,a,B,3
      2,c,C,4
      2,b,B,5
      2,b,C,6
      

      using

      tmp_datafu = load 'test' using PigStorage(',') as (id:chararray, domain:chararray, keyword:chararray, weight:int);

      and then run

      tmp_roll = foreach (group tmp_datafu by id) generate
        group as id,
        CountEach(tmp_datafu.domain) as domains,
        BagGroup(tmp_datafu.(keyword,weight),tmp_datafu.keyword) as keywords;
      

      The actual result is

      (1,{(b,1),(a,2)},{(B,{(B,3)}),(A,{(A,1),(A,2)})})
      (2,{(c,1),(b,2)},{(B,{(B,3),(B,5)}),(A,{(A,1),(A,2)}),(C,{(C,4),(C,6)})})
      

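      instead of the expected result, in which the row for id 2 contains only its own keyword groups and none carried over from id 1: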

      (1,{(b,1),(a,2)},{(B,{(B,3)}),(A,{(A,1),(A,2)})})
      (2,{(c,1),(b,2)},{(B,{(B,5)}),(C,{(C,4),(C,6)})})
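
The difference between the two outputs can be modelled outside Pig. The following is a minimal Python sketch, not DataFu code: `bag_group` and `roll_up` are hypothetical helpers that mimic grouping a bag of (keyword, weight) tuples by keyword for each id. With a fresh grouping per call, the sketch yields the expected rows. Letting the grouping state persist between calls happens to reproduce the reported rows; that this is the actual cause inside BagGroup is an assumption, not something stated in this report.

```python
# Model of the per-id grouping that the Pig script asks for.
# All names here are illustrative; none of them are DataFu APIs.

ROWS = [
    ("1", "a", "A", 1),
    ("1", "b", "A", 2),
    ("1", "a", "B", 3),
    ("2", "c", "C", 4),
    ("2", "b", "B", 5),
    ("2", "b", "C", 6),
]


def bag_group(bag, state=None):
    """Group (keyword, weight) tuples by keyword.

    If `state` is given, groups are accumulated into it, so tuples from
    earlier calls remain visible (the assumed leak). Otherwise a fresh
    dict is used for every call (the expected behaviour).
    """
    groups = {} if state is None else state
    for tup in bag:
        groups.setdefault(tup[0], []).append(tup)
    # Return a snapshot so later calls cannot alter earlier results.
    return {key: list(tuples) for key, tuples in groups.items()}


def roll_up(rows, shared_state=None):
    """Equivalent of `group by id`, then bag_group on each id's bag."""
    by_id = {}
    for id_, _domain, keyword, weight in rows:
        by_id.setdefault(id_, []).append((keyword, weight))
    return {id_: bag_group(bag, state=shared_state) for id_, bag in by_id.items()}


expected = roll_up(ROWS)                 # fresh state for every id
leaky = roll_up(ROWS, shared_state={})   # one dict reused across ids

for id_ in ("1", "2"):
    print(id_, "expected:", expected[id_])
    print(id_, "leaky:   ", leaky[id_])
```

In this model the row for id 1 is identical in both runs, since it is processed first, while the leaky row for id 2 additionally contains the A and B groups from id 1.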
      

      see also
      http://stackoverflow.com/questions/22945236/how-do-i-accumulate-vectors-into-a-map

          People

            sds Sam Steingold