Apache Tez / TEZ-1003

Need an input that merges multiple ShuffleMergedInputs from a VertexGroup

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.5.0
    • Component/s: None
    • Labels: None

    Description

      In PIG-3835, I was trying to use vertex groups for unions. A union followed by a store works fine, but a union followed by a group-by does not. With this script,

      A = LOAD '/tmp/data' AS (f1:int,f2:int);
      B = LOAD '/tmp/data2' AS (f1:int,f2:int);
      C = UNION onschema A,B;
      D = GROUP C by f1;
      E = FOREACH D GENERATE group, SUM(C.f2);
      store E into '/tmp/tezout' using PigStorage();
      

      ConcatenatedMergedKeyValuesInput on the reduce side grouped records only within each input, not across all inputs.

      For example, if A had the records
      a 1
      b 1
      b 2
      and B had the records
      a 2
      a 3
      b 3

      The records from ConcatenatedMergedKeyValuesInput of A and B were
      a {1}, b {1,2}, a {2,3}, b {3}
      while I am expecting
      a {1,2,3}, b {1,2,3}
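
The difference between the two behaviors can be sketched in a few lines of Python. This is only an illustration of the semantics described above, not Tez code: the function names `concatenated` and `merged` are hypothetical, and each input is assumed to arrive already sorted by key, as it would after its own shuffle merge.

```python
import heapq
from itertools import groupby
from operator import itemgetter

# Sorted (key, value) records, as each input is assumed to deliver them.
input_a = [("a", 1), ("b", 1), ("b", 2)]
input_b = [("a", 2), ("a", 3), ("b", 3)]


def group_sorted(records):
    """Group consecutive records that share a key into (key, [values])."""
    return [(key, [value for _, value in group])
            for key, group in groupby(records, key=itemgetter(0))]


def concatenated(inputs):
    """Reported behavior: group each input on its own, then concatenate."""
    out = []
    for records in inputs:
        out.extend(group_sorted(records))
    return out


def merged(inputs):
    """Requested behavior: merge the sorted inputs into one stream, then group."""
    return group_sorted(heapq.merge(*inputs, key=itemgetter(0)))


print(concatenated([input_a, input_b]))
print(merged([input_a, input_b]))
```

With concatenation, each key shows up once per input, so a downstream SUM would see partial groups. Merging the sorted streams before grouping gives one group per key across all inputs.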

      Attachments

        1. TEZ-1003-1.patch
          8 kB
          Rohini Palaniswamy
        2. TEZ-1003-2.patch
          9 kB
          Rohini Palaniswamy
        3. TEZ-1003.3.txt
          9 kB
          Siddharth Seth
        4. TEZ-1003.4.txt
          9 kB
          Siddharth Seth

            People

              Assignee: Rohini Palaniswamy
              Reporter: Rohini Palaniswamy
              Votes: 0
              Watchers: 4
