  1. Pig
  2. PIG-4059 Pig on Spark
  3. PIG-5240

Fix TestPigRunner#simpleMultiQueryTest3 in spark mode for wrong inputStats

    Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: spark-branch
    • Component/s: spark
    • Labels:
      None

      Description

      In TestPigRunner#simpleMultiQueryTest3, the explain plan in Spark mode is:

      #--------------------------------------------------
      # Spark Plan                                  
      #--------------------------------------------------
      
      Spark node scope-53
      Store(hdfs://localhost:58892/tmp/temp-1660154197/tmp1818797386:org.apache.pig.impl.io.InterStorage) - scope-54
      |
      |---A: New For Each(false,false,false)[bag] - scope-10
          |   |
          |   Cast[int] - scope-2
          |   |
          |   |---Project[bytearray][0] - scope-1
          |   |
          |   Cast[int] - scope-5
          |   |
          |   |---Project[bytearray][1] - scope-4
          |   |
          |   Cast[int] - scope-8
          |   |
          |   |---Project[bytearray][2] - scope-7
          |
          |---A: Load(hdfs://localhost:58892/user/root/input:org.apache.pig.builtin.PigStorage) - scope-0--------
      
      Spark node scope-55
      Store(hdfs://localhost:58892/tmp/temp-1660154197/tmp-546700946:org.apache.pig.impl.io.InterStorage) - scope-56
      |
      |---C: Filter[bag] - scope-14
          |   |
          |   Less Than or Equal[boolean] - scope-17
          |   |
          |   |---Project[int][1] - scope-15
          |   |
          |   |---Constant(5) - scope-16
          |
          |---Load(hdfs://localhost:58892/tmp/temp-1660154197/tmp1818797386:org.apache.pig.impl.io.InterStorage) - scope-10--------
      
      Spark node scope-57
      C: Store(hdfs://localhost:58892/user/root/output:org.apache.pig.builtin.PigStorage) - scope-21
      |
      |---Load(hdfs://localhost:58892/tmp/temp-1660154197/tmp-546700946:org.apache.pig.impl.io.InterStorage) - scope-14--------
      
      Spark node scope-65
      D: Store(hdfs://localhost:58892/user/root/output2:org.apache.pig.builtin.PigStorage) - scope-52
      |
      |---D: FRJoinSpark[tuple] - scope-44
          |   |
          |   Project[int][0] - scope-41
          |   |
          |   Project[int][0] - scope-42
          |   |
          |   Project[int][0] - scope-43
          |
          |---Load(hdfs://localhost:58892/tmp/temp-1660154197/tmp-546700946:org.apache.pig.impl.io.InterStorage) - scope-58
          |
          |---BroadcastSpark - scope-63
          |   |
          |   |---B: Filter[bag] - scope-26
          |       |   |
          |       |   Equal To[boolean] - scope-29
          |       |   |
          |       |   |---Project[int][0] - scope-27
          |       |   |
          |       |   |---Constant(3) - scope-28
          |       |
          |       |---Load(hdfs://localhost:58892/tmp/temp-1660154197/tmp1818797386:org.apache.pig.impl.io.InterStorage) - scope-60
          |
          |---BroadcastSpark - scope-64
              |
              |---A1: New For Each(false,false,false)[bag] - scope-40
                  |   |
                  |   Cast[int] - scope-32
                  |   |
                  |   |---Project[bytearray][0] - scope-31
                  |   |
                  |   Cast[int] - scope-35
                  |   |
                  |   |---Project[bytearray][1] - scope-34
                  |   |
                  |   Cast[int] - scope-38
                  |   |
                  |   |---Project[bytearray][2] - scope-37
                  |
                  |---A1: Load(hdfs://localhost:58892/user/root/input2:org.apache.pig.builtin.PigStorage) - scope-30--------
      

      assertEquals(30, inputStats.get(0).getBytes()) is correct in Spark mode.
      assertEquals(18, inputStats.get(1).getBytes()) is wrong in Spark mode because there are three loads in Spark node scope-65. stats.get("BytesRead") returns 49 (I guess this is the sum of
      the three loads: input2, tmp1818797386 and tmp-546700946), but the bytesRead currently reported is -1 because singleInput is false.
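The behaviour described above can be sketched as follows. This is only an illustration, not the actual Pig-on-Spark stats code: the class `InputBytesSketch` and the method `bytesReadForInput` are hypothetical names, and the claim that 49 is the sum of the three loads is the reporter's guess rather than a confirmed fact. The sketch shows why a job-level "BytesRead" counter cannot be attributed to one input when a Spark node has several loads, so -1 is reported instead of the expected 18.

```java
public class InputBytesSketch {

    // Hypothetical helper: a job-level BytesRead counter can only be
    // attributed to an input when the job reads exactly one input.
    // Otherwise the per-input size is unknown and -1 is reported.
    public static long bytesReadForInput(long jobBytesRead, boolean singleInput) {
        return singleInput ? jobBytesRead : -1L;
    }

    public static void main(String[] args) {
        // Spark node scope-65 loads input2 plus two intermediate files.
        int loadsInScope65 = 3;
        boolean singleInput = (loadsInScope65 == 1);

        // Value returned by stats.get("BytesRead") according to the report.
        long jobBytesRead = 49L;

        // Prints -1: the counter covers all three loads, so the test's
        // expectation of 18 bytes for input2 cannot be met.
        System.out.println(bytesReadForInput(jobBytesRead, singleInput));

        // If the reporter's guess holds, the two intermediate files would
        // account for the remainder: 49 - 18 = 31 bytes.
        System.out.println(jobBytesRead - 18L);

        // A node with a single load can report the counter directly.
        System.out.println(bytesReadForInput(30L, true));
    }
}
```

Under these assumptions, fixing the test means either tracking bytes per load inside a Spark node or relaxing the assertion for inputs that share a node with other loads.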

            People

            • Assignee:
              Unassigned
              Reporter:
              kellyzly liyunzhang