Details
Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Description
In TestPigRunner#simpleMultiQueryTest3, the explain plan is:
#--------------------------------------------------
# Spark Plan
#--------------------------------------------------

Spark node scope-53
Store(hdfs://localhost:58892/tmp/temp-1660154197/tmp1818797386:org.apache.pig.impl.io.InterStorage) - scope-54
|
|---A: New For Each(false,false,false)[bag] - scope-10
    |   |
    |   Cast[int] - scope-2
    |   |
    |   |---Project[bytearray][0] - scope-1
    |   |
    |   Cast[int] - scope-5
    |   |
    |   |---Project[bytearray][1] - scope-4
    |   |
    |   Cast[int] - scope-8
    |   |
    |   |---Project[bytearray][2] - scope-7
    |
    |---A: Load(hdfs://localhost:58892/user/root/input:org.apache.pig.builtin.PigStorage) - scope-0--------

Spark node scope-55
Store(hdfs://localhost:58892/tmp/temp-1660154197/tmp-546700946:org.apache.pig.impl.io.InterStorage) - scope-56
|
|---C: Filter[bag] - scope-14
    |   |
    |   Less Than or Equal[boolean] - scope-17
    |   |
    |   |---Project[int][1] - scope-15
    |   |
    |   |---Constant(5) - scope-16
    |
    |---Load(hdfs://localhost:58892/tmp/temp-1660154197/tmp1818797386:org.apache.pig.impl.io.InterStorage) - scope-10--------

Spark node scope-57
C: Store(hdfs://localhost:58892/user/root/output:org.apache.pig.builtin.PigStorage) - scope-21
|
|---Load(hdfs://localhost:58892/tmp/temp-1660154197/tmp-546700946:org.apache.pig.impl.io.InterStorage) - scope-14--------

Spark node scope-65
D: Store(hdfs://localhost:58892/user/root/output2:org.apache.pig.builtin.PigStorage) - scope-52
|
|---D: FRJoinSpark[tuple] - scope-44
    |   |
    |   Project[int][0] - scope-41
    |   |
    |   Project[int][0] - scope-42
    |   |
    |   Project[int][0] - scope-43
    |
    |---Load(hdfs://localhost:58892/tmp/temp-1660154197/tmp-546700946:org.apache.pig.impl.io.InterStorage) - scope-58
    |
    |---BroadcastSpark - scope-63
    |   |
    |   |---B: Filter[bag] - scope-26
    |       |   |
    |       |   Equal To[boolean] - scope-29
    |       |   |
    |       |   |---Project[int][0] - scope-27
    |       |   |
    |       |   |---Constant(3) - scope-28
    |       |
    |       |---Load(hdfs://localhost:58892/tmp/temp-1660154197/tmp1818797386:org.apache.pig.impl.io.InterStorage) - scope-60
    |
    |---BroadcastSpark - scope-64
        |
        |---A1: New For Each(false,false,false)[bag] - scope-40
            |   |
            |   Cast[int] - scope-32
            |   |
            |   |---Project[bytearray][0] - scope-31
            |   |
            |   Cast[int] - scope-35
            |   |
            |   |---Project[bytearray][1] - scope-34
            |   |
            |   Cast[int] - scope-38
            |   |
            |   |---Project[bytearray][2] - scope-37
            |
            |---A1: Load(hdfs://localhost:58892/user/root/input2:org.apache.pig.builtin.PigStorage) - scope-30--------
assertEquals(30, inputStats.get(0).getBytes()) is correct in Spark mode.
assertEquals(18, inputStats.get(1).getBytes()) fails in Spark mode because there are 3 loads in Spark node scope-65. stats.get("BytesRead") returns 49 (I guess this is the sum of the three loads: input2, tmp1818797386 and tmp-546700946). However, the bytesRead currently reported is -1 because singleInput is false.
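
To make the behaviour described above concrete, here is a minimal sketch of the rule as I understand it from this report. It is not the actual Pig code: the class name BytesReadSketch and the method bytesReadFor are hypothetical, and only the "BytesRead" key, the singleInput flag and the values 30, 49 and -1 come from the description. It assumes the job-level "BytesRead" counter can only be attributed to an input when the Spark node has exactly one load.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not taken from the Pig source tree.
public class BytesReadSketch {

    // Returns the bytes attributed to an input. The job-wide "BytesRead"
    // counter is an aggregate over every load in the Spark node, so it is
    // only trusted when the node has a single input; otherwise -1 is
    // reported, meaning "unknown".
    static long bytesReadFor(Map<String, Long> stats, boolean singleInput) {
        Long total = stats.get("BytesRead");
        if (singleInput && total != null) {
            return total;
        }
        return -1L;
    }

    public static void main(String[] args) {
        // A node with one load: the counter can be used directly.
        Map<String, Long> oneLoad = new HashMap<>();
        oneLoad.put("BytesRead", 30L);
        System.out.println(bytesReadFor(oneLoad, true));

        // A node like scope-65 with three loads: 49 is an aggregate,
        // so no per-input value can be derived from it.
        Map<String, Long> threeLoads = new HashMap<>();
        threeLoads.put("BytesRead", 49L);
        System.out.println(bytesReadFor(threeLoads, false));
    }
}
```

Under this assumption, the expected value 18 for input2 cannot be recovered from the aggregate 49 alone, which is why the second assertion cannot pass in Spark mode without per-load counters.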