Details
-
Sub-task
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
None
Description
sparkCounter to calucate the records of input file(LoadConverter#ToTupleFunction#apply) will be executed multiple times in multiquery case. This will cause the input records number is calculated wrongly. for example:
#-------------------------------------------------- # Spark Plan #-------------------------------------------------- Spark node scope-534 Split - scope-548 | | | Store(hdfs://localhost:48350/tmp/temp649016960/tmp48836938:org.apache.pig.impl.io.InterStorage) - scope-538 | | | |---C: Filter[bag] - scope-495 | | | | | Less Than or Equal[boolean] - scope-498 | | | | | |---Project[int][1] - scope-496 | | | | | |---Constant(5) - scope-497 | | | Store(hdfs://localhost:48350/tmp/temp649016960/tmp804709981:org.apache.pig.impl.io.InterStorage) - scope-546 | | | |---B: Filter[bag] - scope-507 | | | | | Equal To[boolean] - scope-510 | | | | | |---Project[int][0] - scope-508 | | | | | |---Constant(3) - scope-509 | |---A: New For Each(false,false,false)[bag] - scope-491 | | | Cast[int] - scope-483 | | | |---Project[bytearray][0] - scope-482 | | | Cast[int] - scope-486 | | | |---Project[bytearray][1] - scope-485 | | | Cast[int] - scope-489 | | | |---Project[bytearray][2] - scope-488 | |---A: Load(hdfs://localhost:48350/user/root/input:org.apache.pig.builtin.PigStorage) - scope-481-------- Spark node scope-540 C: Store(hdfs://localhost:48350/user/root/output:org.apache.pig.builtin.PigStorage) - scope-502 | |---Load(hdfs://localhost:48350/tmp/temp649016960/tmp48836938:org.apache.pig.impl.io.InterStorage) - scope-539-------- Spark node scope-542 D: Store(hdfs://localhost:48350/user/root/output2:org.apache.pig.builtin.PigStorage) - scope-533 | |---D: FRJoin[tuple] - scope-525 | | | Project[int][0] - scope-522 | | | Project[int][0] - scope-523 | | | Project[int][0] - scope-524 | |---Load(hdfs://localhost:48350/tmp/temp649016960/tmp48836938:org.apache.pig.impl.io.InterStorage) - scope-541-------- Spark node scope-545 Store(hdfs://localhost:48350/tmp/temp649016960/tmp-2036144538:org.apache.pig.impl.io.InterStorage) - scope-547 | |---A1: New For Each(false,false,false)[bag] - scope-521 | | | Cast[int] - scope-513 | | | |---Project[bytearray][0] - scope-512 | | | Cast[int] - scope-516 | | | |---Project[bytearray][1] - scope-515 | | | Cast[int] - scope-519 | | | |---Project[bytearray][2] - scope-518 | |---A1: Load(hdfs://localhost:48350/user/root/input2:org.apache.pig.builtin.PigStorage) - scope-511-------
PhysicalOperator (LoadA) will be executed in LoadConverter#ToTupleFunction#apply for more than the correct times because this is a multi-query case.