PIG-4899 (sub-task of PIG-4059: Pig on Spark, project Pig)

The number of input file records is calculated incorrectly in Spark mode in the multiquery case


    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: spark-branch
    • Component/s: spark
    • Labels:
      None

      Description

      The sparkCounter that counts the records of the input file (LoadConverter#ToTupleFunction#apply) is executed multiple times in the multiquery case, so the number of input records is calculated incorrectly. For example:

      #--------------------------------------------------
      # Spark Plan                                  
      #--------------------------------------------------
      
      Spark node scope-534
      Split - scope-548
      |   |
      |   Store(hdfs://localhost:48350/tmp/temp649016960/tmp48836938:org.apache.pig.impl.io.InterStorage) - scope-538
      |   |
      |   |---C: Filter[bag] - scope-495
      |       |   |
      |       |   Less Than or Equal[boolean] - scope-498
      |       |   |
      |       |   |---Project[int][1] - scope-496
      |       |   |
      |       |   |---Constant(5) - scope-497
      |   |
      |   Store(hdfs://localhost:48350/tmp/temp649016960/tmp804709981:org.apache.pig.impl.io.InterStorage) - scope-546
      |   |
      |   |---B: Filter[bag] - scope-507
      |       |   |
      |       |   Equal To[boolean] - scope-510
      |       |   |
      |       |   |---Project[int][0] - scope-508
      |       |   |
      |       |   |---Constant(3) - scope-509
      |
      |---A: New For Each(false,false,false)[bag] - scope-491
          |   |
          |   Cast[int] - scope-483
          |   |
          |   |---Project[bytearray][0] - scope-482
          |   |
          |   Cast[int] - scope-486
          |   |
          |   |---Project[bytearray][1] - scope-485
          |   |
          |   Cast[int] - scope-489
          |   |
          |   |---Project[bytearray][2] - scope-488
          |
          |---A: Load(hdfs://localhost:48350/user/root/input:org.apache.pig.builtin.PigStorage) - scope-481
      --------
      
      Spark node scope-540
      C: Store(hdfs://localhost:48350/user/root/output:org.apache.pig.builtin.PigStorage) - scope-502
      |
      |---Load(hdfs://localhost:48350/tmp/temp649016960/tmp48836938:org.apache.pig.impl.io.InterStorage) - scope-539
      --------
      
      Spark node scope-542
      D: Store(hdfs://localhost:48350/user/root/output2:org.apache.pig.builtin.PigStorage) - scope-533
      |
      |---D: FRJoin[tuple] - scope-525
          |   |
          |   Project[int][0] - scope-522
          |   |
          |   Project[int][0] - scope-523
          |   |
          |   Project[int][0] - scope-524
          |
          |---Load(hdfs://localhost:48350/tmp/temp649016960/tmp48836938:org.apache.pig.impl.io.InterStorage) - scope-541
      --------
      
      Spark node scope-545
      Store(hdfs://localhost:48350/tmp/temp649016960/tmp-2036144538:org.apache.pig.impl.io.InterStorage) - scope-547
      |
      |---A1: New For Each(false,false,false)[bag] - scope-521
          |   |
          |   Cast[int] - scope-513
          |   |
          |   |---Project[bytearray][0] - scope-512
          |   |
          |   Cast[int] - scope-516
          |   |
          |   |---Project[bytearray][1] - scope-515
          |   |
          |   Cast[int] - scope-519
          |   |
          |   |---Project[bytearray][2] - scope-518
          |
          |---A1: Load(hdfs://localhost:48350/user/root/input2:org.apache.pig.builtin.PigStorage) - scope-511
      --------
      

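      In Spark node scope-534 the physical Load operator of A (scope-481) feeds a Split with two Stores. Because this is a multiquery case, LoadConverter#ToTupleFunction#apply is executed more times than it should be, and the input records are counted more than once.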
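The overcount can be reproduced outside Pig with a small model. The sketch below is plain Java, not Pig or Spark code; toTuple, lazyLoad, cached, store and runSplit are hypothetical names. It assumes, as with an uncached Spark RDD, that the lazily evaluated load is recomputed once per downstream action, so each of the two Store branches of the Split in Spark node scope-534 re-runs the per-record conversion and bumps the counter again. Materializing the loaded records once (similar in spirit to caching the RDD) is shown only as one possible way to obtain the correct count; it is not necessarily what the attached patches do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical model, not Pig or Spark code.
class MultiQueryCounterSketch {

    // Stand-in for the input-record counter that LoadConverter#ToTupleFunction#apply increments.
    static final AtomicLong inputRecords = new AtomicLong();

    // Per-record conversion with a side-effecting counter, like ToTupleFunction#apply.
    static int[] toTuple(String line) {
        inputRecords.incrementAndGet();
        String[] fields = line.split(",");
        int[] tuple = new int[fields.length];
        for (int i = 0; i < fields.length; i++) {
            tuple[i] = Integer.parseInt(fields[i].trim());
        }
        return tuple;
    }

    // Lazy load: every get() re-reads and re-converts the whole input,
    // the way an uncached RDD is recomputed for each action that uses it.
    static Supplier<List<int[]>> lazyLoad(List<String> input) {
        return () -> {
            List<int[]> tuples = new ArrayList<>();
            for (String line : input) {
                tuples.add(toTuple(line));
            }
            return tuples;
        };
    }

    // Materialize the upstream result on first use, then serve it from memory.
    static Supplier<List<int[]>> cached(Supplier<List<int[]>> upstream) {
        List<List<int[]>> memo = new ArrayList<>();
        return () -> {
            if (memo.isEmpty()) {
                memo.add(upstream.get());
            }
            return memo.get(0);
        };
    }

    // One Store branch: filter the loaded tuples and count what would be written.
    static long store(Supplier<List<int[]>> source, Predicate<int[]> filter) {
        return source.get().stream().filter(filter).count();
    }

    // Runs both Split branches of Spark node scope-534 and returns the counter value.
    static long runSplit(Supplier<List<int[]>> source) {
        inputRecords.set(0);
        store(source, t -> t[1] <= 5); // C: Filter, Project[int][1] <= Constant(5)
        store(source, t -> t[0] == 3); // B: Filter, Project[int][0] == Constant(3)
        return inputRecords.get();
    }

    public static void main(String[] args) {
        List<String> input = List.of("1,2,3", "3,7,9", "4,5,6");
        System.out.println("uncached: " + runSplit(lazyLoad(input)));
        System.out.println("cached: " + runSplit(cached(lazyLoad(input))));
    }
}
```

With three input lines, the uncached run reports 6 input records (3 per Store branch) while the cached run reports the correct 3.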

        Attachments

        1. PIG-4899.2.patch
          4 kB
          Ádám Szita
        2. PIG-4899.3IncrFrom2.patch
          5 kB
          Ádám Szita
        3. PIG-4899.patch
          3 kB
          Ádám Szita


            People

            • Assignee:
              szita Ádám Szita
              Reporter:
              kellyzly liyunzhang
            • Votes:
              0
              Watchers:
              5

              Dates

              • Created:
                Updated:
                Resolved: