Uploaded image for project: 'Pig'
  1. Pig
  2. PIG-4059 Pig on Spark
  3. PIG-4522

Remove unnecessary store and load when POSplit is encounted

    XMLWordPrintableJSON

Details

    • Sub-task
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • spark-branch
    • spark
    • None

    Description

      pig script:

      A = load './testSplit.txt' as (f1:int, f2:int,f3:int);
      split A into x if f1<7, y if f2==5, z if (f3<6 or f3>6);
      store x into './testSplit_x.out';
      store y into './testSplit_y.out';
      store z into './testSplit_z.out';
      explain x; 
      explain y;
      explain z;
      

      spark plan:

      #The Spark node relations are:
      #-----------------------------------------------------#
      scope-17->scope-20 
      scope-20
      #--------------------------------------------------
      # Spark Plan                                  
      #--------------------------------------------------
      
      Spark node scope-17
      Store(hdfs://zly1.sh.intel.com:8020/tmp/temp-1920285848/tmp-1477385839:org.apache.pig.impl.io.InterStorage) - scope-18
      |
      |---A: New For Each(false,false,false)[bag] - scope-10
          |   |
          |   Cast[int] - scope-2
          |   |
          |   |---Project[bytearray][0] - scope-1
          |   |
          |   Cast[int] - scope-5
          |   |
          |   |---Project[bytearray][1] - scope-4
          |   |
          |   Cast[int] - scope-8
          |   |
          |   |---Project[bytearray][2] - scope-7
          |
          |---A: Load(hdfs://zly1.sh.intel.com:8020/user/root/testSplit.txt:org.apache.pig.builtin.PigStorage) - scope-0--------
      
      Spark node scope-20
      x: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-16
      |
      |---x: Filter[bag] - scope-12
          |   |
          |   Less Than[boolean] - scope-15
          |   |
          |   |---Project[int][0] - scope-13
          |   |
          |   |---Constant(7) - scope-14
          |
          |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-1920285848/tmp-1477385839:org.apache.pig.impl.io.InterStorage) - scope-19--------
      
      #-----------------------------------------------------#
      #The Spark node relations are:
      #-----------------------------------------------------#
      scope-38->scope-41 
      scope-41
      #--------------------------------------------------
      # Spark Plan                                  
      #--------------------------------------------------
      
      Spark node scope-38
      Store(hdfs://zly1.sh.intel.com:8020/tmp/temp-1920285848/tmp-918933337:org.apache.pig.impl.io.InterStorage) - scope-39
      |
      |---A: New For Each(false,false,false)[bag] - scope-31
          |   |
          |   Cast[int] - scope-23
          |   |
          |   |---Project[bytearray][0] - scope-22
          |   |
          |   Cast[int] - scope-26
          |   |
          |   |---Project[bytearray][1] - scope-25
          |   |
          |   Cast[int] - scope-29
          |   |
          |   |---Project[bytearray][2] - scope-28
          |
          |---A: Load(hdfs://zly1.sh.intel.com:8020/user/root/testSplit.txt:org.apache.pig.builtin.PigStorage) - scope-21--------
      
      Spark node scope-41
      y: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-37
      |
      |---y: Filter[bag] - scope-33
          |   |
          |   Equal To[boolean] - scope-36
          |   |
          |   |---Project[int][1] - scope-34
          |   |
          |   |---Constant(5) - scope-35
          |
          |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-1920285848/tmp-918933337:org.apache.pig.impl.io.InterStorage) - scope-40--------
      
      #-----------------------------------------------------#
      #The Spark node relations are:
      #-----------------------------------------------------#
      scope-63->scope-66 
      scope-66
      #--------------------------------------------------
      # Spark Plan                                  
      #--------------------------------------------------
      
      Spark node scope-63
      Store(hdfs://zly1.sh.intel.com:8020/tmp/temp-1920285848/tmp1444529161:org.apache.pig.impl.io.InterStorage) - scope-64
      |
      |---A: New For Each(false,false,false)[bag] - scope-52
          |   |
          |   Cast[int] - scope-44
          |   |
          |   |---Project[bytearray][0] - scope-43
          |   |
          |   Cast[int] - scope-47
          |   |
          |   |---Project[bytearray][1] - scope-46
          |   |
          |   Cast[int] - scope-50
          |   |
          |   |---Project[bytearray][2] - scope-49
          |
          |---A: Load(hdfs://zly1.sh.intel.com:8020/user/root/testSplit.txt:org.apache.pig.builtin.PigStorage) - scope-42--------
      
      Spark node scope-66
      z: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-62
      |
      |---z: Filter[bag] - scope-54
          |   |
          |   Or[boolean] - scope-61
          |   |
          |   |---Less Than[boolean] - scope-57
          |   |   |
          |   |   |---Project[int][2] - scope-55
          |   |   |
          |   |   |---Constant(6) - scope-56
          |   |
          |   |---Greater Than[boolean] - scope-60
          |       |
          |       |---Project[int][2] - scope-58
          |       |
          |       |---Constant(6) - scope-59
          |
          |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-1920285848/tmp1444529161:org.apache.pig.impl.io.InterStorage) - scope-65--------
      

      Scope-18(Store) and Scope-19(Load) is not necessary. It should be removed.

      Scope-39(Store) and Scope-40(Load) is not necessary. It should be removed.

      Scope-64(Store) and Scope-65(Load) is not necessary. It should be removed.

      Attachments

        1. PIG-4522.patch
          1 kB
          liyunzhang

        Activity

          People

            kellyzly liyunzhang
            kellyzly liyunzhang
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: