Details

    • Type: New Feature New Feature
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 0.12.0
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      In an effort to adapt Pig to work using Apache Tez (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for a cleaner ExecutionEngine abstraction than existed before. The changes are not that major as Pig was already relatively abstracted out between the frontend and backend. The changes in the attached commit are essentially the barebones changes – I tried to not change the structure of Pig's different components too much. I think it will be interesting to see in the future how we can refactor more areas of Pig to really honor this abstraction between the frontend and backend.

      Some of the changes was to reinstate an ExecutionEngine interface to tie together the front end and backend, and making the changes in Pig to delegate to the EE when necessary, and creating an MRExecutionEngine that implements this interface. Other work included changing ExecType to cycle through the ExecutionEngines on the classpath and select the appropriate one (this is done using Java ServiceLoader, exactly how MapReduce does for choosing the framework to use between local and distributed mode). Also I tried to make ScriptState, JobStats, and PigStats as abstract as possible in its current state. I think in the future some work will need to be done here to perhaps re-evaluate the usage of ScriptState and the responsibilities of the different statistics classes. I haven't touched the PPNL, but I think more abstraction is needed here, perhaps in a separate patch.

      1. execengine.patch
        62 kB
        Achal Soni
      2. mapreduce_execengine.patch
        130 kB
        Achal Soni
      3. stats_scriptstate.patch
        157 kB
        Achal Soni
      4. test_failures.txt
        16 kB
        Cheolsoo Park
      5. test_suite.patch
        77 kB
        Achal Soni
      6. updated-8-22-2013-exec-engine.patch
        364 kB
        Achal Soni
      7. updated-8-23-2013-exec-engine.patch
        394 kB
        Cheolsoo Park
      8. updated-8-27-2013-exec-engine.patch
        397 kB
        Cheolsoo Park
      9. updated-8-28-2013-exec-engine.patch
        398 kB
        Cheolsoo Park
      10. updated-8-29-2013-exec-engine.patch
        398 kB
        Cheolsoo Park

        Issue Links

          Activity

          Hide
          Achal Soni added a comment -

          Here is the current patch as is, with the changes to /src and /test. Please let me know if there are any questions or other requests. I expect these changes will incite some good discussion about how to achieve a pluggable execution engine!

          Show
          Achal Soni added a comment - Here is the current patch as is, with the changes to /src and /test. Please let me know if there are any questions or other requests. I expect these changes will incite some good discussion about how to achieve a pluggable execution engine!
          Hide
          Dmitriy V. Ryaboy added a comment -

          Hi Achal,
          That's a large patch.
          Can you give us a roadmap for reading it – what are the changes, at a high level? It looks like you had to change a bunch of stuff that's not (at first glance) directly related to exec mode.

          Procedurally:

          • please generate the patch using 'git diff -no-prefix' since the apache pig master is on svn
          • please post the complete patch to Review Board, for ease of commenting
          • please make sure that all new files have the apache license headers at the top

          Thanks
          -D

          Show
          Dmitriy V. Ryaboy added a comment - Hi Achal, That's a large patch. Can you give us a roadmap for reading it – what are the changes, at a high level? It looks like you had to change a bunch of stuff that's not (at first glance) directly related to exec mode. Procedurally: please generate the patch using 'git diff -no-prefix' since the apache pig master is on svn please post the complete patch to Review Board, for ease of commenting please make sure that all new files have the apache license headers at the top Thanks -D
          Hide
          Dmitriy V. Ryaboy added a comment -

          oh 3 more things
          I thought you found your way around the -y argument? I still see that in there.
          Don't comment out blocks of code, just delete them
          Add some documentation about creating new Exec Engines to the xml-based docs, or at least post it here. Just having it in javadocs is not sufficient.

          Show
          Dmitriy V. Ryaboy added a comment - oh 3 more things I thought you found your way around the -y argument? I still see that in there. Don't comment out blocks of code, just delete them Add some documentation about creating new Exec Engines to the xml-based docs, or at least post it here. Just having it in javadocs is not sufficient.
          Hide
          Julien Le Dem added a comment -

          Hi Achal
          for large patches, please create a review here: https://reviews.apache.org

          Show
          Julien Le Dem added a comment - Hi Achal for large patches, please create a review here: https://reviews.apache.org
          Hide
          Achal Soni added a comment -

          Sorry my bad! Was trying to get it out as soon as possible that I brushed over some stuff too fast.

          Dmitriy V. Ryaboy I will make these changes later today and post them up. I did actually get around the -y argument, just totally forgot to go back and get rid of that. In the meantime, the Review Board is located here for the current patch:

          https://reviews.apache.org/r/13541/

          Once I update the patch later today I will post it here with the ReviewBoards as well.

          • Achal
          Show
          Achal Soni added a comment - Sorry my bad! Was trying to get it out as soon as possible that I brushed over some stuff too fast. Dmitriy V. Ryaboy I will make these changes later today and post them up. I did actually get around the -y argument, just totally forgot to go back and get rid of that. In the meantime, the Review Board is located here for the current patch: https://reviews.apache.org/r/13541/ Once I update the patch later today I will post it here with the ReviewBoards as well. Achal
          Hide
          Achal Soni added a comment -

          I have regenerated the patches taking into account some of the suggestions. I think this should cover most of the cases, although I may have missed certain things. I wanted to post these versions of the patch as early as I could so that people can still consume most of the changes while I continue to review the different pieces.

          I have separated the patches into 4 main areas – the major execution engine changes, the changes to the MR codebase to make it happen, the changes to further abstract ScriptState/other statistics related stuff, and the testing changes. Hopefully this will make things clearer and easier to consume.

          Show
          Achal Soni added a comment - I have regenerated the patches taking into account some of the suggestions. I think this should cover most of the cases, although I may have missed certain things. I wanted to post these versions of the patch as early as I could so that people can still consume most of the changes while I continue to review the different pieces. I have separated the patches into 4 main areas – the major execution engine changes, the changes to the MR codebase to make it happen, the changes to further abstract ScriptState/other statistics related stuff, and the testing changes. Hopefully this will make things clearer and easier to consume.
          Hide
          Achal Soni added a comment -

          And here are the reviewboards for each one.

          Main ExecEngine changes:
          http://reviews.apache.org/r/13575/

          MR ExecEngine changes:
          http://reviews.apache.org/r/13576/

          ScriptState/Statistics changes:
          http://reviews.apache.org/r/13577/

          Testing Suite changes:
          http://reviews.apache.org/r/13579/

          • Achal
          Show
          Achal Soni added a comment - And here are the reviewboards for each one. Main ExecEngine changes: http://reviews.apache.org/r/13575/ MR ExecEngine changes: http://reviews.apache.org/r/13576/ ScriptState/Statistics changes: http://reviews.apache.org/r/13577/ Testing Suite changes: http://reviews.apache.org/r/13579/ Achal
          Hide
          Cheolsoo Park added a comment -

          Achal Soni, thank you very much for the great work! Although I am not a Pig old-timer, I have been playing with your patch and have a few high-level comments as follows:

          1. I love your ExecType interface. What I don't like is that there are two things called ExecType:
            org.apache.pig.ExecType
            org.apache.pig.backend.executionengine.ExecType
            

            Since you're introducing the ServiceLoader framework, the enum seems no longer needed at all. Furthermore, eliminating it helps simplify the constructor code of PigContext and PigServer.

          2. Currently, it is a bit hard to review your patch because there are too many changes. To get it committed faster, I suggest we should avoid unnecessary changes and minimize its scope. For example, having the PigServer constructor signature unchanged helps avoid a lot of changes in Test*.java files.
          3. Julien already pointed out this in the RB, but your patch accidentally reverts a couple of previous commits. I took them out.

          I went ahead and made these changes myself - here. If everyone thinks this is a step forward, I will upload it in a new patch. Please let me know.

          Show
          Cheolsoo Park added a comment - Achal Soni , thank you very much for the great work! Although I am not a Pig old-timer, I have been playing with your patch and have a few high-level comments as follows: I love your ExecType interface. What I don't like is that there are two things called ExecType: org.apache.pig.ExecType org.apache.pig.backend.executionengine.ExecType Since you're introducing the ServiceLoader framework, the enum seems no longer needed at all. Furthermore, eliminating it helps simplify the constructor code of PigContext and PigServer. Currently, it is a bit hard to review your patch because there are too many changes. To get it committed faster, I suggest we should avoid unnecessary changes and minimize its scope. For example, having the PigServer constructor signature unchanged helps avoid a lot of changes in Test*.java files. Julien already pointed out this in the RB, but your patch accidentally reverts a couple of previous commits. I took them out. I went ahead and made these changes myself - here . If everyone thinks this is a step forward, I will upload it in a new patch. Please let me know.
          Hide
          Cheolsoo Park added a comment -

          I have two more comments in ExecutionEngine and MRExecutionEngine as follows:

          1. Can you simplify the checked exceptions in the ExecutionEngine interface? For example,
            From:
            public PigStats launchPig(LogicalPlan lp, String grpName, PigContext pc)
                throws PlanException, VisitorException, IOException, ExecException,
                JobCreationException, FrontendException, Exception;
            

            To:

            public PigStats launchPig(LogicalPlan lp, String grpName, PigContext pc) throws Exception;
            

            Looks like there's no point of throwing them again in ExecutionEngine because they will be caught as Exception in PigServer anyway. If needed, we should take specific actions per exception in ExecutionEngine.

          2. As for the setProperty method in ExecutionEngine, do we need to pass a properties? Can we construct a properties with the given key/value pair and call recomputeProperties() internally?
            public void setProperty(Properties properties, String property, String value);
            

            Also, as for the setProperty method in MRExecutionEngine, looks like it's mostly duplicate of recomputeProperties(). Can you just reuse recomputeProperties()?

          Julien said you're working on a new patch. It would be nice if you could incorporate these (of course if you agree with me). Thank you a lot!

          Show
          Cheolsoo Park added a comment - I have two more comments in ExecutionEngine and MRExecutionEngine as follows: Can you simplify the checked exceptions in the ExecutionEngine interface? For example, From: public PigStats launchPig(LogicalPlan lp, String grpName, PigContext pc) throws PlanException, VisitorException, IOException, ExecException, JobCreationException, FrontendException, Exception; To: public PigStats launchPig(LogicalPlan lp, String grpName, PigContext pc) throws Exception; Looks like there's no point of throwing them again in ExecutionEngine because they will be caught as Exception in PigServer anyway. If needed, we should take specific actions per exception in ExecutionEngine. As for the setProperty method in ExecutionEngine, do we need to pass a properties? Can we construct a properties with the given key/value pair and call recomputeProperties() internally? public void setProperty(Properties properties, String property, String value); Also, as for the setProperty method in MRExecutionEngine, looks like it's mostly duplicate of recomputeProperties(). Can you just reuse recomputeProperties()? Julien said you're working on a new patch. It would be nice if you could incorporate these (of course if you agree with me). Thank you a lot!
          Hide
          Julien Le Dem added a comment -

          Cheolsoo Park
          1. Do we really throw Exception ? If yes, then let's just throw that. If not then let's instead have FrontEndException, ExecException, IOException. i.e. let's remove the exceptions that are already included by the highest exception level.
          2. agreed with you. I would expect the execution engine to handle the Properties internally and the signature of this method to be:

          public void setProperty(String property, String value);
          
          Show
          Julien Le Dem added a comment - Cheolsoo Park 1. Do we really throw Exception ? If yes, then let's just throw that. If not then let's instead have FrontEndException, ExecException, IOException. i.e. let's remove the exceptions that are already included by the highest exception level. 2. agreed with you. I would expect the execution engine to handle the Properties internally and the signature of this method to be: public void setProperty(String property, String value);
          Hide
          Cheolsoo Park added a comment -

          Julien Le Dem,

          Do we really throw Exception?

          No, we don't throw Exception to the end user. But currently, PigServer catches them all in a single catch block and sort them out using instanceof calls (see below). Probably we should make ExecutionEngine throw FEE, EE, and IOE and replace instanceof calls with catch blocks in PigServer.

          try {
              stats = pigContext.getExecutionEngine().launchPig(lp, jobName, pigContext);
          } catch (Exception e) {
              // There are a lot of exceptions thrown by the launcher.  If this
              // is an ExecException, just let it through.  Else wrap it.
              if (e instanceof ExecException){
                  throw (ExecException)e;
              } else if (e instanceof FrontendException) {
                  throw (FrontendException)e;
              } else {
                  int errCode = 2043;
                  String msg = "Unexpected error during execution.";
                  throw new ExecException(msg, errCode, PigException.BUG, e);
              }
          }
          
          Show
          Cheolsoo Park added a comment - Julien Le Dem , Do we really throw Exception? No, we don't throw Exception to the end user. But currently, PigServer catches them all in a single catch block and sort them out using instanceof calls (see below). Probably we should make ExecutionEngine throw FEE, EE, and IOE and replace instanceof calls with catch blocks in PigServer. try { stats = pigContext.getExecutionEngine().launchPig(lp, jobName, pigContext); } catch (Exception e) { // There are a lot of exceptions thrown by the launcher. If this // is an ExecException, just let it through. Else wrap it. if (e instanceof ExecException){ throw (ExecException)e; } else if (e instanceof FrontendException) { throw (FrontendException)e; } else { int errCode = 2043; String msg = "Unexpected error during execution." ; throw new ExecException(msg, errCode, PigException.BUG, e); } }
          Hide
          Achal Soni added a comment -

          I have taken all the suggestions into account and regenerated a new patch that is hopefully cleaner, smaller, and reflects most of the suggestions. The patch is attached and the review board is the following:

          https://reviews.apache.org/r/13714/

          Show
          Achal Soni added a comment - I have taken all the suggestions into account and regenerated a new patch that is hopefully cleaner, smaller, and reflects most of the suggestions. The patch is attached and the review board is the following: https://reviews.apache.org/r/13714/
          Hide
          Achal Soni added a comment -

          Let me know if there are any pressing changes to this patch!

          Show
          Achal Soni added a comment - Let me know if there are any pressing changes to this patch!
          Hide
          Julien Le Dem added a comment -

          I have submitted my review. This looks great Achal Soni!
          Cheolsoo Park does it look good to you?
          Once Achal has updated his patch I'm willing to commit.

          Show
          Julien Le Dem added a comment - I have submitted my review. This looks great Achal Soni ! Cheolsoo Park does it look good to you? Once Achal has updated his patch I'm willing to commit.
          Hide
          Cheolsoo Park added a comment -

          Julien Le Dem, I haven't looked at it yet, but I will review it tonight. I will also run full unit tests.

          Btw, I was meeting Mark, Olga, Rohini, and Daniel at LinkedIn this morning. We decided to create a tez branch. Rohini suggested that this patch should go into that branch instead of trunk. Can we agree where we should commit this patch first? Personally, I think this can go into trunk directly since it's quite general. But there were some concerns.

          Show
          Cheolsoo Park added a comment - Julien Le Dem , I haven't looked at it yet, but I will review it tonight. I will also run full unit tests. Btw, I was meeting Mark, Olga, Rohini, and Daniel at LinkedIn this morning. We decided to create a tez branch. Rohini suggested that this patch should go into that branch instead of trunk. Can we agree where we should commit this patch first? Personally, I think this can go into trunk directly since it's quite general. But there were some concerns.
          Hide
          Julien Le Dem added a comment -

          The point is to be able to implement alternate execution engines without having to fork Pig.
          I think it should go in trunk.

          Show
          Julien Le Dem added a comment - The point is to be able to implement alternate execution engines without having to fork Pig. I think it should go in trunk.
          Hide
          Dmitriy V. Ryaboy added a comment -

          I'd like this patch in trunk since it's not Tez-specific, and allows people to experiment with other runtimes (for example, Spark or Drill).

          Show
          Dmitriy V. Ryaboy added a comment - I'd like this patch in trunk since it's not Tez-specific, and allows people to experiment with other runtimes (for example, Spark or Drill).
          Hide
          Mark Wagner added a comment -

          I'd also be in favor putting this in trunk as opposed to a Tez branch. Although the motivation for this is Tez, I think we would want this patch in Pig even if there wasn't Tez support.

          A couple short comments for Achal:

          • It looks like the build targets that include the META-INF are only executed when building against hadoopversion=23. The META-INF also don't seem to be included in the pig.jar and pig-withouthadoop.jar that go in the root directory. I tried copying in the correct jars, but it seems like something is still off.
          • The changes to the try/catch blocks in MapReduceLauncher break on 23, because HadoopShims for 23 doesn't throw an exception where 20 does. Maybe that should be fixed in HadoopShims though.
          Show
          Mark Wagner added a comment - I'd also be in favor putting this in trunk as opposed to a Tez branch. Although the motivation for this is Tez, I think we would want this patch in Pig even if there wasn't Tez support. A couple short comments for Achal: It looks like the build targets that include the META-INF are only executed when building against hadoopversion=23. The META-INF also don't seem to be included in the pig.jar and pig-withouthadoop.jar that go in the root directory. I tried copying in the correct jars, but it seems like something is still off. The changes to the try/catch blocks in MapReduceLauncher break on 23, because HadoopShims for 23 doesn't throw an exception where 20 does. Maybe that should be fixed in HadoopShims though.
          Hide
          Cheolsoo Park added a comment -

          I see 295 failures with "ant clean test". The number looks scary, but all the failures seem to boil down to two reasons:

          1. As Mark already mentioned, META-INF is not executed. So many test cases fail with the following error:
            error msg: Unknown exec type: local
            
          2. The removal of PigServer.compilePp() makes many test cases fail with the following error:
            java.lang.NoSuchMethodException: org.apache.pig.PigServer.compilePp()
            

          As for indentation, can you please use 4-spaces instead of tabs? Tabs make indentation look funny at several places. A couple of awk/sed commands should do the job.

          Otherwise, looks great to me too. Thank you Achal for the wonderful work!

          Show
          Cheolsoo Park added a comment - I see 295 failures with "ant clean test". The number looks scary, but all the failures seem to boil down to two reasons: As Mark already mentioned, META-INF is not executed. So many test cases fail with the following error: error msg: Unknown exec type: local The removal of PigServer.compilePp() makes many test cases fail with the following error: java.lang.NoSuchMethodException: org.apache.pig.PigServer.compilePp() As for indentation, can you please use 4-spaces instead of tabs? Tabs make indentation look funny at several places. A couple of awk/sed commands should do the job. Otherwise, looks great to me too. Thank you Achal for the wonderful work!
          Hide
          Cheolsoo Park added a comment -

          Attached test_failures.txt.

          Show
          Cheolsoo Park added a comment - Attached test_failures.txt.
          Hide
          Achal Soni added a comment -

          Hi all,

          I've reuploaded the patch with all of the changes that Julien suggested as well as accounting for the comments from Mark. The test cases should be ok now (hopefully!) as we changed build.xml to package the META-INF folder and I changed the compilePp() issue.

          Cheolsoo Park Can you give this a look when you have time, and run the test suite? I think everything should be fine now. Also if you can tell me what exactly I should be doing for indentation/in what files that'd be great. I seem to have some problems with the whitespace/indentation aspect so some pointers here would be awesome.

          Let me know if anything else seems wrong.

          Achal

          Show
          Achal Soni added a comment - Hi all, I've reuploaded the patch with all of the changes that Julien suggested as well as accounting for the comments from Mark. The test cases should be ok now (hopefully!) as we changed build.xml to package the META-INF folder and I changed the compilePp() issue. Cheolsoo Park Can you give this a look when you have time, and run the test suite? I think everything should be fine now. Also if you can tell me what exactly I should be doing for indentation/in what files that'd be great. I seem to have some problems with the whitespace/indentation aspect so some pointers here would be awesome. Let me know if anything else seems wrong. Achal
          Hide
          Achal Soni added a comment -

          Here is the ReviewBoard for the new patch :

          https://reviews.apache.org/r/13752

          Show
          Achal Soni added a comment - Here is the ReviewBoard for the new patch : https://reviews.apache.org/r/13752
          Hide
          Cheolsoo Park added a comment -

          I will kick off the unit tests with the new patch now.

          if you can tell me what exactly I should be doing for indentation/in what files that'd be great.

          This is not a big deal. Basically, you can run the following command to replace tab chars in your patch:

          sed -i .orig '/^+/,/$/ s/<tab>/<4 whitespaces>/g' updated-8-22-2013-exec-engine.patch
          

          Then, the modified patch can be applied with "-l" option (--ignore-whitespace):

          patch -l < updated-8-22-2013-exec-engine.patch
          
          Show
          Cheolsoo Park added a comment - I will kick off the unit tests with the new patch now. if you can tell me what exactly I should be doing for indentation/in what files that'd be great. This is not a big deal. Basically, you can run the following command to replace tab chars in your patch: sed -i .orig '/^+/,/$/ s/<tab>/<4 whitespaces>/g' updated-8-22-2013-exec-engine.patch Then, the modified patch can be applied with "-l" option (--ignore-whitespace): patch -l < updated-8-22-2013-exec-engine.patch
          Hide
          Julien Le Dem added a comment -

          +1 LGTM
          If test-commit passes I think we can commit to TRUNK

          Show
          Julien Le Dem added a comment - +1 LGTM If test-commit passes I think we can commit to TRUNK
          Hide
          Cheolsoo Park added a comment -

          All, so here is the list of failing tests:

          org.apache.pig.test.TestGrunt.testScriptMissingLastNewLine
          org.apache.pig.test.TestGrunt.testCheckScriptSyntaxWithSemiColonUDFErr
          org.apache.pig.test.TestGrunt.testExplainDot
          org.apache.pig.test.TestGrunt.testExplainOut
          org.apache.pig.test.TestGrunt.testExplainBrief
          org.apache.pig.test.TestGrunt.testExplainEmpty
          org.apache.pig.test.TestGrunt.testExplainScript
          org.apache.pig.test.TestInputOutputMiniClusterFileValidator.testValidationNeg
          org.apache.pig.test.TestJobStats.testOneTaskReport
          org.apache.pig.test.TestJobStats.testGetOuputSizeUsingNonFileBasedStorage1
          org.apache.pig.test.TestJobStats.testGetOuputSizeUsingNonFileBasedStorage2
          org.apache.pig.test.TestJobStats.testGetOuputSizeUsingNonFileBasedStorage3
          org.apache.pig.test.TestJobStats.testGetOuputSizeUsingNonFileBasedStorage4
          org.apache.pig.test.TestJobStats.testMedianMapReduceTime
          org.apache.pig.test.TestJobStats.testGetOuputSizeUsingFileBasedStorage
          org.apache.pig.test.TestMRExecutionEngine.testJobConfGeneration
          org.apache.pig.test.TestMRExecutionEngine.testJobConfGenerationWithUserConfigs
          org.apache.pig.test.TestMacroExpansion.test20
          org.apache.pig.test.TestMacroExpansion.test21
          org.apache.pig.test.TestMacroExpansion.test22
          org.apache.pig.test.TestMacroExpansion.test23
          org.apache.pig.test.TestMacroExpansion.test32
          org.apache.pig.test.TestMacroExpansion.test33
          org.apache.pig.test.TestMacroExpansion.test34
          org.apache.pig.test.TestMacroExpansion.test35
          org.apache.pig.test.TestMacroExpansion.testCommentInMacro
          org.apache.pig.test.TestMacroExpansion.testNegativeNumber
          org.apache.pig.test.TestMacroExpansion.typecastTest
          org.apache.pig.test.TestMacroExpansion.testFilter
          org.apache.pig.test.TestMapSideCogroup.testFailure2
          org.apache.pig.test.TestMergeJoinOuter.testFailure
          org.apache.pig.test.TestPigRunner.testEmptyFile
          org.apache.pig.test.TestScriptLanguage.testSysArguments
          org.apache.pig.test.TestShortcuts.testExplainShortcutNoAlias
          org.apache.pig.test.TestShortcuts.testExplainShortcutNoAliasDefined
          

          I prefer fixing them beforehand to fixing them afterward. Although none of these failures is serious (I believe), can we have a couple of more days before committing Achal's patch? I will make sure it gets committed into trunk because I definitely need it for a Tez branch.

          Thoughts?

          Show
          Cheolsoo Park added a comment - All, so here is the list of failing tests: org.apache.pig.test.TestGrunt.testScriptMissingLastNewLine org.apache.pig.test.TestGrunt.testCheckScriptSyntaxWithSemiColonUDFErr org.apache.pig.test.TestGrunt.testExplainDot org.apache.pig.test.TestGrunt.testExplainOut org.apache.pig.test.TestGrunt.testExplainBrief org.apache.pig.test.TestGrunt.testExplainEmpty org.apache.pig.test.TestGrunt.testExplainScript org.apache.pig.test.TestInputOutputMiniClusterFileValidator.testValidationNeg org.apache.pig.test.TestJobStats.testOneTaskReport org.apache.pig.test.TestJobStats.testGetOuputSizeUsingNonFileBasedStorage1 org.apache.pig.test.TestJobStats.testGetOuputSizeUsingNonFileBasedStorage2 org.apache.pig.test.TestJobStats.testGetOuputSizeUsingNonFileBasedStorage3 org.apache.pig.test.TestJobStats.testGetOuputSizeUsingNonFileBasedStorage4 org.apache.pig.test.TestJobStats.testMedianMapReduceTime org.apache.pig.test.TestJobStats.testGetOuputSizeUsingFileBasedStorage org.apache.pig.test.TestMRExecutionEngine.testJobConfGeneration org.apache.pig.test.TestMRExecutionEngine.testJobConfGenerationWithUserConfigs org.apache.pig.test.TestMacroExpansion.test20 org.apache.pig.test.TestMacroExpansion.test21 org.apache.pig.test.TestMacroExpansion.test22 org.apache.pig.test.TestMacroExpansion.test23 org.apache.pig.test.TestMacroExpansion.test32 org.apache.pig.test.TestMacroExpansion.test33 org.apache.pig.test.TestMacroExpansion.test34 org.apache.pig.test.TestMacroExpansion.test35 org.apache.pig.test.TestMacroExpansion.testCommentInMacro org.apache.pig.test.TestMacroExpansion.testNegativeNumber org.apache.pig.test.TestMacroExpansion.typecastTest org.apache.pig.test.TestMacroExpansion.testFilter org.apache.pig.test.TestMapSideCogroup.testFailure2 org.apache.pig.test.TestMergeJoinOuter.testFailure org.apache.pig.test.TestPigRunner.testEmptyFile org.apache.pig.test.TestScriptLanguage.testSysArguments org.apache.pig.test.TestShortcuts.testExplainShortcutNoAlias org.apache.pig.test.TestShortcuts.testExplainShortcutNoAliasDefined I prefer fixing them beforehand to fixing them afterward. Although none of these failures is serious (I believe), can we have a couple of more days before committing Achal's patch? I will make sure it gets committed into trunk because I definitely need it for a Tez branch. Thoughts?
          Hide
          Achal Soni added a comment -

          Cheolsoo Park Thanks a lot for running the test suite! It's good to see where the patch is failing. I definitely agree that all of these need to be investigated before the patch gets anywhere.

          I have some ideas about a few of the test cases, looks to be some minor stuff with JobStats and the way Explain works now which I have to look into. The rest I can't really think of off hte top of my head but I'll give it a shot.

          I'll report back with some more findings as soon as possible.

          Show
          Achal Soni added a comment - Cheolsoo Park Thanks a lot for running the test suite! It's good to see where the patch is failing. I definitely agree that all of these need to be investigated before the patch gets anywhere. I have some ideas about a few of the test cases, looks to be some minor stuff with JobStats and the way Explain works now which I have to look into. The rest I can't really think of off hte top of my head but I'll give it a shot. I'll report back with some more findings as soon as possible.
          Hide
          Rohini Palaniswamy added a comment -

          In the Pig-on-Tez meeting in Linkedin we decided to do Tez work on a branch and that Cheolsoo will initiate conversation thread on mailing list for it and take up the task of creating the branch. Tez is relatively new and unstable so it will be wise to not start with code directly on trunk. Hive is also doing their Tez work on a branch.

          Cheolsoo had a question as to whether we should commit this to trunk and branch after that. I would prefer PIG-3419 to be also put in the branch and not checked into trunk. It makes lot of changes to the Exceptions thrown, removes public methods etc and that might cause backward incompatibility during runtime with code compiled with previous versions of pig. All that needs to be figured out and fixed. So might not be a good idea to get this patch directly into trunk. Thoughts?

          Show
          Rohini Palaniswamy added a comment - In the Pig-on-Tez meeting in Linkedin we decided to do Tez work on a branch and that Cheolsoo will initiate conversation thread on mailing list for it and take up the task of creating the branch. Tez is relatively new and unstable so it will be wise to not start with code directly on trunk. Hive is also doing their Tez work on a branch. Cheolsoo had a question as to whether we should commit this to trunk and branch after that. I would prefer PIG-3419 to be also put in the branch and not checked into trunk. It makes lot of changes to the Exceptions thrown, removes public methods etc and that might cause backward incompatibility during runtime with code compiled with previous versions of pig. All that needs to be figured out and fixed. So might not be a good idea to get this patch directly into trunk. Thoughts?
          Hide
          Dmitriy V. Ryaboy added a comment -

          Rohini, I want to reiterate that this patch has NO tez dependencies (if it does, that's a bug). The intention is not to make Tez possible. It's to make pluggable execution engines possible; and I do not want that functionality to be tied to a tez branch that will be unstable and in heavy development for the foreseeable future. This work will be immediately useful for the Spork (pig on spark) branch, for example.

          Also, it allows people to work with new runtimes without modifying Pig. So Tez-on-Pig doesn't even have to be done as a branch of this project, someone can go an experiment completely independently.

          For these reasons, I would like it in trunk.

          You make a great point about the danger of changing exceptions, public methods, etc. I believe that most of these are project-public, and annotated as such. Do you have specific methods you are concerned about? Ideally we would change as little as possible for the end user.

          Dmitriy

          Show
          Dmitriy V. Ryaboy added a comment - Rohini, I want to reiterate that this patch has NO tez dependencies (if it does, that's a bug). The intention is not to make Tez possible. It's to make pluggable execution engines possible; and I do not want that functionality to be tied to a tez branch that will be unstable and in heavy development for the foreseeable future. This work will be immediately useful for the Spork (pig on spark) branch, for example. Also, it allows people to work with new runtimes without modifying Pig . So Tez-on-Pig doesn't even have to be done as a branch of this project, someone can go an experiment completely independently. For these reasons, I would like it in trunk. You make a great point about the danger of changing exceptions, public methods, etc. I believe that most of these are project-public, and annotated as such. Do you have specific methods you are concerned about? Ideally we would change as little as possible for the end user. Dmitriy
          Hide
          Olga Natkovich added a comment -

          I think the reason we wanted it on the Tez branch is that it might evolve with Tez implementation and so we would merge the updated code back when Tez is ready. Since there are no plans for any additional backend, is there a need to apply this to trunk sooner rather than later?

          Show
          Olga Natkovich added a comment - I think the reason we wanted it on the Tez branch is that it might evolve with Tez implementation and so we would merge the updated code back when Tez is ready. Since there are no plans for any additional backend, is there a need to apply this to trunk sooner rather than later?
          Hide
          Dmitriy V. Ryaboy added a comment -

          Olga, first commit to the spork branch is from 2012.

          https://github.com/dvryaboy/pig (the default branch on my github is "spork").

          Show
          Dmitriy V. Ryaboy added a comment - Olga, first commit to the spork branch is from 2012 . https://github.com/dvryaboy/pig (the default branch on my github is "spork").
          Hide
          Julien Le Dem added a comment -

          The advantage of having the Execution engine abstraction in trunk is it allows running experimental Pig execution engines implementations like Tez or Spark on an official release of Pig without having to build from a specific branch.
          The execution engine implementations themselves are fairly independent of Pig and do not need to be maintained in a Pig branch.
          If the ExecutionEngine abstraction evolves over time that can be done in Trunk and can be merged independently of the Tez implementation itself.

          Show
          Julien Le Dem added a comment - The advantage of having the Execution engine abstraction in trunk is it allows running experimental Pig execution engines implementations like Tez or Spark on an official release of Pig without having to build from a specific branch. The execution engine implementations themselves are fairly independent of Pig and do not need to be maintained in a Pig branch. If the ExecutionEngine abstraction evolves over time that can be done in Trunk and can be merged independently of the Tez implementation itself.
          Hide
          Cheolsoo Park added a comment -

          I am uploading a new patch that includes the following changes:

          • Fixes most test cases (issues with JobStats and Explain).
          • Removes "src/META-INF/services/org.apache.pig.backend.executionengine.ExecType" because it's duplicate. (Probably it was added by mistake.)
          • Renames TestJobStats.java to TestMRJobStats.java since it tests MRJobStats.
          • Fixes a bunch of Java warnings.

          The diff from Achal's last patch can be viewed here.

          I just kicked off the unit tests again and will let you know how it goes. Thanks!

          Show
          Cheolsoo Park added a comment - I am uploading a new patch that includes the following changes: Fixes most test cases (issues with JobStats and Explain). Removes "src/META-INF/services/org.apache.pig.backend.executionengine.ExecType" because it's duplicate. (Probably it was added by mistake.) Renames TestJobStats.java to TestMRJobStats.java since it tests MRJobStats. Fixes a bunch of Java warnings. The diff from Achal's last patch can be viewed here . I just kicked off the unit tests again and will let you know how it goes. Thanks!
          Hide
          Olga Natkovich added a comment -

          I don't have a very strong opinion on this so whatever you guys decide is fine with me.

          I think if it does evolve as part of Tez, at least some changes are likely to sneak into Tez branch without going to trunk so they might diverge for a while but if we are ok to take that chance, that's fine

          Show
          Olga Natkovich added a comment - I don't have a very strong opinion on this so whatever you guys decide is fine with me. I think if it does evolve as part of Tez, at least some changes are likely to sneak into Tez branch without going to trunk so they might diverge for a while but if we are ok to take that chance, that's fine
          Hide
          Cheolsoo Park added a comment -

          Uploading a new patch, and the diff from the previous patch can be viewed here.

          Now failing test cases are as follows:

          org.apache.pig.test.TestGrunt.testScriptMissingLastNewLine
          org.apache.pig.test.TestMapSideCogroup.testFailure2
          org.apache.pig.test.TestMergeJoinOuter.testFailure
          
          Show
          Cheolsoo Park added a comment - Uploading a new patch, and the diff from the previous patch can be viewed here . Now failing test cases are as follows: org.apache.pig.test.TestGrunt.testScriptMissingLastNewLine org.apache.pig.test.TestMapSideCogroup.testFailure2 org.apache.pig.test.TestMergeJoinOuter.testFailure
          Hide
          Cheolsoo Park added a comment -

          I think I fixed all the unit tests. I am uploading a new patch. The diff from the previous one can be viewed here.

          Show
          Cheolsoo Park added a comment - I think I fixed all the unit tests. I am uploading a new patch. The diff from the previous one can be viewed here .
          Hide
          Cheolsoo Park added a comment -

          Kicked off full unit tests now.

          Show
          Cheolsoo Park added a comment - Kicked off full unit tests now.
          Hide
          Cheolsoo Park added a comment -

          There was one more fix that I had to make. TestMRExecutionEngine and TestMRJobStats failed when running them in jenkins. The reason was TestMRCompiler didn't clean up its mini cluster files, so I added a tearDown method to TestMRCompiler here.

          Show
          Cheolsoo Park added a comment - There was one more fix that I had to make. TestMRExecutionEngine and TestMRJobStats failed when running them in jenkins. The reason was TestMRCompiler didn't clean up its mini cluster files, so I added a tearDown method to TestMRCompiler here .
          Hide
          Cheolsoo Park added a comment -

          Rohini Palaniswamy,

          It makes lot of changes to the Exceptions thrown, removes public methods etc and that might cause backward incompatibility during runtime with code compiled with previous versions of pig.

          I am listing all the backward incompatible changes made to public API. Hopefully, this helps us estimate the impact.

          1. PigServer constructor
            -    public PigServer(String execTypeString) throws ExecException, IOException {
            -        this(ExecType.fromString(execTypeString));
            +    public PigServer(String execTypeString) throws PigException {
            +        this(addExecTypeProperty(PropertiesUtil.loadDefaultProperties(), execTypeString));
            +    }
            

            We can revert PigException back to EE and IOE for this constructor (and other new constructors). Not hard to fix.

          2. PigServer.explain()
                 public void explain(String alias,
                                     String format,
                                     boolean verbose,
                                     boolean markAsExecute,
                                     PrintStream lps,
            -                        PrintStream pps,
            -                        PrintStream eps) throws IOException {
            +                        PrintStream eps,
            +                        File dir,
            +                        String suffix) throws IOException {
            

            The method signature changed. Although it's possible that someone uses this method in their applications, there is another method that wraps this one (i.e. public void explain(String alias, PrintStream stream)), and that one is more likely to be used.

          3. Name changes of several public classes:
            • JobStats to MRJobStats
            • PigStatsUtil to MRPigStatsUtil
            • ScriptState to MRPScriptState
            • HExecutionEngine to MRExecutionEngine
          4. PigContext.getExecutionEngine()
            -    public HExecutionEngine getExecutionEngine() {
            +    public ExecutionEngine getExecutionEngine() {
            

            Result of the class name change.

          5. SimplePigStats class is moved from org.apache.pig.tools.pigstats to org.apache.pig.tools.pigstats.mapreduce.
          6. Launcher class is moved from org.apache.pig.backend.hadoop.executionengine.mapReduceLayer to org.apache.pig.backend.hadoop.executionengine.
          7. JobControlCompiler constructor
            -    public JobControlCompiler(PigContext pigContext, Configuration conf) throws IOException {
            +    public JobControlCompiler(PigContext pigContext, Configuration conf) {
            

            The IOE was redundant in the first place. So we should remove it.

          Here is my estimation:

          1. Major - But we can fix it.
          2. Minor - Unlikely used in user code.
          3. Minor - Unlikely used in user code.
          4. Minor - Unlikely used in user code.
          5. Minor - Unlikely used in user code.
          6. Minor - Unlikely used in user code.
          7. Minor - Unlikely used in user code.

          As long as we fix #1, I think we can go ahead commit the patch to trunk. What do you think?

          Show
          Cheolsoo Park added a comment - Rohini Palaniswamy , It makes lot of changes to the Exceptions thrown, removes public methods etc and that might cause backward incompatibility during runtime with code compiled with previous versions of pig. I am listing all the backward incompatible changes made to public API. Hopefully, this helps us estimate the impact. PigServer constructor - public PigServer( String execTypeString) throws ExecException, IOException { - this (ExecType.fromString(execTypeString)); + public PigServer( String execTypeString) throws PigException { + this (addExecTypeProperty(PropertiesUtil.loadDefaultProperties(), execTypeString)); + } We can revert PigException back to EE and IOE for this constructor (and other new constructors). Not hard to fix. PigServer.explain() public void explain( String alias, String format, boolean verbose, boolean markAsExecute, PrintStream lps, - PrintStream pps, - PrintStream eps) throws IOException { + PrintStream eps, + File dir, + String suffix) throws IOException { The method signature changed. Although it's possible that someone uses this method in their applications, there is another method that wraps this one (i.e. public void explain(String alias, PrintStream stream) ), and that one is more likely to be used. Name changes of several public classes: JobStats to MRJobStats PigStatsUtil to MRPigStatsUtil ScriptState to MRPScriptState HExecutionEngine to MRExecutionEngine PigContext.getExecutionEngine() - public HExecutionEngine getExecutionEngine() { + public ExecutionEngine getExecutionEngine() { Result of the class name change. SimplePigStats class is moved from org.apache.pig.tools.pigstats to org.apache.pig.tools.pigstats.mapreduce . Launcher class is moved from org.apache.pig.backend.hadoop.executionengine.mapReduceLayer to org.apache.pig.backend.hadoop.executionengine . JobControlCompiler constructor - public JobControlCompiler(PigContext pigContext, Configuration conf) throws IOException { + public JobControlCompiler(PigContext pigContext, Configuration conf) { The IOE was redundant in the first place. So we should remove it. Here is my estimation: Major - But we can fix it. Minor - Unlikely used in user code. Minor - Unlikely used in user code. Minor - Unlikely used in user code. Minor - Unlikely used in user code. Minor - Unlikely used in user code. Minor - Unlikely used in user code. As long as we fix #1, I think we can go ahead commit the patch to trunk. What do you think?
          Hide
          Cheolsoo Park added a comment -

          All unit tests pass.

          Show
          Cheolsoo Park added a comment - All unit tests pass.
          Hide
          Bikas Saha added a comment -

          Folks, FYI, based on recent feedback we have changed the names used in some of the TEZ API's. It a simple refactoring on the Tez side and should be a simple refactoring fix on the Pig side too. Jira for reference. TEZ-410.

          Show
          Bikas Saha added a comment - Folks, FYI, based on recent feedback we have changed the names used in some of the TEZ API's. It a simple refactoring on the Tez side and should be a simple refactoring fix on the Pig side too. Jira for reference. TEZ-410 .
          Hide
          Dmitriy V. Ryaboy added a comment -

          Cheolsoo thanks so much for helping with this work!

          I think #1 and #3 are the issues (#3 will affect Ambrose and probably Lipstick).

          We can take care of updating Ambrose if we need to. Julien Le Dem do you think this is an important enough semantic change to force advanced clients such as Ambrose to rewrite / recompile? Or should we roll that part back?

          Bikas Saha thanks for the heads up, we'll need to update the pig-on-tez branch. Fortunately it doesn't affect this patch, since it's framework-independent and has no TEZ references.

          Show
          Dmitriy V. Ryaboy added a comment - Cheolsoo thanks so much for helping with this work! I think #1 and #3 are the issues (#3 will affect Ambrose and probably Lipstick). We can take care of updating Ambrose if we need to. Julien Le Dem do you think this is an important enough semantic change to force advanced clients such as Ambrose to rewrite / recompile? Or should we roll that part back? Bikas Saha thanks for the heads up, we'll need to update the pig-on-tez branch. Fortunately it doesn't affect this patch, since it's framework-independent and has no TEZ references.
          Hide
          Bikas Saha added a comment -

          Looks like this jira wasnt the appropriate one to comment on. Is there a different umbrella jira for Pig on Tez that I can track and post comments on?

          Show
          Bikas Saha added a comment - Looks like this jira wasnt the appropriate one to comment on. Is there a different umbrella jira for Pig on Tez that I can track and post comments on?
          Hide
          Achal Soni added a comment -

          Bikas Saha Thanks for the heads-up Bikas! This JIRA is not concerned with the Tez integration for Pig and is simply the abstraction in Pig to allow for alternate ExecutionEngines in Pig. But will certainly change this on the Tez integration side of stuff.

          Thanks a lot Cheolsoo Park for continuing this! I think everything looks good from my end. I can certainly see why we may want to keep this on a different branch until everything is finalized. Certain things may still need more work. For example, OutputStats is not completed abstracted out, as it still has references to POStore which is a MR implementation construct. ScriptState/PPNL/JobStats may still need more abstraction (especially PPNL) and reworking to incorporate a new ExecutionEngine abstraction. I think what we have done here is the minimum foundation for an abstraction though, and it would be appropriate to put into trunk, but these are not my decisions to make.

          With regard to public methods that were changed, I don't think most of them are a big deal, besides as Cheolsoo said, the PigServer throwing PigException. I never thought IOException was a good exception to throw, but I think reverting PigServer back to IOException as it is userfacing code is not a big deal. The rest of the method signature changes shouldn't be worrisome because most of them are internal to the project.

          However, the change from JobStats to MRJobStats, while necessary (as each ExecutionEngine would have it's own type of JobStats it would present to the end user), could be problematic because it is userfacing code and would probably break people who were previously using JobStats. That I think is the most important thing to keep in mind. The task of making the PPNL and JobStats clearly tied to the ExecutionEngine should be thought through also.

          Show
          Achal Soni added a comment - Bikas Saha Thanks for the heads-up Bikas! This JIRA is not concerned with the Tez integration for Pig and is simply the abstraction in Pig to allow for alternate ExecutionEngines in Pig. But will certainly change this on the Tez integration side of stuff. Thanks a lot Cheolsoo Park for continuing this! I think everything looks good from my end. I can certainly see why we may want to keep this on a different branch until everything is finalized. Certain things may still need more work. For example, OutputStats is not completed abstracted out, as it still has references to POStore which is a MR implementation construct. ScriptState/PPNL/JobStats may still need more abstraction (especially PPNL) and reworking to incorporate a new ExecutionEngine abstraction. I think what we have done here is the minimum foundation for an abstraction though, and it would be appropriate to put into trunk, but these are not my decisions to make. With regard to public methods that were changed, I don't think most of them are a big deal, besides as Cheolsoo said, the PigServer throwing PigException. I never thought IOException was a good exception to throw, but I think reverting PigServer back to IOException as it is userfacing code is not a big deal. The rest of the method signature changes shouldn't be worrisome because most of them are internal to the project. However, the change from JobStats to MRJobStats, while necessary (as each ExecutionEngine would have it's own type of JobStats it would present to the end user), could be problematic because it is userfacing code and would probably break people who were previously using JobStats. That I think is the most important thing to keep in mind. The task of making the PPNL and JobStats clearly tied to the ExecutionEngine should be thought through also.
          Hide
          Julien Le Dem added a comment -

          Cheolsoo Park: thanks a lot for looking into this.

          Here are my thoughts:

          1. let's change it back

          2. 4. 5. 6. 7. are either internal to Pig or necessary to add the execution engine abstraction.

          3.
          JobStats still exists but the MR specific part is split into MRJobStats which extends JobStats
          Same thing for PigStatsUtil and ScriptState. Those classes are not disappearing but the MR specific part is abstracted out.
          HExecutionEngine could be renamed back to what it was but this is again what is becoming the new abstraction.
          Unfortunately tools like Ambrose and Lipstick depend on the MR specific parts of Pig and look at the internals. This patch is a necessary change so that those tools can work independently of the execution engine in the future.
          The changes to Ambrose and Lipstick should be minimal though with this patch. But yes they would suffer from some incompatibility, but again there is no way around it when a tool looks inside the execution engine internals.

          I think we should revert 1. and commit the patch.

          Show
          Julien Le Dem added a comment - Cheolsoo Park : thanks a lot for looking into this. Here are my thoughts: 1. let's change it back 2. 4. 5. 6. 7. are either internal to Pig or necessary to add the execution engine abstraction. 3. JobStats still exists but the MR specific part is split into MRJobStats which extends JobStats Same thing for PigStatsUtil and ScriptState. Those classes are not disappearing but the MR specific part is abstracted out. HExecutionEngine could be renamed back to what it was but this is again what is becoming the new abstraction. Unfortunately tools like Ambrose and Lipstick depend on the MR specific parts of Pig and look at the internals. This patch is a necessary change so that those tools can work independently of the execution engine in the future. The changes to Ambrose and Lipstick should be minimal though with this patch. But yes they would suffer from some incompatibility, but again there is no way around it when a tool looks inside the execution engine internals. I think we should revert 1. and commit the patch.
          Hide
          Cheolsoo Park added a comment -

          I am uploading a new patch that revert the PigServer constructor (#1). The diff can be viewed here. (There are two unrelated minor changes.)

          The new patch is rebased to trunk. Please let me know if anyone has objections. If I don't hear back, I will commit it to trunk tomorrow. Thanks!

          Show
          Cheolsoo Park added a comment - I am uploading a new patch that revert the PigServer constructor (#1). The diff can be viewed here . (There are two unrelated minor changes.) The new patch is rebased to trunk. Please let me know if anyone has objections. If I don't hear back, I will commit it to trunk tomorrow. Thanks!
          Hide
          Achal Soni added a comment -

          I agree with all that is said, but there is no need to rename HExecutionEngine back. It doesn't semantically make sense and I don't think that anybody was directly interacting it outside of the test cases?

          Whatever changes to Ambrose and Lipstick should be communicated clearly also. I have noted some issues with PPNL before with regard to abstraction – namely, Pig provides the MROperPlan to the listeners, which is not relevant in a differen execution engine. Julien suggested this should be fixed in a follow up patch. This will most certainly affect Ambrose and Lipstick so we should be cautious in that regard.

          Show
          Achal Soni added a comment - I agree with all that is said, but there is no need to rename HExecutionEngine back. It doesn't semantically make sense and I don't think that anybody was directly interacting it outside of the test cases? Whatever changes to Ambrose and Lipstick should be communicated clearly also. I have noted some issues with PPNL before with regard to abstraction – namely, Pig provides the MROperPlan to the listeners, which is not relevant in a differen execution engine. Julien suggested this should be fixed in a follow up patch. This will most certainly affect Ambrose and Lipstick so we should be cautious in that regard.
          Hide
          Dmitriy V. Ryaboy added a comment -

          Bill Graham looping you in for Ambrose.

          Show
          Dmitriy V. Ryaboy added a comment - Bill Graham looping you in for Ambrose.
          Hide
          Julien Le Dem added a comment -

          +1
          Cheolsoo Park LGTM!

          Show
          Julien Le Dem added a comment - +1 Cheolsoo Park LGTM!
          Hide
          Cheolsoo Park added a comment -
          Show
          Cheolsoo Park added a comment - Committed to trunk: http://svn.apache.org/viewvc?view=revision&revision=1519062 Thank you Achal!
          Hide
          Daniel Dai added a comment -

          This break the Oozie build, which uses PigStatsUtil.

          {HDFS_BYTES_WRITTEN, MAP_INPUT_RECORDS, MAP_OUTPUT_RECORDS, REDUCE_INPUT_RECORDS, REDUCE_OUTPUT_RECORDS}

          . We need to provide a backward compatible way.

          Show
          Daniel Dai added a comment - This break the Oozie build, which uses PigStatsUtil. {HDFS_BYTES_WRITTEN, MAP_INPUT_RECORDS, MAP_OUTPUT_RECORDS, REDUCE_INPUT_RECORDS, REDUCE_OUTPUT_RECORDS} . We need to provide a backward compatible way.
          Hide
          Daniel Dai added a comment -

          Opened PIG-3457 for that.

          Show
          Daniel Dai added a comment - Opened PIG-3457 for that.
          Hide
          Rohini Palaniswamy added a comment -

          I having second thoughts on having this patch in 0.12 in and wondering whether we should revert this and keep it only in Tez branch. Two reasons for that:

          • Seeing PIG-3457 which was my initial concern.
          • Changing interfaces to be backward compatible is very tricky and the workarounds are hacky or ugly. Faced that with PIG-3255. And this patch introduces lot of changes and new interfaces for the purpose of future work which is yet to take off from POC stages. The interfaces are bound to evolve when actual implementations are done or become different from what is in this patch if we end up finding cleaner abstractions. Putting something in a release which we are not very sure of does not seem like a good idea.

          Someone who wants to do experimental work can start off with tez branch since it is experimental work anyways. Basically I just want to keep experimentation code separate from production code since we are talking about releasing Pig 0.12. Thoughts?

          Show
          Rohini Palaniswamy added a comment - I having second thoughts on having this patch in 0.12 in and wondering whether we should revert this and keep it only in Tez branch. Two reasons for that: Seeing PIG-3457 which was my initial concern. Changing interfaces to be backward compatible is very tricky and the workarounds are hacky or ugly. Faced that with PIG-3255 . And this patch introduces lot of changes and new interfaces for the purpose of future work which is yet to take off from POC stages. The interfaces are bound to evolve when actual implementations are done or become different from what is in this patch if we end up finding cleaner abstractions. Putting something in a release which we are not very sure of does not seem like a good idea. Someone who wants to do experimental work can start off with tez branch since it is experimental work anyways. Basically I just want to keep experimentation code separate from production code since we are talking about releasing Pig 0.12. Thoughts?
          Hide
          Dmitriy V. Ryaboy added a comment -

          This is not just for Tez. The point is to enable POC work (in branches, forks, etc) and not have each such attempt redo all the work in this ticket. It's the same reason we provide things like pluggable LoadFuncs to let people work on things they want to load we didn't think of loading.

          We should certainly work to stabilize 0.12 and fix issues like PIG-3457

          Show
          Dmitriy V. Ryaboy added a comment - This is not just for Tez. The point is to enable POC work (in branches, forks, etc) and not have each such attempt redo all the work in this ticket. It's the same reason we provide things like pluggable LoadFuncs to let people work on things they want to load we didn't think of loading. We should certainly work to stabilize 0.12 and fix issues like PIG-3457
          Hide
          Bill Graham added a comment -

          Would should at least annotate the new interfaces as evolving so we don't need to evolve them in a backward compatible way just yet.

          Show
          Bill Graham added a comment - Would should at least annotate the new interfaces as evolving so we don't need to evolve them in a backward compatible way just yet.
          Hide
          Rohini Palaniswamy added a comment -

          Dmitriy,
          I am not arguing about adding interfaces to enable other frameworks. If you take the LoadFuncs case, a lot of design and work went into that (http://wiki.apache.org/pig/LoadStoreRedesignProposal) in coming up with and finalizing the interfaces and there were Loaders and Storers changed or written with the design of the new interfaces before the new interfaces were released. In this case we have added the interfaces without doing something like that and putting it in a release. That was my concern.

          As Bill suggests if we mark and agree that the new interfaces are unstable and bound to change and no backward compatibility will be provided, then I am good. Because when we actually get to Tez or Spork implementation (not POC), I am sure these are bound to change or more new methods are required.

          Show
          Rohini Palaniswamy added a comment - Dmitriy, I am not arguing about adding interfaces to enable other frameworks. If you take the LoadFuncs case, a lot of design and work went into that ( http://wiki.apache.org/pig/LoadStoreRedesignProposal ) in coming up with and finalizing the interfaces and there were Loaders and Storers changed or written with the design of the new interfaces before the new interfaces were released. In this case we have added the interfaces without doing something like that and putting it in a release. That was my concern. As Bill suggests if we mark and agree that the new interfaces are unstable and bound to change and no backward compatibility will be provided, then I am good. Because when we actually get to Tez or Spork implementation (not POC), I am sure these are bound to change or more new methods are required.
          Hide
          Dmitriy V. Ryaboy added a comment -

          +1 to marking the interfaces as evolving.

          Show
          Dmitriy V. Ryaboy added a comment - +1 to marking the interfaces as evolving.
          Hide
          Cheolsoo Park added a comment -

          I will open a jira to add "unstable" annotations. I am also reviewing PIG-3457 now.

          Show
          Cheolsoo Park added a comment - I will open a jira to add "unstable" annotations. I am also reviewing PIG-3457 now.
          Hide
          Cheolsoo Park added a comment -

          Reverted in 0.12.

          Show
          Cheolsoo Park added a comment - Reverted in 0.12.

            People

            • Assignee:
              Achal Soni
              Reporter:
              Achal Soni
            • Votes:
              0 Vote for this issue
              Watchers:
              14 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development