Uploaded image for project: 'Apache Tez'
  1. Apache Tez
  2. TEZ-690 Tez API Ease of Use
  3. TEZ-1134

InputInitializer and OutputCommitter implicitly use payloads of the input and output

Details

    • Sub-task
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 0.5.0
    • 0.5.0
    • None
    • Incompatible change, Reviewed

    Attachments

      1. TEZ-1134.3.patch
        115 kB
        Bikas Saha
      2. TEZ-1134.2.patch
        113 kB
        Bikas Saha
      3. TEZ-1134.1.patch
        2 kB
        Rekha Joshi
      4. TEZ-1134.1.patch
        113 kB
        Bikas Saha

      Activity

        rekhajoshm Rekha Joshi added a comment -

        Attached patch.Thanks

        rekhajoshm Rekha Joshi added a comment - Attached patch.Thanks
        rekhajoshm Rekha Joshi added a comment -

        Attached patch.First stab, as unsure if payloads need to be specific/default value (when not shared with input/output).Thanks.

        rekhajoshm Rekha Joshi added a comment - Attached patch.First stab, as unsure if payloads need to be specific/default value (when not shared with input/output).Thanks.
        bikassaha Bikas Saha added a comment -

        This isnt correct. Let me try to explain.
        1) API - the api to addInput() and addOutput() currently just take the initializer and committer class. They need to take some form of TezEntityDescriptor (like ProcessorDescriptor) where the class and its payload can be specified
        2) Back-end - The payload specified via 1) above needs to be available via the appropriate context. This would be a backwards incompatible change.

        bikassaha Bikas Saha added a comment - This isnt correct. Let me try to explain. 1) API - the api to addInput() and addOutput() currently just take the initializer and committer class. They need to take some form of TezEntityDescriptor (like ProcessorDescriptor) where the class and its payload can be specified 2) Back-end - The payload specified via 1) above needs to be available via the appropriate context. This would be a backwards incompatible change.
        bikassaha Bikas Saha added a comment -

        rekhajoshm Do you think you will have time for this soon? We would like to get 0.5 complete quickly and get all the breaking changes done. I can take this up if you are busy now.

        bikassaha Bikas Saha added a comment - rekhajoshm Do you think you will have time for this soon? We would like to get 0.5 complete quickly and get all the breaking changes done. I can take this up if you are busy now.
        rekhajoshm Rekha Joshi added a comment -

        bikassaha You guessed right; got busy with n-threads running
        wanted to do it; but will let you pick it up.with the only hope that there will be more where this came from

        rekhajoshm Rekha Joshi added a comment - bikassaha You guessed right; got busy with n-threads running wanted to do it; but will let you pick it up.with the only hope that there will be more where this came from
        bikassaha Bikas Saha added a comment -

        Sure!

        bikassaha Bikas Saha added a comment - Sure!
        bikassaha Bikas Saha added a comment -

        Patch does the following
        1) Changes the addInput/addOutput to now take entity descriptors for initializer and committer. So now they can have payloads like the rest
        2) Boiler plate code for the transfer over RPC
        3) Changes initializer/committer contexts to provide the payload and changes the previous method to be input/output specific. Keeping them to prevent payload duplication when unnecessary.
        4) Removes RootInputLeafOutputDescriptor from tez-dag project. It was duplicating RootInputLeafOutput from the api project.
        5) Changed tests to now use the committer payload where relevant instead of overloading the input payload.

        bikassaha Bikas Saha added a comment - Patch does the following 1) Changes the addInput/addOutput to now take entity descriptors for initializer and committer. So now they can have payloads like the rest 2) Boiler plate code for the transfer over RPC 3) Changes initializer/committer contexts to provide the payload and changes the previous method to be input/output specific. Keeping them to prevent payload duplication when unnecessary. 4) Removes RootInputLeafOutputDescriptor from tez-dag project. It was duplicating RootInputLeafOutput from the api project. 5) Changed tests to now use the committer payload where relevant instead of overloading the input payload.
        bikassaha Bikas Saha added a comment -

        sseth hitesh Please review.

        bikassaha Bikas Saha added a comment - sseth hitesh Please review.
        bikassaha Bikas Saha added a comment -

        Patch refresh after TEZ-1247.

        bikassaha Bikas Saha added a comment - Patch refresh after TEZ-1247 .
        sseth Siddharth Seth added a comment -

        Looks good. Minor stuff.

        • public Vertex addOutput(String outputName, OutputDescriptor outputDescriptor) { // TODO remove?

          I'm in favour or removing this method since another one exists. hitesh ?

        • In the proto - iO_descriptor should be named i_o_descriptor keeping with Google protobuf naming recommendations. That should generate the same name.
        • On RootInputLeafOutput / RootInputLeafOutputDescriptor. RootInputLeafOutput is in the API package, and was explicitly meant to be a non-public class since it's not user facing. An almost identical copy existed in tez-dag for this reason. This patch ends up making the copy in tez-api public. That, I think, is avoidable where possible - not relying on annotations alone.
        sseth Siddharth Seth added a comment - Looks good. Minor stuff. public Vertex addOutput( String outputName, OutputDescriptor outputDescriptor) { // TODO remove? I'm in favour or removing this method since another one exists. hitesh ? In the proto - iO_descriptor should be named i_o_descriptor keeping with Google protobuf naming recommendations. That should generate the same name. On RootInputLeafOutput / RootInputLeafOutputDescriptor. RootInputLeafOutput is in the API package, and was explicitly meant to be a non-public class since it's not user facing. An almost identical copy existed in tez-dag for this reason. This patch ends up making the copy in tez-api public. That, I think, is avoidable where possible - not relying on annotations alone.
        bikassaha Bikas Saha added a comment -

        That, I think, is avoidable where possible - not relying on annotations alone.

        Any ideas in this case to fix it? The duplicate code was quite messy when I was making this change. So I removed it for future devs.

        bikassaha Bikas Saha added a comment - That, I think, is avoidable where possible - not relying on annotations alone. Any ideas in this case to fix it? The duplicate code was quite messy when I was making this change. So I removed it for future devs.
        sseth Siddharth Seth added a comment -

        Nothing short of duplication. Potentially extension of the tez-api class in the tez-dag module and making the extended class public.

        sseth Siddharth Seth added a comment - Nothing short of duplication. Potentially extension of the tez-api class in the tez-dag module and making the extended class public.
        hitesh Hitesh Shah added a comment - - edited

        I'm in favour or removing this method since another one exists.

        For cases where an output committer is not needed, it would end up with users doing addOutput(outputDesc, null). Hive is a case where they do not need an output committer.

        hitesh Hitesh Shah added a comment - - edited I'm in favour or removing this method since another one exists. For cases where an output committer is not needed, it would end up with users doing addOutput(outputDesc, null). Hive is a case where they do not need an output committer.
        bikassaha Bikas Saha added a comment -

        Ideally I would have punted to the addOutput method for this jira since its unrelated to the change. However, since we have discussed it, I am tilting towards removing the method for 3 reasons
        1) symmetry with addInput
        2) its easy to add it back if needed but hard to remove
        3) its unlikely that most new outputs can get away without a committer. Having it on the API makes the users have to consider this (aided by javadoc). Not having a committer can make stuff work magically and fail later on.

        bikassaha Bikas Saha added a comment - Ideally I would have punted to the addOutput method for this jira since its unrelated to the change. However, since we have discussed it, I am tilting towards removing the method for 3 reasons 1) symmetry with addInput 2) its easy to add it back if needed but hard to remove 3) its unlikely that most new outputs can get away without a committer. Having it on the API makes the users have to consider this (aided by javadoc). Not having a committer can make stuff work magically and fail later on.
        bikassaha Bikas Saha added a comment -

        Removing the method vertex.addOutput() and adding javadoc to the remaining method to clarify the committer usage.

        bikassaha Bikas Saha added a comment - Removing the method vertex.addOutput() and adding javadoc to the remaining method to clarify the committer usage.
        bikassaha Bikas Saha added a comment -

        Thanks for the reviews. Committed.
        commit 74d04a48ac8572631b6b73fc27350526397c00e7
        Author: Bikas Saha <bikas@apache.org>
        Date: Thu Jul 24 15:34:57 2014 -0700

        TEZ-1134. InputInitializer and OutputCommitter implicitly use payloads of the input and output (bikas)

        bikassaha Bikas Saha added a comment - Thanks for the reviews. Committed. commit 74d04a48ac8572631b6b73fc27350526397c00e7 Author: Bikas Saha <bikas@apache.org> Date: Thu Jul 24 15:34:57 2014 -0700 TEZ-1134 . InputInitializer and OutputCommitter implicitly use payloads of the input and output (bikas)
        bikassaha Bikas Saha added a comment -

        Bulk close for jiras fixed in 0.5.0.

        bikassaha Bikas Saha added a comment - Bulk close for jiras fixed in 0.5.0.

        People

          bikassaha Bikas Saha
          bikassaha Bikas Saha
          Votes:
          0 Vote for this issue
          Watchers:
          4 Start watching this issue

          Dates

            Created:
            Updated:
            Resolved: