FLINK-5046

Avoid redundant serialization when creating the TaskDeploymentDescriptor

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.0, 1.1.3
    • Fix Version/s: 1.2.0, 1.1.4
    • Labels:
      None

      Description

      When creating the TaskDeploymentDescriptor, we extract job-wide information from the ExecutionGraph and operator-wide information from the ExecutionJobVertex. This information is serialized again for every subtask even though it is identical across subtasks.

      As an improvement, we can serialize this information once and pass the serialized byte array to each TaskDeploymentDescriptor. This reduces the serialization work Flink has to do when deploying subtasks.
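
      The idea can be illustrated with a minimal, self-contained Java sketch. It is not Flink's actual code: SharedInfo, Descriptor, createDescriptors and the serialization counter are hypothetical names chosen for illustration, and plain java.io serialization stands in for whatever mechanism Flink uses. The sketch serializes the shared information once and hands the same byte array to every per-subtask descriptor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class PreserializeSketch {

    /** Hypothetical stand-in for information that is identical for all subtasks. */
    static final class SharedInfo implements Serializable {
        private static final long serialVersionUID = 1L;
        final String jobName;
        final String operatorName;

        SharedInfo(String jobName, String operatorName) {
            this.jobName = jobName;
            this.operatorName = operatorName;
        }
    }

    /** Hypothetical per-subtask descriptor: shared bytes plus subtask-specific data. */
    static final class Descriptor {
        final byte[] serializedSharedInfo;
        final int subtaskIndex;

        Descriptor(byte[] serializedSharedInfo, int subtaskIndex) {
            this.serializedSharedInfo = serializedSharedInfo;
            this.subtaskIndex = subtaskIndex;
        }
    }

    /** Counts serialize() calls so the saving is observable (demonstration only). */
    static int serializationCount = 0;

    static byte[] serialize(Serializable value) throws IOException {
        serializationCount++;
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    /** Serializes the shared information once and reuses the bytes for every subtask. */
    static List<Descriptor> createDescriptors(SharedInfo info, int parallelism) throws IOException {
        byte[] shared = serialize(info);
        List<Descriptor> descriptors = new ArrayList<>(parallelism);
        for (int i = 0; i < parallelism; i++) {
            descriptors.add(new Descriptor(shared, i));
        }
        return descriptors;
    }

    public static void main(String[] args) throws Exception {
        List<Descriptor> descriptors = createDescriptors(new SharedInfo("wordcount", "map"), 4);
        SharedInfo restored = (SharedInfo) deserialize(descriptors.get(3).serializedSharedInfo);
        System.out.println(descriptors.size() + " descriptors, "
                + serializationCount + " serialization(s), job=" + restored.jobName);
    }
}
```

      In this sketch the serialization cost no longer grows with the parallelism, because all descriptors reference the same byte array; only the subtask-specific fields differ.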

          Activity

          githubbot ASF GitHub Bot added a comment -

          GitHub user tillrohrmann opened a pull request:

          https://github.com/apache/flink/pull/2779

          FLINK-5046 [tdd] Preserialize TaskDeploymentDescriptor information

          In order to speed up the serialization of the TaskDeploymentDescriptor, we can pre-serialize
          all information that stays the same for all TaskDeploymentDescriptors. The information that
          is static for a TDD is the job-related information contained in the ExecutionGraph and the
          operator/task-related information stored in the ExecutionJobVertex.

          In order to pre-serialize this information, this PR introduces the JobInformation class
          and the TaskInformation class, which are stored in serialized form in the ExecutionGraph
          and the ExecutionJobVertex, respectively.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/tillrohrmann/flink eagerStreamConfigSerialization

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/2779.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #2779


          commit fb7621a5a5023595a89d7e92562b503ec2a039e5
          Author: Till Rohrmann <trohrmann@apache.org>
          Date: 2016-11-09T18:11:36Z

          FLINK-5046 [tdd] Preserialize TaskDeploymentDescriptor information

          In order to speed up the serialization of the TaskDeploymentDescriptor, we can pre-serialize
          all information that stays the same for all TaskDeploymentDescriptors. The information that
          is static for a TDD is the job-related information contained in the ExecutionGraph and the
          operator/task-related information stored in the ExecutionJobVertex.

          In order to pre-serialize this information, this PR introduces the JobInformation class
          and the TaskInformation class, which are stored in serialized form in the ExecutionGraph
          and the ExecutionJobVertex, respectively.


          githubbot ASF GitHub Bot added a comment -

          GitHub user tillrohrmann opened a pull request:

          https://github.com/apache/flink/pull/2780

          [backport] FLINK-5046 [tdd] Preserialize TaskDeploymentDescriptor information

          This is a backport of #2779 for the release 1.1 branch.

          In order to speed up the serialization of the TaskDeploymentDescriptor, we can pre-serialize
          all information that stays the same for all TaskDeploymentDescriptors. The information that
          is static for a TDD is the job-related information contained in the ExecutionGraph and the
          operator/task-related information stored in the ExecutionJobVertex.

          In order to pre-serialize this information, this PR introduces the JobInformation class
          and the TaskInformation class, which are stored in serialized form in the ExecutionGraph
          and the ExecutionJobVertex, respectively.

          Fix for release-1.1

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/tillrohrmann/flink backportEagerStreamConfigSerialization

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/2780.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #2780


          commit 8e7252a9a0411f00d8efcc2ba0e6c9e9ffd88989
          Author: Till Rohrmann <trohrmann@apache.org>
          Date: 2016-11-09T18:11:36Z

          FLINK-5046 [tdd] Preserialize TaskDeploymentDescriptor information

          In order to speed up the serialization of the TaskDeploymentDescriptor, we can pre-serialize
          all information that stays the same for all TaskDeploymentDescriptors. The information that
          is static for a TDD is the job-related information contained in the ExecutionGraph and the
          operator/task-related information stored in the ExecutionJobVertex.

          In order to pre-serialize this information, this PR introduces the JobInformation class
          and the TaskInformation class, which are stored in serialized form in the ExecutionGraph
          and the ExecutionJobVertex, respectively.

          Fix for release-1.1


          githubbot ASF GitHub Bot added a comment -

          Github user tillrohrmann commented on the issue:

          https://github.com/apache/flink/pull/2779

          Tests pass locally. Will merge the PR.

          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/flink/pull/2779

          githubbot ASF GitHub Bot added a comment -

          Github user tillrohrmann closed the pull request at:

          https://github.com/apache/flink/pull/2780

          till.rohrmann Till Rohrmann added a comment -

          Fixed
          in 1.2 via 58204da13d42c265d6a503a8cf738b6522e12ba6
          in 1.1 via 9a19ca115392f33dc138aea8122cb85fb90e784b


            People

            • Assignee: Till Rohrmann (till.rohrmann)
            • Reporter: Till Rohrmann (till.rohrmann)