FLINK-5129: make the BlobServer use a distributed file system

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0
    • Component/s: Network
    • Labels:
      None

      Description

      Currently, the BlobServer uses local storage and, additionally, when HA mode is enabled, a distributed file system, e.g. HDFS. The distributed file system, however, is only used by the JobManager; all TaskManager instances request blobs from the JobManager. By using the distributed file system on the TaskManagers as well, we would lower the load on the JobManager and increase scalability.

          Activity

          Stephan Ewen added a comment -

          Fixed via 9f544d83b3443cf33f5890efdb956678847d445f

          ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/flink/pull/3084

          ASF GitHub Bot added a comment -

          Github user StephanEwen commented on the issue:

          https://github.com/apache/flink/pull/3084

          Good change, thanks!

          Merging this...

          ASF GitHub Bot added a comment -

          GitHub user NicoK opened a pull request:

          https://github.com/apache/flink/pull/3084

          FLINK-5129 make the BlobServer use a distributed file system

          Make the BlobCache use the BlobServer's distributed file system in HA mode: previously, even in HA mode and even if the cache had access to the file system, it downloaded BLOBs from the single central BlobServer. By using the underlying distributed file system directly, we may leverage its scalability and remove a single point of (performance) failure. If the distributed file system is not accessible at the blob caches, the old behaviour is used.
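          The lookup order described above can be sketched as follows. This is a minimal, hypothetical Java model, not Flink's BlobCache. `BlobSource`, `SketchBlobCache` and `getFile` are invented names, and a `null` distributedFs stands for "the distributed file system is not accessible at this cache". Treating a failed DFS read the same way as an inaccessible file system is an additional assumption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical abstraction over "somewhere a BLOB can be copied from". */
interface BlobSource {
    /** Copies the BLOB identified by key to target, or throws. */
    void fetch(String key, Path target) throws IOException;
}

/** Hypothetical, simplified model of the BlobCache lookup order; not Flink's code. */
final class SketchBlobCache {
    private final Path localDir;
    private final BlobSource distributedFs; // null when the HA file system is not accessible here
    private final BlobSource blobServer;

    SketchBlobCache(Path localDir, BlobSource distributedFs, BlobSource blobServer) {
        this.localDir = localDir;
        this.distributedFs = distributedFs;
        this.blobServer = blobServer;
    }

    Path getFile(String key) throws IOException {
        Path local = localDir.resolve(key);
        if (Files.exists(local)) {
            return local; // 1. already in the local cache
        }
        if (distributedFs != null) {
            try {
                distributedFs.fetch(key, local); // 2. read straight from the distributed file system
                return local;
            } catch (IOException e) {
                Files.deleteIfExists(local); // assumption: drop a partial copy, then fall back
            }
        }
        blobServer.fetch(key, local); // 3. old behaviour: download from the central BlobServer
        return local;
    }
}
```

          In this sketch, the BlobServer is only contacted when the cache misses locally and the distributed file system cannot serve the request. That is where the load reduction on the JobManager comes from.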

          @uce can you have a look?
          (this is an updated and fixed version of https://github.com/apache/flink/pull/3076)

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/NicoK/flink FLINK-5129a

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/3084.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #3084


          commit 464f2c834688507c67acb3ad584827132ebe444e
          Author: Nico Kruber <nico@data-artisans.com>
          Date: 2016-11-22T11:49:03Z

          [hotfix] remove unused package-private BlobUtils#copyFromRecoveryPath

          This was actually the same implementation as
          FileSystemBlobStore#get(java.lang.String, java.io.File) and either of the two
          could have been removed but the implementation makes most sense at the
          concrete file system abstraction layer, i.e. in FileSystemBlobStore.

          commit 2ebffd4c2d499b61f164b4d54dc86c9d44b9c0ea
          Author: Nico Kruber <nico@data-artisans.com>
          Date: 2016-11-23T15:11:35Z

          [hotfix] do not create intermediate strings inside String.format in BlobUtils
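          As a generic illustration of what "intermediate strings inside String.format" means: the actual BlobUtils code is not shown here. The `PathFormatting` class and the `blob_` prefix are hypothetical. Both methods produce the same result, but the first builds an extra temporary String before formatting.

```java
/** Generic illustration, not the actual BlobUtils code. */
final class PathFormatting {
    // "blob_" + key is concatenated into a temporary String before String.format runs.
    static String withIntermediate(String dir, String key) {
        return String.format("%s/%s", dir, "blob_" + key);
    }

    // Moving the literal into the format string avoids that extra String.
    static String direct(String dir, String key) {
        return String.format("%s/blob_%s", dir, key);
    }
}
```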

          commit 36ab6121e336f63138e442ea48a751ede7fb04c3
          Author: Nico Kruber <nico@data-artisans.com>
          Date: 2016-11-24T16:11:19Z

          [hotfix] properly shut down the BlobServer in BlobServerRangeTest

          commit c8c12c67ae875ca5c96db78375bef880cf2a3c59
          Author: Nico Kruber <nico@data-artisans.com>
          Date: 2017-01-05T17:06:01Z

          [hotfix] use JUnit's TemporaryFolder in BlobRecoveryITCase, too

          This makes cleaning up simpler.

          commit a078cb0c26071fe70e3668d23d0c8bef8550892f
          Author: Nico Kruber <nico@data-artisans.com>
          Date: 2017-01-05T17:27:00Z

          [hotfix] add a missing "'" to the BlobStore class

          commit a643f0b989c640a81b112ad14ae27a2a2b1ab257
          Author: Nico Kruber <nico@data-artisans.com>
          Date: 2017-01-05T17:07:13Z

          FLINK-5129 BlobServer: include the cluster id in the HA storage path for blobs

          This applies to the ZookeeperHaServices implementation.
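          A minimal sketch of why the cluster id belongs in the HA storage path, assuming one sub-directory per cluster below a shared HA storage directory. `HaBlobPaths`, `blobStoragePath` and the `/blob` suffix are hypothetical; the real layout used by ZookeeperHaServices may differ. Plain string joining is used instead of `java.nio.file.Path` so that URIs such as `hdfs:///...` keep their scheme intact.

```java
/** Hypothetical helper; the real layout used by ZookeeperHaServices may differ. */
final class HaBlobPaths {
    static String blobStoragePath(String haStorageDir, String clusterId) {
        if (clusterId == null || clusterId.isEmpty()) {
            throw new IllegalArgumentException("cluster id must not be empty");
        }
        // Strip one trailing '/' so the result never contains "//" in the middle.
        String base = haStorageDir.endsWith("/")
                ? haStorageDir.substring(0, haStorageDir.length() - 1)
                : haStorageDir;
        // One sub-directory per cluster keeps clusters sharing haStorageDir apart.
        return base + "/" + clusterId + "/blob";
    }
}
```

          With this scheme, two clusters configured with the same HA storage directory but different cluster ids would never read or delete each other's BLOBs.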

          commit 7d832919040059961940fc96d0cdb285bc9f77d3
          Author: Nico Kruber <nico@data-artisans.com>
          Date: 2017-01-05T17:18:10Z

          FLINK-5129 unify duplicate code between the BlobServer and ZookeeperHaServices

          (this was introduced by c64860677f)

          commit 19879a01b99c4772a09627eb5f380f794f6c1e27
          Author: Nico Kruber <nico@data-artisans.com>
          Date: 2016-11-30T13:52:12Z

          [hotfix] add some more documentation in BlobStore-related classes

          commit 80c17ef83104d1186c06d8f5d4cde11e4b05f2b8
          Author: Nico Kruber <nico@data-artisans.com>
          Date: 2017-01-06T10:55:23Z

          [hotfix] minor code beautifications when checking parameters

          + also check the blobService parameter in BlobLibraryCacheManager

          commit ff920e48bd69acef280bdef2a12e5f5f9cca3a88
          Author: Nico Kruber <nico@data-artisans.com>
          Date: 2017-01-06T13:21:42Z

          FLINK-5129 let BlobUtils#initStorageDirectory() throw a proper IOException
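          A sketch of the pattern this commit title describes: surface a failed directory creation as a checked IOException that callers must handle. `StorageDirs` and its signature are hypothetical and may not match the real `BlobUtils#initStorageDirectory()`.

```java
import java.io.File;
import java.io.IOException;

/** Hypothetical sketch; the real BlobUtils#initStorageDirectory() signature may differ. */
final class StorageDirs {
    static File initStorageDirectory(File storageDir) throws IOException {
        // mkdirs() returns false on failure *and* if the directory already exists,
        // so only treat it as an error if there is still no directory afterwards.
        if (!storageDir.mkdirs() && !storageDir.isDirectory()) {
            throw new IOException("Could not create storage directory " + storageDir.getAbsolutePath());
        }
        return storageDir;
    }
}
```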

          commit c8e2815787338f52e5ad369bcaedb1798284dd29
          Author: Nico Kruber <nico@data-artisans.com>
          Date: 2017-01-06T13:59:51Z

          [hotfix] simplify code in BlobCache#deleteGlobal()

          Also, re-order the code so that a local delete is always tried before creating
          a connection to the BlobServer. If the connection to the BlobServer fails, at
          least the local file has been deleted.
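          The ordering described in this commit can be sketched as below. This is a hypothetical model, not Flink's `BlobCache#deleteGlobal()`. `RemoteDelete` stands in for "ask the BlobServer to delete this key". The point of the sketch is that a server failure can no longer prevent the local cleanup.

```java
import java.io.File;
import java.io.IOException;

/** Hypothetical stand-in for "ask the BlobServer to delete this key". */
interface RemoteDelete {
    void delete(String key) throws IOException;
}

/** Sketch of the ordering described above; not Flink's BlobCache#deleteGlobal(). */
final class DeleteGlobalSketch {
    static void deleteGlobal(File localFile, String key, RemoteDelete blobServer) throws IOException {
        // 1. Local delete first: it needs no network connection.
        if (localFile.exists() && !localFile.delete()) {
            // Keep going: the server-side delete is still worth attempting.
            System.err.println("Could not delete local copy " + localFile);
        }
        // 2. Only now contact the BlobServer; if this throws, the local copy is already gone.
        blobServer.delete(key);
    }
}
```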

          commit 5cd1c20aa604a9556c069ab78d8e471fa058499e
          Author: Nico Kruber <nico@data-artisans.com>
          Date: 2016-11-29T17:11:06Z

          [hotfix] re-use some code in BlobServerDeleteTest

          commit d39948a6baa0cd6f68c4dfd8daffdd65e573fbca
          Author: Nico Kruber <nico@data-artisans.com>
          Date: 2016-11-30T13:35:38Z

          [hotfix] improve some failure messages in the BlobService's HA unit tests

          commit dc87ae36088cc48a4122351ebe5b09a31d7fba41
          Author: Nico Kruber <nico@data-artisans.com>
          Date: 2017-01-06T14:06:30Z

          FLINK-5129 make the BlobCache also use a distributed file system in HA mode

          If available (in HA mode), download the jar files from the distributed file
          system directly instead of querying the BlobServer. This way the load is more
          distributed among the nodes of the file system (depending on its implementation
          of course) compared to putting all the burden on a single BlobServer.

          commit 389eaa9779d4bf22cc3972208d4f35ac7a966f5c
          Author: Nico Kruber <nico@data-artisans.com>
          Date: 2017-01-06T16:21:05Z

          FLINK-5129 add unit tests for the BlobCache accessing the distributed FS directly

          commit b3bcf944df87f37cccd831e8fb56b95caa620dad
          Author: Nico Kruber <nico@data-artisans.com>
          Date: 2017-01-09T13:41:59Z

          FLINK-5129 let FileSystemBlobStore#get() remove the target file on failure

          Previously, if the copy failed, an IOException was thrown but the (most likely
          incomplete) target file remained. This cleans up the file in that case so that
          code above, e.g. BlobServer and BlobCache, can rely on a file being complete as
          long as it exists.
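          A minimal sketch of the "delete the target on failure" rule, using only `java.nio.file.Files`. `CopyWithCleanup` is a hypothetical name, not the code of `FileSystemBlobStore#get()`. `Files.copy(InputStream, Path, ...)` itself leaves a partially written target behind when the input stream fails, so the catch block removes it before rethrowing.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper illustrating the "existing file means complete file" rule. */
final class CopyWithCleanup {
    static void copyOrDelete(InputStream in, Path target) throws IOException {
        try {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            // Do not leave a half-written file behind for callers to pick up.
            try {
                Files.deleteIfExists(target);
            } catch (IOException cleanupFailure) {
                e.addSuppressed(cleanupFailure);
            }
            throw e;
        }
    }
}
```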


          ASF GitHub Bot added a comment -

          Github user NicoK commented on the issue:

          https://github.com/apache/flink/pull/3076

          Fixed a typo in the unit test that led to the tests passing although something was still wrong; that is now fixed as well.

          ASF GitHub Bot added a comment -

          Github user NicoK closed the pull request at:

          https://github.com/apache/flink/pull/3076

          ASF GitHub Bot added a comment -

          GitHub user NicoK opened a pull request:

          https://github.com/apache/flink/pull/3076

          FLINK-5129 make the BlobServer use a distributed file system

          Make the BlobCache use the BlobServer's distributed file system in HA mode: previously, even in HA mode and even if the cache had access to the file system, it downloaded BLOBs from the single central BlobServer. By using the underlying distributed file system directly, we may leverage its scalability and remove a single point of (performance) failure. If the distributed file system is not accessible at the blob caches, the old behaviour is used.

          @uce can you have a look?

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/NicoK/flink FLINK-5129a

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/3076.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #3076


          commit 464f2c834688507c67acb3ad584827132ebe444e
          Author: Nico Kruber <nico@data-artisans.com>
          Date: 2016-11-22T11:49:03Z

          [hotfix] remove unused package-private BlobUtils#copyFromRecoveryPath

          This was actually the same implementation as
          FileSystemBlobStore#get(java.lang.String, java.io.File) and either of the two
          could have been removed but the implementation makes most sense at the
          concrete file system abstraction layer, i.e. in FileSystemBlobStore.

          commit 2ebffd4c2d499b61f164b4d54dc86c9d44b9c0ea
          Author: Nico Kruber <nico@data-artisans.com>
          Date: 2016-11-23T15:11:35Z

          [hotfix] do not create intermediate strings inside String.format in BlobUtils

          commit 36ab6121e336f63138e442ea48a751ede7fb04c3
          Author: Nico Kruber <nico@data-artisans.com>
          Date: 2016-11-24T16:11:19Z

          [hotfix] properly shut down the BlobServer in BlobServerRangeTest

          commit c8c12c67ae875ca5c96db78375bef880cf2a3c59
          Author: Nico Kruber <nico@data-artisans.com>
          Date: 2017-01-05T17:06:01Z

          [hotfix] use JUnit's TemporaryFolder in BlobRecoveryITCase, too

          This makes cleaning up simpler.

          commit a078cb0c26071fe70e3668d23d0c8bef8550892f
          Author: Nico Kruber <nico@data-artisans.com>
          Date: 2017-01-05T17:27:00Z

          [hotfix] add a missing "'" to the BlobStore class

          commit a643f0b989c640a81b112ad14ae27a2a2b1ab257
          Author: Nico Kruber <nico@data-artisans.com>
          Date: 2017-01-05T17:07:13Z

          FLINK-5129 BlobServer: include the cluster id in the HA storage path for blobs

          This applies to the ZookeeperHaServices implementation.

          commit 7d832919040059961940fc96d0cdb285bc9f77d3
          Author: Nico Kruber <nico@data-artisans.com>
          Date: 2017-01-05T17:18:10Z

          FLINK-5129 unify duplicate code between the BlobServer and ZookeeperHaServices

          (this was introduced by c64860677f)

          commit 19879a01b99c4772a09627eb5f380f794f6c1e27
          Author: Nico Kruber <nico@data-artisans.com>
          Date: 2016-11-30T13:52:12Z

          [hotfix] add some more documentation in BlobStore-related classes

          commit 80c17ef83104d1186c06d8f5d4cde11e4b05f2b8
          Author: Nico Kruber <nico@data-artisans.com>
          Date: 2017-01-06T10:55:23Z

          [hotfix] minor code beautifications when checking parameters

          + also check the blobService parameter in BlobLibraryCacheManager

          commit ff920e48bd69acef280bdef2a12e5f5f9cca3a88
          Author: Nico Kruber <nico@data-artisans.com>
          Date: 2017-01-06T13:21:42Z

          FLINK-5129 let BlobUtils#initStorageDirectory() throw a proper IOException

          commit c8e2815787338f52e5ad369bcaedb1798284dd29
          Author: Nico Kruber <nico@data-artisans.com>
          Date: 2017-01-06T13:59:51Z

          [hotfix] simplify code in BlobCache#deleteGlobal()

          Also, re-order the code so that a local delete is always tried before creating
          a connection to the BlobServer. If the connection to the BlobServer fails, at
          least the local file has been deleted.

          commit 38626a705fd0725a8e54f2ee1c3d0ec410184b8a
          Author: Nico Kruber <nico@data-artisans.com>
          Date: 2017-01-06T14:06:30Z

          FLINK-5129 make the BlobCache also use a distributed file system in HA mode

          If available (in HA mode), download the jar files from the distributed file
          system directly instead of querying the BlobServer. This way the load is more
          distributed among the nodes of the file system (depending on its implementation
          of course) compared to putting all the burden on a single BlobServer.

          commit 1e86c5c92f9ac35c26c1e707d2d840c4edbeefb1
          Author: Nico Kruber <nico@data-artisans.com>
          Date: 2016-11-29T17:11:06Z

          [hotfix] re-use some code in BlobServerDeleteTest

          commit 68d2959b60f6b583cb48de8ed5aee3e18b163082
          Author: Nico Kruber <nico@data-artisans.com>
          Date: 2016-11-30T13:35:38Z

          [hotfix] improve some failure messages in the BlobService's HA unit tests

          commit 7cfbeb7707329cad57604a58f44254d4f8b6c9b3
          Author: Nico Kruber <nico@data-artisans.com>
          Date: 2017-01-06T16:21:05Z

          FLINK-5129 add unit tests for the BlobCache accessing the distributed FS directly


          ASF GitHub Bot added a comment -

          Github user NicoK closed the pull request at:

          https://github.com/apache/flink/pull/2891

          ASF GitHub Bot added a comment -

          Github user NicoK commented on the issue:

          https://github.com/apache/flink/pull/2891

          I need to adapt a few things and choose a different approach; I'll re-open later.

          ASF GitHub Bot added a comment -

          Github user NicoK commented on the issue:

          https://github.com/apache/flink/pull/2891

          Despite the tests completing successfully, I still need to check a few things:

          • `BlobService#getURL()` may now return a URL for a distributed file system, however:
            • related code, e.g. `java.io.File`, may not know how to handle such URLs (HDFS URLs, for example)
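          The concern about `java.io.File` is concrete: the `File(URI)` constructor only accepts `file:` URIs and throws `IllegalArgumentException` for any other scheme, e.g. `hdfs:`. The guard below is a hypothetical sketch (`UriToFile` is an invented name). It shows how code receiving a URL from the blob service could detect non-local URLs instead of failing.

```java
import java.io.File;
import java.net.URI;

/** Hypothetical guard; illustrates why java.io.File cannot consume an HDFS URL. */
final class UriToFile {
    /**
     * Returns a local File for "file:" URIs and null for anything else (e.g. "hdfs:"),
     * because new File(URI) throws IllegalArgumentException for non-"file" schemes.
     */
    static File toLocalFileOrNull(URI uri) {
        if (!"file".equalsIgnoreCase(uri.getScheme())) {
            return null; // the caller has to go through a file system client instead
        }
        return new File(uri);
    }
}
```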
          ASF GitHub Bot added a comment -

          Github user NicoK commented on the issue:

          https://github.com/apache/flink/pull/2891

          Sorry for the hassle, found a regression and added a fix plus an appropriate test for it. Should be fine now.

          ASF GitHub Bot added a comment -

          GitHub user NicoK opened a pull request:

          https://github.com/apache/flink/pull/2891

          FLINK-5129 make the BlobServer use a distributed file system

          Previously, the BlobServer held a local copy and, if high availability (HA)
          was enabled, also copied jar files to a distributed file system. Upon restore,
          these files were copied to the local store, from which they were used.

          This PR abstracts the BlobServer's backing file system and makes it use the
          distributed file system directly in HA mode, i.e. without the local file system
          copy. Other than that, the behaviour should not change.

          Secondly, BlobCache instances at the task managers also make use of this
          distributed file system and download files from there instead of bothering
          the blob server. As before, however, distributed files may only be deleted
          by the blob server. If the distributed file system is not accessible at the blob
          caches, the old behaviour is used.

          • BlobServer: include the cluster id in the HA storage path for blobs
          • make the BlobServer use the HA filesystem back-end properly
          • make the BlobCache also use a distributed file system in HA mode

          @uce can you have a look?

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/NicoK/flink FLINK-5129

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/2891.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #2891


          commit b65e74dd92bdf74b2816a0d8a26a5ebaa25ca586
          Author: Nico Kruber <nico@data-artisans.com>
          Date: 2016-11-22T11:49:03Z

          [hotfix] remove unused package-private BlobUtils#copyFromRecoveryPath

          This was actually the same implementation as
          FileSystemBlobStore#get(java.lang.String, java.io.File) and either of the two
          could have been removed but the implementation makes most sense at the
          concrete file system abstraction layer, i.e. in FileSystemBlobStore.

          commit 09bdd49e6282268fd9c1b2672f0ea6222e097ca2
          Author: Nico Kruber <nico@data-artisans.com>
          Date: 2016-11-23T15:11:35Z

          [hotfix] do not create intermediate strings inside String.format in BlobUtils

          commit 93938ff97fef9e39c17ac795e1e89ca9de25e028
          Author: Nico Kruber <nico@data-artisans.com>
          Date: 2016-11-24T16:11:19Z

          [hotfix] properly shut down the BlobServer in BlobServerRangeTest

          commit c0c9d2239a767154d6071171d4c33e762e01aa62
          Author: Nico Kruber <nico@data-artisans.com>
          Date: 2016-11-24T17:50:43Z

          FLINK-5129 BlobServer: include the cluster id in the HA storage path for blobs

          Also use JUnit's TemporaryFolder in BlobRecoveryITCase. This makes
          cleaning up simpler.

          commit 8b9c7d9fd6e1ab3c7f2175a31d0e29b41b01cc61
          Author: Nico Kruber <nico@data-artisans.com>
          Date: 2016-11-23T18:50:52Z

          FLINK-5129 make the BlobCache use the HA filesystem back-end properly

          Previously, the BlobServer held a local copy and, if high availability (HA)
          was enabled, also copied jar files to a distributed file system. Upon restore,
          these files were copied to the local store, from which they were used.

          This commit abstracts the BlobServer's backing file system and makes it use the
          distributed file system directly in HA mode, i.e. without the local file system
          copy. Other than that the behaviour does not change.

          commit 249b2ea48f19c54498faa56ad45d299efaad4521
          Author: Nico Kruber <nico@data-artisans.com>
          Date: 2016-11-25T16:42:05Z

          FLINK-5129 make the BlobCache also use a distributed file system in HA mode

          • re-factor the file system abstraction in FileSystemBlobStore so that it can
            be used by the task managers, too, which should not be able to delete files
            in a distributed file system shared among different nodes
          • only download blobs from the blob server if not in HA mode or the distributed
            file system is not accessible by the BlobCache, e.g. at the task managers
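          One way to express "task managers should not be able to delete files in the shared file system" is in the type system. This is a hypothetical sketch: `ReadOnlyBlobStore`, `ReadWriteBlobStore` and `SharedDirBlobStore` are invented names, and a local directory stands in for the distributed file system. The BlobServer would hold the read-write type, while a BlobCache is only handed the read-only view, so a `delete` call from the cache would not compile.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical read-only view: all a task manager's BlobCache is handed. */
interface ReadOnlyBlobStore {
    void get(String key, Path localTarget) throws IOException;
}

/** Hypothetical read-write store: only the BlobServer gets one of these. */
interface ReadWriteBlobStore extends ReadOnlyBlobStore {
    void put(Path localSource, String key) throws IOException;

    void delete(String key) throws IOException;
}

/** Toy implementation backed by a shared directory standing in for the DFS. */
final class SharedDirBlobStore implements ReadWriteBlobStore {
    private final Path root;

    SharedDirBlobStore(Path root) {
        this.root = root;
    }

    @Override
    public void get(String key, Path localTarget) throws IOException {
        Files.copy(root.resolve(key), localTarget, StandardCopyOption.REPLACE_EXISTING);
    }

    @Override
    public void put(Path localSource, String key) throws IOException {
        Files.copy(localSource, root.resolve(key), StandardCopyOption.REPLACE_EXISTING);
    }

    @Override
    public void delete(String key) throws IOException {
        Files.deleteIfExists(root.resolve(key));
    }
}
```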

          commit dd69f65a47205eb55ac8cc2c8f3aa9f7232dc8ba
          Author: Nico Kruber <nico@data-artisans.com>
          Date: 2016-11-28T10:42:13Z

          FLINK-5129 restore non-HA mode unique directory setup in the blob server and cache

          If not in high availability mode, local (and now also distributed) file systems
          again try to set up a unique directory structure so that other instances with
          the same configuration file or storage path do not interfere.

          This was lost in 8b9c7d9fd6.
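A minimal sketch of such a unique directory setup for the non-HA case, shown for the local file system via `java.nio.file` only (a distributed file system would go through its own file system API). The class `UniqueBlobDir`, the method `createUniqueStorageDir` and the `blobStore-<UUID>` naming scheme are assumptions for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;

/** Hypothetical sketch of a per-instance storage directory for the non-HA mode. */
public final class UniqueBlobDir {

    private static final int MAX_ATTEMPTS = 10;

    private UniqueBlobDir() {}

    /** Creates and returns a fresh, previously non-existing sub-directory of baseDir. */
    public static Path createUniqueStorageDir(Path baseDir) {
        try {
            // Make sure the configured (possibly shared) base directory exists.
            Files.createDirectories(baseDir);
            for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
                Path candidate = baseDir.resolve("blobStore-" + UUID.randomUUID());
                try {
                    // createDirectory fails if the path already exists, so two instances
                    // using the same base directory never end up in the same directory.
                    return Files.createDirectory(candidate);
                } catch (FileAlreadyExistsException e) {
                    // UUID collision (practically impossible): retry with a new name.
                }
            }
            throw new IOException("could not create a unique directory in " + baseDir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path base = Paths.get(System.getProperty("java.io.tmpdir"), "flink-blobs");
        // Two instances sharing the same configuration get different directories.
        System.out.println(createUniqueStorageDir(base));
        System.out.println(createUniqueStorageDir(base));
    }
}
```

Relying on `Files.createDirectory` failing for an existing path, rather than checking for existence first and creating afterwards, avoids a race between two instances starting at the same time.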

          commit 76ccc9ffaaa63d6e0bd55ba7f6c08f8c1cff98cb
          Author: Nico Kruber <nico@data-artisans.com>
          Date: 2016-11-28T15:19:20Z

          [hotfix] add a missing "'" to FileSystemBlobStore

          commit 53702add38d1087062e84a7e804b08920dfc0c23
          Author: Nico Kruber <nico@data-artisans.com>
          Date: 2016-11-28T15:41:11Z

          FLINK-5129 move path-related methods from BlobUtils to FileSystemBlobStore and cleanup unused methods



            People

            • Assignee: Nico Kruber (NicoK)
            • Reporter: Nico Kruber (NicoK)
            • Votes: 0
            • Watchers: 3
