  Flink / FLINK-33569

Could not deploy yarn-application when using yarn over s3a filesystem.


Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: 1.18.0, 1.17.1
    • Fix Version/s: None
    • Component/s: Deployment / YARN
    • Labels: None

    Description

       

      I am deploying Flink in `yarn-application` mode. When Hadoop's storage is set to the s3a file system, Flink cannot submit jobs to YARN.
      The following error is reported:

      ------------------------------------------------------------
       The program finished with the following exception:

      org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy Yarn Application Cluster
              at org.apache.flink.yarn.YarnClusterDescriptor.deployApplicationCluster(YarnClusterDescriptor.java:481)
              at org.apache.flink.client.deployment.application.cli.ApplicationClusterDeployer.run(ApplicationClusterDeployer.java:67)
              at org.apache.flink.client.cli.CliFrontend.runApplication(CliFrontend.java:212)
              at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1098)
              at org.apache.flink.client.cli.CliFrontend.lambda$mainInternal$9(CliFrontend.java:1189)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:422)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
              at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
              at org.apache.flink.client.cli.CliFrontend.mainInternal(CliFrontend.java:1189)
              at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1157)
      Caused by: org.apache.hadoop.fs.PathIOException: `Cannot get relative path for URI:file:///tmp/application_1700122774429_0001-flink-conf.yaml5526160496134930395.tmp': Input/output error
              at org.apache.hadoop.fs.s3a.impl.CopyFromLocalOperation.getFinalPath(CopyFromLocalOperation.java:360)
              at org.apache.hadoop.fs.s3a.impl.CopyFromLocalOperation.uploadSourceFromFS(CopyFromLocalOperation.java:222)
              at org.apache.hadoop.fs.s3a.impl.CopyFromLocalOperation.execute(CopyFromLocalOperation.java:169)
              at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$copyFromLocalFile$26(S3AFileSystem.java:3854)
              at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:547)
              at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:528)
              at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:449)
              at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2480)
              at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2499)
              at org.apache.hadoop.fs.s3a.S3AFileSystem.copyFromLocalFile(S3AFileSystem.java:3847)
              at org.apache.flink.yarn.YarnApplicationFileUploader.copyToRemoteApplicationDir(YarnApplicationFileUploader.java:397)
              at org.apache.flink.yarn.YarnApplicationFileUploader.uploadLocalFileToRemote(YarnApplicationFileUploader.java:202)
              at org.apache.flink.yarn.YarnApplicationFileUploader.registerSingleLocalResource(YarnApplicationFileUploader.java:181)
              at org.apache.flink.yarn.YarnClusterDescriptor.startAppMaster(YarnClusterDescriptor.java:1050)
              at org.apache.flink.yarn.YarnClusterDescriptor.deployInternal(YarnClusterDescriptor.java:626)
              at org.apache.flink.yarn.YarnClusterDescriptor.deployApplicationCluster(YarnClusterDescriptor.java:474)
              ... 10 more

      By reading the source code and debugging, I found that when Hadoop uses the s3a file system, the paths passed to file uploads and downloads must be built from URIs that include a scheme (such as `file:`).
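      A minimal sketch of why the scheme matters, assuming (as the `Cannot get relative path for URI` message suggests) that s3a derives each uploaded file's destination by relativizing the file's URI against the source path with `java.net.URI.relativize`. `RelativizeDemo`, its helper `relativeOrNull`, and the paths are hypothetical and are not Hadoop code; a directory is used as the source only for readability.

```java
import java.net.URI;

// Hypothetical illustration (not Hadoop's actual code) of a relative-path
// check like the one behind "Cannot get relative path for URI".
class RelativizeDemo {

    // Returns the found file's URI relative to the source, or null when
    // URI.relativize could not strip the source prefix. relativize() returns
    // its argument unchanged if the schemes or authorities differ, or if the
    // source path is not a prefix of the found path.
    static URI relativeOrNull(URI source, URI found) {
        URI relative = source.relativize(found);
        return relative.equals(found) ? null : relative;
    }

    public static void main(String[] args) {
        // A local file as a file-system listing would report it: with a scheme.
        URI found = URI.create("file:/tmp/flink-conf.yaml");

        // Source built from a bare absolute path: no scheme, so relativize gives up.
        System.out.println(relativeOrNull(URI.create("/tmp/"), found));      // prints "null"

        // Source built from a file: URI: schemes match, so a relative URI comes back.
        System.out.println(relativeOrNull(URI.create("file:/tmp/"), found)); // prints "flink-conf.yaml"
    }
}
```

      Because `relativize` compares schemes before looking at paths, `/tmp/` and `file:/tmp/` are treated as unrelated even though they name the same directory.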

      In `org.apache.flink.yarn.YarnClusterDescriptor`, the temporarily generated `yaml` configuration file is uploaded with a path built from the file's absolute path instead of its URI, whereas every other file upload and download builds its path from a URI.
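      The difference between the two ways of building the path can be seen with the JDK alone. This sketch assumes a Unix-like system (on Windows, `getAbsolutePath()` returns a drive-letter path with backslashes that `URI.create` rejects); `SchemeDemo` and `/tmp/flink-conf.yaml` are hypothetical. As far as I can tell, Hadoop's `Path(String)` constructor likewise parses its argument as a URI, so a scheme-less absolute path string yields a scheme-less `Path`, while `Path(URI)` keeps the `file` scheme from `File.toURI()`.

```java
import java.io.File;
import java.net.URI;

// Hypothetical demo class: compares the URI scheme obtained from the two ways
// of turning a local File into a path.
class SchemeDemo {

    // Parses the absolute path string as a URI. On Unix-like systems
    // "/tmp/..." has no scheme, so this returns null.
    static String schemeFromAbsolutePath(File file) {
        return URI.create(file.getAbsolutePath()).getScheme();
    }

    // File.toURI() always produces an absolute "file:" URI.
    static String schemeFromToUri(File file) {
        return file.toURI().getScheme();
    }

    public static void main(String[] args) {
        File conf = new File("/tmp/flink-conf.yaml"); // hypothetical file; it need not exist
        System.out.println(schemeFromAbsolutePath(conf)); // prints "null"
        System.out.println(schemeFromToUri(conf));        // prints "file"
        System.out.println(conf.toURI());                 // prints "file:/tmp/flink-conf.yaml"
    }
}
```

      Under these assumptions, building the configuration file's path from `File.toURI()` rather than from `File.getAbsolutePath()` would presumably give it the same `file:` scheme that the other uploads already carry.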

      This is the reason for the error reported above.

      Attachments

        1. image-2023-11-16-16-48-40-223.png (145 kB, Bodong Liu)
        2. image-2023-11-16-16-46-21-684.png (187 kB, Bodong Liu)
        3. 2023-11-16_16-47.png (189 kB, Bodong Liu)


            People

              Assignee: Unassigned
              Reporter: Bodong Liu (liubodong)
              Votes: 0
              Watchers: 2
