FLINK-31612

ClassNotFoundException when using GCS path as HA directory


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.17.0
    • Fix Version/s: 1.18.0, 1.17.1
    • Component/s: FileSystems
    • Environment: Flink Kubernetes operator: 1.4

      Flink version: 1.17

      GKE Kubernetes cluster.

       

    Description

      Hi,

      When I run a Flink job in HA mode with a GCS path as the HA directory (e.g. gs://flame-poc/ha), or when I start a job from checkpoints stored in GCS, I get the following exception:

      Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback not found
      	at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2688) ~[?:?]
      	at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2712) ~[?:?]
      	at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.security.Groups.<init>(Groups.java:107) ~[?:?]
      	at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.security.Groups.<init>(Groups.java:102) ~[?:?]
      	at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:451) ~[?:?]
      	at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:338) ~[?:?]
      	at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:300) ~[?:?]
      	at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:575) ~[?:?]
      	at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.getUgiUserName(GoogleHadoopFileSystemBase.java:1226) ~[?:?]
      	at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.listStatus(GoogleHadoopFileSystemBase.java:858) ~[?:?]
      	at org.apache.flink.fs.gs.org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.listStatus(HadoopFileSystem.java:170) ~[?:?] 

      Observations:

      When the HA directory is on the local file system and only the checkpoint directory is on GCS, the job writes checkpoints to the GCS checkpoint path without problems.

      While debugging, I found that all org.apache.hadoop packages are relocated (shaded) to org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop. The code should therefore look up org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback instead of org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback.
      I think the class name is not rewritten at this point because the class is loaded by name via reflection here:
      https://github.com/apache/hadoop/blob/branch-3.3.4/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/Groups.java#L108
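
      To illustrate the suspected mechanism, here is a minimal sketch, not code from Flink or Hadoop. It assumes the class name reaches the lookup as a plain string that the relocation did not rewrite. The class ShadedLookupDemo and its helpers relocate and isLoadable are hypothetical names introduced for this example; only the two package prefixes come from the relocation rule in this report.

```java
final class ShadedLookupDemo {
    // Prefixes copied from the relocation rule quoted in this report.
    static final String ORIGINAL_PREFIX = "org.apache.hadoop";
    static final String SHADED_PREFIX =
            "org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop";

    /** Hypothetical helper: maps an original class name to its relocated name. */
    static String relocate(String className) {
        if (className.startsWith(ORIGINAL_PREFIX + ".")) {
            return SHADED_PREFIX + className.substring(ORIGINAL_PREFIX.length());
        }
        return className;
    }

    /** Returns true if this class's class loader can resolve the given name. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only resolve the class, do not run static initializers.
            Class.forName(className, false, ShadedLookupDemo.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: this is the name the lookup receives as a string.
        String requested =
                "org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback";
        String shaded = relocate(requested);

        // The result depends on what is on the classpath where this runs.
        System.out.println(requested + " loadable: " + isLoadable(requested));
        System.out.println(shaded + " loadable: " + isLoadable(shaded));
    }
}
```

      Run with the plugin jar on the classpath, this would show which of the two names can actually be resolved. If only the shaded name is loadable, a lookup by the original name fails in the way the stack trace above suggests.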

      As a workaround, I rebuilt the flink-gs-fs-hadoop plugin without the following relocation, and that worked for me:

      <relocation>
        <pattern>org.apache.hadoop</pattern>
        <shadedPattern>org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop</shadedPattern>
      </relocation>

          People

            Assignee: Martijn Visser (martijnvisser)
            Reporter: Mohit Aggarwal (mohit.aggarwal)