Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
1.17.0
Flink Kubernetes operator: 1.4
Flink version: 1.17
GKE Kubernetes cluster.
Description
Hi,
When I run a Flink job in HA mode with a GCS path as the HA directory (e.g. gs://flame-poc/ha), or start a job from checkpoints stored in GCS, I get the following exception:
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback not found
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2688) ~[?:?]
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2712) ~[?:?]
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.security.Groups.<init>(Groups.java:107) ~[?:?]
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.security.Groups.<init>(Groups.java:102) ~[?:?]
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:451) ~[?:?]
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:338) ~[?:?]
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:300) ~[?:?]
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:575) ~[?:?]
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.getUgiUserName(GoogleHadoopFileSystemBase.java:1226) ~[?:?]
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.listStatus(GoogleHadoopFileSystemBase.java:858) ~[?:?]
at org.apache.flink.fs.gs.org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.listStatus(HadoopFileSystem.java:170) ~[?:?]
Observations:
When the HA path is on the file system and GCS is used as the checkpointing directory, the job is able to write checkpoints to the GCS checkpoint path.
While debugging, I found that all org.apache.hadoop packages are shaded to org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop. The code should therefore look up org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback rather than org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback.
I think the class name is not relocated at this point because the class is resolved through reflection:
https://github.com/apache/hadoop/blob/branch-3.3.4/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/Groups.java#L108
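To illustrate the suspected mismatch, here is a minimal sketch. It assumes the class name reaches the class loader as a plain string that the relocation did not rewrite. `RelocationSketch`, `relocate` and `canLoad` are hypothetical names made up for this example; they are not part of Flink, Hadoop, or the shade plugin. The sketch only mimics the package renaming on a string and checks whether a name can be loaded reflectively.

```java
/** Hypothetical helper for illustration only; not part of Flink or Hadoop. */
final class RelocationSketch {
    // Pattern and shaded pattern as quoted in this report.
    static final String PATTERN = "org.apache.hadoop";
    static final String SHADED_PATTERN =
            "org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop";

    /** Mimics a prefix relocation on a fully qualified class name. */
    static String relocate(String className) {
        return className.startsWith(PATTERN + ".")
                ? SHADED_PATTERN + className.substring(PATTERN.length())
                : className;
    }

    /** Returns true if the name can be resolved reflectively without initializing the class. */
    static boolean canLoad(String className) {
        try {
            Class.forName(className, false, RelocationSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The name from the exception: an unrelocated string.
        String configured =
                "org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback";
        // The name under which the class would exist inside the shaded plugin jar.
        String shaded = relocate(configured);

        System.out.println(configured + " loadable: " + canLoad(configured));
        System.out.println(shaded + " loadable: " + canLoad(shaded));
    }
}
```

Run with the flink-gs-fs-hadoop plugin jar on the classpath, I would expect only the second name to be loadable, which would match the ClassNotFoundException above. I have not verified this against the actual jar.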
As a workaround, I rebuilt the flink-gs-fs-hadoop plugin with the following relocation removed, and it worked for me:
<relocation>
  <pattern>org.apache.hadoop</pattern>
  <shadedPattern>org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop</shadedPattern>
</relocation>