Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
1.17.0, 1.15.3, 1.16.1
-
None
-
None
Description
The following exception is thrown when using python udf in user job:
Caused by: java.io.IOException: Cannot run program "ERROR: ld.so: object '/usr/lib64/libjemalloc.so.1' from LD_PRELOAD cannot be preloaded: ignored. /mnt/ssd/0/yarn/nm-local-dir/usercache/flink/appcache/application_1670838323719_705777/python-dist-fe870981-4de7-4229-ad0b-f51881e80d90/python-archives/pipeline_venv_v5.tar.gz/lib/python3.7/site-packages/pyflink/bin/pyflink-udf-runner.sh": error=2, No such file or directory at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048) at org.apache.beam.runners.fnexecution.environment.ProcessManager.startProcess(ProcessManager.java:147) at org.apache.beam.runners.fnexecution.environment.ProcessManager.startProcess(ProcessManager.java:122) at org.apache.beam.runners.fnexecution.environment.ProcessEnvironmentFactory.createEnvironment(ProcessEnvironmentFactory.java:106) at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:252) at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:231) at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3528) at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2277) at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2154) at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2044) at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.get(LocalCache.java:3952) at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3974) at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4958) at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4964) ... 19 more Suppressed: java.lang.NullPointerException: Process for id does not exist: 1-1 at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:895) at org.apache.beam.runners.fnexecution.environment.ProcessManager.stopProcess(ProcessManager.java:172) at org.apache.beam.runners.fnexecution.environment.ProcessEnvironmentFactory.createEnvironment(ProcessEnvironmentFactory.java:126) ... 29 more Caused by: java.io.IOException: error=2, No such file or directory at java.lang.UNIXProcess.forkAndExec(Native Method) at java.lang.UNIXProcess.<init>(UNIXProcess.java:247) at java.lang.ProcessImpl.start(ProcessImpl.java:134) at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029) ... 32 more
This is because SRE introduce a environment param
LD_PRELOAD=/usr/lib64/libjemalloc.so.1
The logic of the python process itself can be executed normally, but an extra error message will be printed. So the whole output looks like:
ERROR: ld.so: object '/usr/lib64/libjemalloc.so.1' from LD_PRELOAD cannot be preloaded: ignored.
/mnt/ssd/0/yarn/nm-local-dir/usercache/flink/appcache/application_1670838323719_705777/python-dist-fe870981-4de7-4229-ad0b-f51881e80d90/python-archives/pipeline_venv_v5.tar.gz/lib/python3.7/site-packages/pyflink/bin/
And the whole output is treated as a command, which caused the exception.
It seems the output is not very reliable. Maybe we need to find another way to transfer data, or filter the output before using.