Details
- Type: Bug
- Status: Open
- Priority: Critical
- Resolution: Unresolved
Description
Data load fails from time to time with the following error:
00:27:17.680 Error loading data. The end of the log file is:
00:27:17.680 04:15:15 /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/bin/load-data.py --workloads functional-query -e core --table_formats kudu/none/none --force --impalad localhost --hive_hs2_hostport localhost:11050 --hdfs_namenode localhost:20500
00:27:17.680 04:15:15 Executing Hadoop command: ... hadoop credential create openai-api-key-secret -value secret -provider localjceks://file/data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/testdata/jceks/test.jceks ...
00:27:17.680 java.io.IOException: Credential openai-api-key-secret already exists in localjceks://file/data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/testdata/jceks/test.jceks
00:27:17.680 at org.apache.hadoop.security.alias.AbstractJavaKeyStoreProvider.createCredentialEntry(AbstractJavaKeyStoreProvider.java:234)
00:27:17.680 at org.apache.hadoop.security.alias.CredentialShell$CreateCommand.execute(CredentialShell.java:354)
00:27:17.680 at org.apache.hadoop.tools.CommandShell.run(CommandShell.java:72)
00:27:17.680 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:81)
00:27:17.680 at org.apache.hadoop.security.alias.CredentialShell.main(CredentialShell.java:437)
00:27:17.680 04:15:15 Error executing Hadoop command, exiting
My guess is that this happens when several data loader processes call "hadoop credential create" concurrently against the same test.jceks keystore:
https://github.com/apache/impala/blob/9b05a205fec397fa1e19ae467b1cc406ca43d948/bin/load-data.py#L323
Ideally, this would be called in the serial phase of dataload.
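As an illustration only, the failure mode above could also be tolerated on the caller side by treating the "already exists" error as success. This is a minimal Python 3 sketch, not the code in load-data.py and not necessarily the fix applied upstream. The function name `create_credential_idempotent` and the injectable `run` parameter are hypothetical. The command-line arguments and the "already exists" message text are copied from the log above.

```python
import subprocess


def create_credential_idempotent(alias, value, provider, run=subprocess.run):
    """Create a Hadoop credential, tolerating a concurrent creator.

    Hypothetical helper: the name and the `run` hook are assumptions made
    for illustration. The command arguments mirror the log in this issue.
    Returns "created" or "exists"; raises RuntimeError on any other failure.
    """
    cmd = ["hadoop", "credential", "create", alias,
           "-value", value, "-provider", provider]
    result = run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        return "created"
    # The log shows "Credential <alias> already exists in <provider>".
    # Check both streams because the sketch does not assume which one
    # the message is written to.
    output = (result.stdout or "") + (result.stderr or "")
    if "already exists" in output:
        return "exists"
    raise RuntimeError("hadoop credential create failed: " + output.strip())
```

This only hides the symptom. Moving the call into the serial phase, as suggested above, avoids concurrent access to the keystore file altogether.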
Issue Links
- duplicates: IMPALA-13015 Dataload fails due to concurrency issue with test.jceks (Resolved)
- is broken by: IMPALA-12920 Support ai_generate_text built-in function for OpenAI's LLMs (Resolved)