Spark / SPARK-24627

[Spark 2.3.0] After HDFS token expiry, jobs cannot be submitted via Beeline even after kinit


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: SQL
    • Environment: OS: SUSE11; Spark version: 2.3.0

    Description

      Steps (the Beeline session had been active since before the HDFS delegation token expired):

      1. Launch spark-beeline.
      2. create table alt_s1 (time timestamp, name string, isright boolean, datetoday date, num binary, height double, score float, decimaler decimal(10,0), id tinyint, age int, license bigint, length smallint) row format delimited fields terminated by ',';
      3. load data local inpath '/opt/typeddata60.txt' into table alt_s1;
      4. show tables; (tables are listed successfully)
      5. select * from alt_s1;

      The select throws an HDFS_DELEGATION_TOKEN exception:

      0: jdbc:hive2://10.18.18.214:23040/default> select * from alt_s1;
      Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 22.0 failed 4 times, most recent failure: Lost task 1.3 in stage 22.0 (TID 106, blr1000023110, executor 1): org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 7 for spark) can't be found in cache
      at org.apache.hadoop.ipc.Client.call(Client.java:1475)
      at org.apache.hadoop.ipc.Client.call(Client.java:1412)
      at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
      at com.sun.proxy.$Proxy15.getBlockLocations(Unknown Source)
      at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:255)
      at sun.reflect.GeneratedMethodAccessor44.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:497)
      at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
      at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
      at com.sun.proxy.$Proxy16.getBlockLocations(Unknown Source)
      at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1226)
      at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1213)
      at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1201)
      at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:306)
      at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:272)
      at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:264)
      at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1526)
      at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:304)
      at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:299)
      at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
      at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:312)
      at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:769)
      at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:109)
      at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
      at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:257)
      at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:256)
      at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:214)
      at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:94)
      at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
      at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
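
      Diagnostic sketch (an addition, not from the original report): after the failure, one can check whether the NameNode will still issue tokens for this principal, e.g. with hdfs fetchdt; the keytab path, output path, and renewer name below are illustrative:

      # Fetch a fresh HDFS delegation token for the current Kerberos login and
      # print it; if this succeeds, the NameNode is healthy and only the
      # session's cached token has expired.
      kinit -kt /path/to/spark.keytab spark/hadoop@HADOOP.COM
      hdfs fetchdt --renewer spark /tmp/spark.dt
      hdfs fetchdt --print /tmp/spark.dt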

      *Note: even after running kinit for the spark/hadoop principal, the delegation token is not renewed.*
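
      A common mitigation for long-running services such as the Thrift server (a suggestion, not the confirmed resolution of this issue) is to launch it with a principal and keytab, so Spark can re-obtain delegation tokens itself instead of depending on the external ticket cache; the keytab path below is illustrative:

      # Start the JDBC/Thrift server with keytab-based login. --principal and
      # --keytab are standard spark-submit options, which start-thriftserver.sh
      # passes through.
      ./sbin/start-thriftserver.sh \
        --principal spark/hadoop@HADOOP.COM \
        --keytab /path/to/spark.keytab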

      Launching a new spark-sql session, by contrast, succeeds (select * from alt_s1 returns all rows):
      1. Launch spark-sql.
      2. spark-sql> select * from alt_s1;
      2018-06-22 14:24:04 INFO HiveMetaStore:746 - 0: get_table : db=test_one tbl=alt_s1
      2018-06-22 14:24:04 INFO audit:371 - ugi=spark/hadoop@HADOOP.COM ip=unknown-ip-addr cmd=get_table : db=test_one tbl=alt_s1
      2018-06-22 14:24:04 INFO SQLStdHiveAccessController:95 - Created SQLStdHiveAccessController for session context : HiveAuthzSessionContext [sessionString=2cf6aac4-91c6-4c2d-871b-4d7620d91f43, clientType=HIVECLI]
      2018-06-22 14:24:04 INFO metastore:291 - Mestastore configuration hive.metastore.filter.hook changed from org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook
      2018-06-22 14:24:04 INFO HiveMetaStore:746 - 0: Shutting down the object store...
      2018-06-22 14:24:04 INFO audit:371 - ugi=spark/hadoop@HADOOP.COM ip=unknown-ip-addr cmd=Shutting down the object store...
      2018-06-22 14:24:04 INFO HiveMetaStore:746 - 0: Metastore shutdown complete.
      2018-06-22 14:24:04 INFO audit:371 - ugi=spark/hadoop@HADOOP.COM ip=unknown-ip-addr cmd=Metastore shutdown complete.
      2018-06-22 14:24:04 INFO MemoryStore:54 - Block broadcast_0 stored as values in memory (estimated size 437.9 KB, free 365.9 MB)
      2018-06-22 14:24:04 INFO MemoryStore:54 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 33.6 KB, free 365.8 MB)
      2018-06-22 14:24:04 INFO BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on blr1000023111:45858 (size: 33.6 KB, free: 366.3 MB)
      2018-06-22 14:24:04 INFO SparkContext:54 - Created broadcast 0 from
      2018-06-22 14:24:05 INFO DFSClient:1033 - Created HDFS_DELEGATION_TOKEN token 42 for spark on ha-hdfs:hacluster
      2018-06-22 14:24:05 INFO TokenCache:125 - Got dt for hdfs://hacluster; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:hacluster, Ident: (HDFS_DELEGATION_TOKEN token 42 for spark)
      2018-06-22 14:24:05 INFO FileInputFormat:249 - Total input paths to process : 1
      2018-06-22 14:24:05 INFO SparkContext:54 - Starting job: processCmd at CliDriver.java:376
      2018-06-22 14:24:05 INFO DAGScheduler:54 - Got job 0 (processCmd at CliDriver.java:376) with 1 output partitions
      2018-06-22 14:24:05 INFO DAGScheduler:54 - Final stage: ResultStage 0 (processCmd at CliDriver.java:376)
      2018-06-22 14:24:05 INFO DAGScheduler:54 - Parents of final stage: List()
      2018-06-22 14:24:05 INFO DAGScheduler:54 - Missing parents: List()
      2018-06-22 14:24:05 INFO DAGScheduler:54 - Submitting ResultStage 0 (MapPartitionsRDD[4] at processCmd at CliDriver.java:376), which has no missing parents
      2018-06-22 14:24:05 INFO MemoryStore:54 - Block broadcast_1 stored as values in memory (estimated size 10.6 KB, free 365.8 MB)
      2018-06-22 14:24:05 INFO MemoryStore:54 - Block broadcast_1_piece0 stored as bytes in memory (estimated size 5.3 KB, free 365.8 MB)
      2018-06-22 14:24:05 INFO BlockManagerInfo:54 - Added broadcast_1_piece0 in memory on blr1000023111:45858 (size: 5.3 KB, free: 366.3 MB)
      2018-06-22 14:24:05 INFO SparkContext:54 - Created broadcast 1 from broadcast at DAGScheduler.scala:1039
      2018-06-22 14:24:05 INFO ContextCleaner:54 - Cleaned accumulator 0
      2018-06-22 14:24:05 INFO DAGScheduler:54 - Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[4] at processCmd at CliDriver.java:376) (first 15 tasks are for partitions Vector(0))
      2018-06-22 14:24:05 INFO TaskSchedulerImpl:54 - Adding task set 0.0 with 1 tasks
      2018-06-22 14:24:05 INFO TaskSetManager:54 - Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, ANY, 7923 bytes)
      2018-06-22 14:24:05 INFO Executor:54 - Running task 0.0 in stage 0.0 (TID 0)
      2018-06-22 14:24:05 INFO HadoopRDD:54 - Input split: hdfs://hacluster/user/sparkhive/warehouse/test_one.db/alt_s1/typeddata60.txt:0+4802
      2018-06-22 14:24:05 INFO CodeGenerator:54 - Code generated in 232.938395 ms
      2018-06-22 14:24:06 INFO Executor:54 - Finished task 0.0 in stage 0.0 (TID 0). 2221 bytes result sent to driver
      2018-06-22 14:24:06 INFO TaskSetManager:54 - Finished task 0.0 in stage 0.0 (TID 0) in 660 ms on localhost (executor driver) (1/1)
      2018-06-22 14:24:06 INFO TaskSchedulerImpl:54 - Removed TaskSet 0.0, whose tasks have all completed, from pool
      2018-06-22 14:24:06 INFO DAGScheduler:54 - ResultStage 0 (processCmd at CliDriver.java:376) finished in 0.769 s
      2018-06-22 14:24:06 INFO DAGScheduler:54 - Job 0 finished: processCmd at CliDriver.java:376, took 0.829488 s
      2016-10-12 12:00:00 abcd true 2015-10-10 5 1600.0 26.0 123456789 9 999 999999 1
      2015-11-13 12:10:22 jaie true 2016-11-11 7 1800.0 29.0 455756789 8 888 888888 2
      2014-09-11 22:10:22 suss false 2013-01-21 8 2200.0 19.0 232344559 7 777 777777 3
      2016-10-12 12:00:00 abcd true 2015-10-10 5 1600.0 26.0 123456789 9 999 999999 4
      2015-11-13 12:10:22 jaie false 2016-11-11 7 1800.0 29.0 455756789 8 888 888888 5
      2014-09-11 22:10:22 suss false 2003-01-21 8 2200.0 19.0 24444559 1 111 11111 6
      2014-09-11 22:10:22 suss false 2013-01-21 8 2200.0 19.0 232344559 7 777 777777 7
      2016-10-12 12:00:00 abcd true 2015-10-10 5 1600.0 26.0 123456789 9 999 999999 8
      2015-11-13 12:10:22 jaie false 2016-11-11 7 1800.0 29.0 455756789 8 888 888888 9
      2014-09-11 22:10:22 suss false 2003-01-21 8 2200.0 19.0 24444559 1 111 11111 10
      2016-10-12 12:00:00 abcd true 2015-10-10 5 1600.0 26.0 123456789 9 999 999999 11
      2015-11-13 12:10:22 jaie true 2016-11-11 7 1800.0 29.0 455756789 8 888 888888 12
      2014-09-11 22:10:22 suss false 2013-01-21 8 2200.0 19.0 232344559 7 777 777777 13
      2016-10-12 12:00:00 abcd true 2015-10-10 5 1600.0 26.0 123456789 9 999 999999 14
      2015-11-13 12:10:22 jaie false 2016-11-11 7 1800.0 29.0 455756789 8 888 888888 15
      2014-09-11 22:10:22 suss false 2003-01-21 8 2200.0 19.0 24444559 1 111 11111 16
      2014-09-11 22:10:22 suss false 2013-01-21 8 2200.0 19.0 232344559 7 777 777777 17
      2016-10-12 12:00:00 abcd true 2015-10-10 5 1600.0 26.0 123456789 9 999 999999 18
      2015-11-13 12:10:22 jaie false 2016-11-11 7 1800.0 29.0 455756789 8 888 888888 19
      2014-09-11 22:10:22 suss false 2003-01-21 8 2200.0 19.0 24444559 1 111 11111 20
      2016-10-12 12:00:00 abcd true 2015-10-10 5 1600.0 26.0 123456789 9 999 999999 21
      2015-11-13 12:10:22 jaie true 2016-11-11 7 1800.0 29.0 455756789 8 888 888888 22
      2014-09-11 22:10:22 suss false 2013-01-21 8 2200.0 19.0 232344559 7 777 777777 23
      2016-10-12 12:00:00 abcd true 2015-10-10 5 1600.0 26.0 123456789 9 999 999999 24
      2015-11-13 12:10:22 jaie false 2016-11-11 7 1800.0 29.0 455756789 8 888 888888 25
      2014-09-11 22:10:22 suss false 2003-01-21 8 2200.0 19.0 24444559 1 111 11111 26
      2014-09-11 22:10:22 suss false 2013-01-21 8 2200.0 19.0 232344559 7 777 777777 27
      2016-10-12 12:00:00 abcd true 2015-10-10 5 1600.0 26.0 123456789 9 999 999999 28
      2015-11-13 12:10:22 jaie false 2016-11-11 7 1800.0 29.0 455756789 8 888 888888 29
      2014-09-11 22:10:22 suss false 2003-01-21 8 2200.0 19.0 24444559 1 111 11111 30
      2016-10-12 12:00:00 abcd true 2015-10-10 5 1600.0 26.0 123456789 9 999 999999 31
      2015-11-13 12:10:22 jaie true 2016-11-11 7 1800.0 29.0 455756789 8 888 888888 32
      2014-09-11 22:10:22 suss false 2013-01-21 8 2200.0 19.0 232344559 7 777 777777 33
      2016-10-12 12:00:00 abcd true 2015-10-10 5 1600.0 26.0 123456789 9 999 999999 34
      2015-11-13 12:10:22 jaie false 2016-11-11 7 1800.0 29.0 455756789 8 888 888888 35
      2014-09-11 22:10:22 suss false 2003-01-21 8 2200.0 19.0 24444559 1 111 11111 36
      2014-09-11 22:10:22 suss false 2013-01-21 8 2200.0 19.0 232344559 7 777 777777 37
      2016-10-12 12:00:00 abcd true 2015-10-10 5 1600.0 26.0 123456789 9 999 999999 38
      2015-11-13 12:10:22 jaie false 2016-11-11 7 1800.0 29.0 455756789 8 888 888888 39
      2014-09-11 22:10:22 suss false 2003-01-21 8 2200.0 19.0 24444559 1 111 11111 40
      2016-10-12 12:00:00 abcd true 2015-10-10 5 1600.0 26.0 123456789 9 999 999999 41
      2015-11-13 12:10:22 jaie true 2016-11-11 7 1800.0 29.0 455756789 8 888 888888 42
      2014-09-11 22:10:22 suss false 2013-01-21 8 2200.0 19.0 232344559 7 777 777777 43
      2016-10-12 12:00:00 abcd true 2015-10-10 5 1600.0 26.0 123456789 9 999 999999 44
      2015-11-13 12:10:22 jaie false 2016-11-11 7 1800.0 29.0 455756789 8 888 888888 45
      2014-09-11 22:10:22 suss false 2003-01-21 8 2200.0 19.0 24444559 1 111 11111 46
      2014-09-11 22:10:22 suss false 2013-01-21 8 2200.0 19.0 232344559 7 777 777777 47
      2016-10-12 12:00:00 abcd true 2015-10-10 5 1600.0 26.0 123456789 9 999 999999 48
      2015-11-13 12:10:22 jaie false 2016-11-11 7 1800.0 29.0 455756789 8 888 888888 49
      2014-09-11 22:10:22 suss false 2003-01-21 8 2200.0 19.0 24444559 1 111 11111 50
      2016-10-12 12:00:00 abcd true 2015-10-10 5 1600.0 26.0 123456789 9 999 999999 51
      2015-11-13 12:10:22 jaie true 2016-11-11 7 1800.0 29.0 455756789 8 888 888888 52
      2014-09-11 22:10:22 suss false 2013-01-21 8 2200.0 19.0 232344559 7 777 777777 53
      2016-10-12 12:00:00 abcd true 2015-10-10 5 1600.0 26.0 123456789 9 999 999999 54
      2015-11-13 12:10:22 jaie false 2016-11-11 7 1800.0 29.0 455756789 8 888 888888 55
      2014-09-11 22:10:22 suss false 2003-01-21 8 2200.0 19.0 24444559 1 111 11111 56
      2014-09-11 22:10:22 suss false 2013-01-21 8 2200.0 19.0 232344559 7 777 777777 57
      2016-10-12 12:00:00 abcd true 2015-10-10 5 1600.0 26.0 123456789 9 999 999999 58
      2015-11-13 12:10:22 jaie false 2016-11-11 7 1800.0 29.0 455756789 8 888 888888 59
      2014-09-11 22:10:22 suss false 2003-01-21 8 2200.0 19.0 24444559 1 111 11111 60

      Below is the delegation-token configuration in hdfs-site.xml:
      <property>
      <name>dfs.namenode.delegation.key.update-interval</name>
      <value>18000000</value>
      </property>
      <property>
      <name>dfs.namenode.delegation.token.max-lifetime</name>
      <value>60000000</value>
      </property>
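
      For readability, the values above work out as follows; the renew-interval figure is an assumption based on the Hadoop default, since the report does not show that property:

      # Worked conversion of the configured intervals (ms -> hours)
      echo "key update interval: $((18000000 / 3600000)) h"   # 5 h
      echo "token max lifetime:  $((60000000 / 3600000)) h"   # 16 h (~16.7 h)
      # dfs.namenode.delegation.token.renew-interval is not set above; the Hadoop
      # default is 86400000 ms (24 h), so the ~16.7 h max lifetime is the
      # effective cap here.

      With these settings, any session older than roughly 16.7 hours holds a token the NameNode has purged, which matches the InvalidToken errors above.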

      An insert also fails, with a job-aborted exception:

      0: jdbc:hive2://10.18.18.214:23040/default> show tables;
      +-----------+------------+--------------+
      | database  | tableName  | isTemporary  |
      +-----------+------------+--------------+
      | test_one  | alt_s1     | false        |
      | test_one  | csvtable   | false        |
      | test_one  | csvtable1  | false        |
      | test_one  | csvtable2  | false        |
      | test_one  | emp        | false        |
      | test_one  | tab1       | false        |
      | test_one  | tab2       | false        |
      +-----------+------------+--------------+
      7 rows selected (0.049 seconds)
      0: jdbc:hive2://10.18.18.214:23040/default> insert into emp values( 34 );
      Error: org.apache.spark.SparkException: Job aborted. (state=,code=0)
      0: jdbc:hive2://10.18.18.214:23040/default>

      JDBC server log snippet:

      Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 7 for spark) can't be found in cache
      at org.apache.hadoop.ipc.Client.call(Client.java:1475)
      at org.apache.hadoop.ipc.Client.call(Client.java:1412)
      at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
      at com.sun.proxy.$Proxy15.create(Unknown Source)
      at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:296)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:497)
      at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
      at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
      at com.sun.proxy.$Proxy16.create(Unknown Source)
      at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1648)
      at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1689)
      at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1624)
      at org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:448)
      at org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:444)
      at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
      at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:459)
      at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:387)
      at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911)
      at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:804)
      at org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat.getHiveRecordWriter(HiveIgnoreKeyTextOutputFormat.java:80)
      at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:261)
      at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:246)
      ... 16 more
      2018-06-22 14:53:22,557 | ERROR | [HiveServer2-Background-Pool: Thread-2966] | Error running hive query: | org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:179)
      org.apache.hive.service.cli.HiveSQLException: org.apache.spark.SparkException: Job aborted.
      at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:269)
      at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:175)
      at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:422)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
      at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:185)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)
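
      The spark-sql run above illustrates the effective workaround in this report: a new session obtains a fresh token (token 42). For the JDBC server the equivalent (keytab path and principal below are illustrative) is to re-authenticate and restart:

      # Restart the Thrift server after re-authenticating so it picks up a new
      # HDFS delegation token (the token number in the logs will advance).
      kinit -kt /path/to/spark.keytab spark/hadoop@HADOOP.COM
      ./sbin/stop-thriftserver.sh
      ./sbin/start-thriftserver.sh --principal spark/hadoop@HADOOP.COM --keytab /path/to/spark.keytab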

       


          People

            Assignee: Unassigned
            Reporter: abhishek.akg (ABHISHEK KUMAR GUPTA)