[HADOOP-14044] Synchronization issue in delegation token cancel functionality - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.9.0, 2.7.4, 3.0.0-alpha4, 2.8.2
Component/s: None
Labels: None
Target Version/s: 2.9.0, 3.0.0-alpha4
Hadoop Flags: Reviewed

Description

We are using the Hadoop delegation token authentication functionality in Apache Solr. During integration testing, I found the following issue with the delegation token cancelation functionality.

Consider a setup with 2 Solr servers (S1 and S2) that are configured to use delegation token functionality backed by ZooKeeper. Now perform the following steps:

[Step 1] Send a request to S1 to create a delegation token.
(Delegation token DT is created successfully.)
[Step 2] Send a request to S2 to cancel DT.
(DT is canceled successfully. Client receives HTTP 200 response.)
[Step 3] Send a request to S2 to cancel DT again.
(DT cancelation fails. Client receives HTTP 404 response.)
[Step 4] Send a request to S1 to cancel DT.

At this point, we get one of two different responses:

DT cancelation fails. Client receives HTTP 404 response.
DT cancelation succeeds. Client receives HTTP 200 response.

Also, as per the current implementation, each server maintains an in-memory cache of current tokens, which is updated using the ZK watch mechanism. For example, the ZK watch on S1 ensures that its in-memory cache is synchronized after step 2.

After investigation, I found that the root cause of this behavior is a race condition between step 4 and the firing of the ZK watch on S1. When the watch fires before step 4, we get an HTTP 404 response (as expected). Otherwise, we get an HTTP 200 response along with the following ERROR message in the log:

Attempted to remove a non-existing znode /ZKDTSMTokensRoot/DT_XYZ
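
The race can be illustrated with a small single-threaded model. This is only a sketch, not the Hadoop or Solr code: `SharedStore`, `CachingServer`, and `step4Status` are hypothetical names, the shared store stands in for ZooKeeper, and the watch is modeled as an explicit callback so that its ordering relative to step 4 can be chosen. It assumes the server decides the response status from its local cache alone.

```java
import java.util.HashSet;
import java.util.Set;

class CancelRaceDemo {

    /** Hypothetical stand-in for the ZooKeeper-backed token store shared by all servers. */
    static class SharedStore {
        private final Set<String> tokens = new HashSet<>();

        void create(String token) { tokens.add(token); }

        /** Returns false when the token node is already gone. */
        boolean delete(String token) { return tokens.remove(token); }
    }

    /** Hypothetical stand-in for one server with its in-memory token cache. */
    static class CachingServer {
        private final SharedStore store;
        private final Set<String> cache = new HashSet<>();

        CachingServer(SharedStore store) { this.store = store; }

        void create(String token) {
            store.create(token);
            cache.add(token);
        }

        // Simulated watch callbacks; in the real system these arrive asynchronously.
        void onTokenCreated(String token) { cache.add(token); }
        void onTokenRemoved(String token) { cache.remove(token); }

        /** Assumption: the status code is decided from the local cache alone. */
        int cancel(String token) {
            if (!cache.remove(token)) {
                return 404; // token unknown to this server
            }
            if (!store.delete(token)) {
                System.err.println("Attempted to remove a non-existing znode " + token);
            }
            return 200;
        }
    }

    /** Runs steps 1-4 and returns the status code of step 4. */
    static int step4Status(boolean watchFiresBeforeStep4) {
        SharedStore store = new SharedStore();
        CachingServer s1 = new CachingServer(store);
        CachingServer s2 = new CachingServer(store);
        String dt = "/ZKDTSMTokensRoot/DT_XYZ";

        s1.create(dt);          // Step 1: DT created on S1
        s2.onTokenCreated(dt);  // S2 learns about DT through its watch
        s2.cancel(dt);          // Step 2: 200
        s2.cancel(dt);          // Step 3: 404
        if (watchFiresBeforeStep4) {
            s1.onTokenRemoved(dt); // S1's cache catches up in time
        }
        return s1.cancel(dt);   // Step 4
    }

    public static void main(String[] args) {
        System.out.println("watch first:  " + step4Status(true));  // 404
        System.out.println("cancel first: " + step4Status(false)); // 200
    }
}
```

In the second case the stale cache entry on S1 makes the cancel look valid, and only the delete against the shared store reveals that the token is already gone.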

From the client's perspective, the server should return an HTTP 404 error when a cancel request is sent for an invalid token.

Ref: the relevant Solr unit test:
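
One way to obtain that behavior, sketched here under stated assumptions and not taken from the attached patches, is to refresh the local cache entry for the token from the shared store before deciding the response. `TokenStore`, `SyncingServer`, and `statuses` are hypothetical names, and the in-memory store again stands in for ZooKeeper.

```java
import java.util.HashSet;
import java.util.Set;

class SyncBeforeCancelDemo {

    /** Hypothetical stand-in for the shared, authoritative token store. */
    static class TokenStore {
        private final Set<String> tokens = new HashSet<>();

        void create(String token) { tokens.add(token); }
        boolean exists(String token) { return tokens.contains(token); }
        boolean delete(String token) { return tokens.remove(token); }
    }

    /** Hypothetical server that syncs one cache entry before each cancel. */
    static class SyncingServer {
        private final TokenStore store;
        private final Set<String> cache = new HashSet<>();

        SyncingServer(TokenStore store) { this.store = store; }

        void create(String token) {
            store.create(token);
            cache.add(token);
        }

        int cancel(String token) {
            // Bring the local view of this token in line with the shared store,
            // instead of waiting for a watch notification to arrive.
            if (store.exists(token)) {
                cache.add(token);
            } else {
                cache.remove(token);
            }
            if (!cache.remove(token)) {
                return 404; // token is invalid as far as the shared store knows
            }
            // Another server may still win the delete between the check and here.
            return store.delete(token) ? 200 : 404;
        }
    }

    /** Runs steps 1-4 with no watch delivery at all; returns the statuses of steps 2-4. */
    static int[] statuses() {
        TokenStore store = new TokenStore();
        SyncingServer s1 = new SyncingServer(store);
        SyncingServer s2 = new SyncingServer(store);
        String dt = "/ZKDTSMTokensRoot/DT_XYZ";

        s1.create(dt);              // Step 1
        int step2 = s2.cancel(dt);  // Step 2
        int step3 = s2.cancel(dt);  // Step 3
        int step4 = s1.cancel(dt);  // Step 4
        return new int[] {step2, step3, step4};
    }

    public static void main(String[] args) {
        int[] s = statuses();
        System.out.println(s[0] + " " + s[1] + " " + s[2]); // 200 404 404
    }
}
```

In this model step 4 returns 404 regardless of when, or whether, a watch fires on S1. The trade-off is one extra read of the shared store per cancel request.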
https://github.com/apache/lucene-solr/blob/746786636404cdb8ce505ed0ed02b8d9144ab6c4/solr/core/src/test/org/apache/solr/cloud/TestSolrCloudWithDelegationTokens.java#L285

Attachments

dt_fail.log, 31/Jan/17 20:51, 5 kB, Hrishikesh Gadre
dt_success.log, 31/Jan/17 20:51, 4 kB, Hrishikesh Gadre
HADOOP-14044-001.patch, 31/Jan/17 22:19, 4 kB, Hrishikesh Gadre
HADOOP-14044-002.patch, 02/Feb/17 18:53, 2 kB, Hrishikesh Gadre
HADOOP-14044-003.patch, 03/Feb/17 00:08, 2 kB, Hrishikesh Gadre

Issue Links

blocks: SOLR-10053 TestSolrCloudWithDelegationTokens failures (Resolved)

People

Assignee: Hrishikesh Gadre

Reporter: Hrishikesh Gadre

Votes: 0

Watchers: 7

Dates

Created: 31/Jan/17 20:49

Updated: 22/Feb/18 18:30

Resolved: 04/Feb/17 01:29