  1. Hadoop Common
  2. HADOOP-14044

Synchronization issue in delegation token cancel functionality

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.0, 2.7.4, 3.0.0-alpha4, 2.8.2
    • Component/s: None
    • Labels:
      None

      Description

      We are using the Hadoop delegation token authentication functionality in Apache Solr. During integration testing, I found the following issue with the delegation token cancelation functionality.

      Consider a setup with 2 Solr servers (S1 and S2) that are configured to use delegation token functionality backed by ZooKeeper. Now perform the following steps:

      [Step 1] Send a request to S1 to create a delegation token.
      (Delegation token DT is created successfully.)
      [Step 2] Send a request to S2 to cancel DT.
      (DT is canceled successfully. Client receives an HTTP 200 response.)
      [Step 3] Send a request to S2 to cancel DT again.
      (DT cancelation fails. Client receives an HTTP 404 response.)
      [Step 4] Send a request to S1 to cancel DT.

      At this point we get one of two different responses, varying from run to run:

      • DT cancelation fails. Client receives an HTTP 404 response.
      • DT cancelation succeeds. Client receives an HTTP 200 response.

      Also, as per the current implementation, each server maintains an in-memory cache of current tokens, which is updated using the ZK watch mechanism. For example, the ZK watch on S1 ensures that its in-memory cache is synchronized after step 2.

      After investigation, I found that the root cause of this behavior is a race condition between step 4 and the firing of the ZK watch on S1. Whenever the watch fires before step 4, we get an HTTP 404 response (as expected). When it does not, we get an HTTP 200 response along with the following ERROR message in the log:
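      The race can be illustrated with a small, deterministic simulation. This is only a sketch under stated assumptions: `SharedStore` and `Server` are hypothetical stand-ins for ZooKeeper and a Solr server (they are not Hadoop or Solr classes), watch delivery is modeled as an explicit method call, and S2 is assumed to learn about DT through a watch notification.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class TokenCancelRace {
    /** Hypothetical stand-in for the token znodes kept in ZooKeeper. */
    static class SharedStore {
        private final Set<String> znodes = new HashSet<>();
        void create(String id) { znodes.add(id); }
        /** Returns false when the znode was already gone. */
        boolean delete(String id) { return znodes.remove(id); }
    }

    /** Hypothetical stand-in for one server and its in-memory token cache. */
    static class Server {
        private final SharedStore store;
        private final Set<String> cache = new HashSet<>();
        Server(SharedStore store) { this.store = store; }

        void createToken(String id) { store.create(id); cache.add(id); }

        // Watch notifications, delivered by hand so the ordering is explicit.
        void onTokenAdded(String id) { cache.add(id); }
        void onTokenRemoved(String id) { cache.remove(id); }

        /** Models the reported behavior: only the local cache decides the HTTP status. */
        int cancel(String id) {
            if (!cache.remove(id)) {
                return 404;
            }
            if (!store.delete(id)) {
                // Mimics the ERROR line from the report; the status is still 200.
                System.err.println("Attempted to remove a non-existing znode " + id);
            }
            return 200;
        }
    }

    /** Runs steps 2 to 4 and returns their HTTP status codes. */
    static int[] run(boolean watchFiresBeforeStep4) {
        SharedStore zk = new SharedStore();
        Server s1 = new Server(zk);
        Server s2 = new Server(zk);
        s1.createToken("DT");            // step 1
        s2.onTokenAdded("DT");           // assumption: S2 sees the new token via its watch
        int step2 = s2.cancel("DT");     // step 2
        int step3 = s2.cancel("DT");     // step 3
        if (watchFiresBeforeStep4) {
            s1.onTokenRemoved("DT");     // watch on S1 fires in time
        }
        int step4 = s1.cancel("DT");     // step 4
        return new int[] {step2, step3, step4};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run(true)));   // [200, 404, 404]
        System.out.println(Arrays.toString(run(false)));  // [200, 404, 200]
    }
}
```

      In the second run, S1 still holds a stale cache entry for DT, so step 4 reports success even though the shared store no longer contains the token.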

      Attempted to remove a non-existing znode /ZKDTSMTokensRoot/DT_XYZ
      

      From the client's perspective, the server should return an HTTP 404 error when a cancel request is sent for an invalid token.
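      One way to get that behavior is to let the shared store, rather than the local cache, decide the outcome of a cancel. The sketch below is an assumption about a possible approach and is not necessarily what the attached patches do; `StoreCheckedCancel` and its `cancel` method are hypothetical names, with plain sets standing in for the ZooKeeper znodes and for a server's in-memory cache.

```java
import java.util.HashSet;
import java.util.Set;

class StoreCheckedCancel {
    static final int OK = 200;
    static final int NOT_FOUND = 404;

    /**
     * Cancels a token, treating the shared store as the source of truth.
     * sharedStore stands in for the ZooKeeper znodes, localCache for one
     * server's in-memory cache (both hypothetical).
     */
    static int cancel(Set<String> sharedStore, Set<String> localCache, String id) {
        boolean removedFromStore = sharedStore.remove(id);
        localCache.remove(id); // drop any stale local entry either way
        return removedFromStore ? OK : NOT_FOUND;
    }

    public static void main(String[] args) {
        Set<String> zk = new HashSet<>();
        Set<String> s1Cache = new HashSet<>();
        Set<String> s2Cache = new HashSet<>();
        zk.add("DT");
        s1Cache.add("DT");
        s2Cache.add("DT");

        System.out.println(cancel(zk, s2Cache, "DT")); // step 2
        System.out.println(cancel(zk, s2Cache, "DT")); // step 3
        // Step 4: S1's cache is still stale, but the store decides the status.
        System.out.println(cancel(zk, s1Cache, "DT"));
    }
}
```

      With this ordering, step 4 no longer depends on whether the watch on S1 has fired, at the cost of a round trip to the shared store on every cancel.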

      Ref: the relevant Solr unit test:
      https://github.com/apache/lucene-solr/blob/746786636404cdb8ce505ed0ed02b8d9144ab6c4/solr/core/src/test/org/apache/solr/cloud/TestSolrCloudWithDelegationTokens.java#L285

        Attachments

        1. dt_fail.log
          5 kB
          Hrishikesh Gadre
        2. dt_success.log
          4 kB
          Hrishikesh Gadre
        3. HADOOP-14044-001.patch
          4 kB
          Hrishikesh Gadre
        4. HADOOP-14044-002.patch
          2 kB
          Hrishikesh Gadre
        5. HADOOP-14044-003.patch
          2 kB
          Hrishikesh Gadre

              People

              • Assignee:
                hgadre Hrishikesh Gadre
                Reporter:
                hgadre Hrishikesh Gadre
              • Votes:
                0
                Watchers:
                7
