SPARK-20922: Unsafe deserialization in Spark LauncherConnection

    Details

      Description

      The run() method of the class org.apache.spark.launcher.LauncherConnection performs unsafe deserialization of data received on its socket. This makes Spark applications launched programmatically using the SparkLauncher framework potentially vulnerable to remote code execution (RCE) by an attacker with access to any user account on the local machine. Such an attacker could send a malicious serialized Java object to multiple ports on the local machine, and if one of these ports matches the one (randomly) chosen by the Spark launcher, the malicious object will be deserialized. By making use of gadget chains in code present on the Spark application classpath, the deserialization process can lead to RCE or privilege escalation.

      This vulnerability is identified by the “Unsafe deserialization” rule on lgtm.com:
      https://lgtm.com/projects/g/apache/spark/snapshot/80fdc2c9d1693f5b3402a79ca4ec76f6e422ff13/files/launcher/src/main/java/org/apache/spark/launcher/LauncherConnection.java#V58

      Attached is a proof-of-concept exploit involving a simple SparkLauncher-based application and a known gadget chain in the Apache Commons Beanutils library referenced by Spark.
      See the readme file for demonstration instructions.
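
To make the pattern in the description concrete, here is a minimal, harmless sketch of why "deserialize first, validate afterwards" is risky. It is not the Spark code and not the attached proof-of-concept: the class names (UnsafeReadDemo, Hello, SideEffect) and the "s3cret" value are hypothetical, and the SideEffect class merely sets a flag where a real gadget chain would do something harmful.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class UnsafeReadDemo {

    /** Hypothetical message type the receiver expects (carries a shared secret). */
    static class Hello implements Serializable {
        private static final long serialVersionUID = 1L;
        final String secret;
        Hello(String secret) { this.secret = secret; }
    }

    /** Stand-in for a gadget: its readObject hook runs during deserialization. */
    static class SideEffect implements Serializable {
        private static final long serialVersionUID = 1L;
        static boolean triggered = false;

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            triggered = true; // a real gadget chain would act here
        }
    }

    /** Serializes an object to bytes, standing in for data sent over a socket. */
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(o);
        }
        return buf.toByteArray();
    }

    /** Receiver pattern under discussion: deserialize first, check the secret later. */
    static boolean receive(byte[] wire, String expectedSecret)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(wire))) {
            Object msg = in.readObject(); // sender-chosen readObject hooks have already run
            return msg instanceof Hello && expectedSecret.equals(((Hello) msg).secret);
        }
    }

    public static void main(String[] args) throws Exception {
        boolean accepted = receive(serialize(new SideEffect()), "s3cret");
        // prints: accepted=false, side effect ran=true
        System.out.println("accepted=" + accepted + ", side effect ran=" + SideEffect.triggered);
    }
}
```

The message is rejected, yet the sender-controlled hook has already executed, which is the sense in which a check performed after readObject() comes too late.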

          Activity

          srowen Sean Owen added a comment -

          This is not the same as https://issues.apache.org/jira/browse/SPARK-11652 I take it.

          Marcelo Vanzin can maybe comment on the mechanics here, but it also sounds like the suggestion is that this gets around the shared secret because deserializing the message itself causes something to be executed.

          Still, you have to be on the localhost to make the connection, right?

          adityasharad Aditya Sharad added a comment - - edited

          Yes, this is different from SPARK-11652, which focused on preventing a specific known gadget chain (found in Commons Collections). This issue involves the general problem of unconditionally deserializing untrusted data. The proof-of-concept is simply one example of a gadget chain that works against the latest Spark dependencies; it is found in Commons Beanutils and cannot be addressed by updating that dependency.

          I believe you are correct about the deserialization leading to code execution before the shared secret is established or checked.

          Indeed, due to how the socket is opened, you must have access to the local machine to connect, but not necessarily to the same user that is running the Spark master or task.

          vanzin Marcelo Vanzin added a comment -

          Yeah, it's not as simple to exploit, but I guess we'll need custom serialization to avoid issues with 3rd-party libraries here... :-/
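
One common way to restrict what gets deserialized is an allow-list applied in ObjectInputStream.resolveClass(). The sketch below illustrates that general technique only; it is not claimed to be what the pull requests on this issue implement, and the names AllowListDemo, AllowListObjectInputStream, readFiltered and Hello are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class AllowListDemo {

    /** Hypothetical protocol message; the only class the receiver accepts. */
    static class Hello implements Serializable {
        private static final long serialVersionUID = 1L;
        final String text;
        Hello(String text) { this.text = text; }
    }

    /** Rejects any class descriptor whose name is not explicitly allowed. */
    static class AllowListObjectInputStream extends ObjectInputStream {
        private final Set<String> allowed;

        AllowListObjectInputStream(InputStream in, Set<String> allowed) throws IOException {
            super(in);
            this.allowed = allowed;
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc)
                throws IOException, ClassNotFoundException {
            // Called before the object is instantiated, so a rejected class
            // never gets to run its readObject hook.
            if (!allowed.contains(desc.getName())) {
                throw new InvalidClassException(desc.getName(), "class not on allow-list");
            }
            return super.resolveClass(desc);
        }
    }

    /** Serializes an object to bytes, standing in for data sent over a socket. */
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(o);
        }
        return buf.toByteArray();
    }

    /** Deserializes using the allow-list; only Hello is permitted here. */
    static Object readFiltered(byte[] wire) throws IOException, ClassNotFoundException {
        Set<String> allowed = new HashSet<>(Arrays.asList(Hello.class.getName()));
        try (ObjectInputStream in =
                 new AllowListObjectInputStream(new ByteArrayInputStream(wire), allowed)) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(((Hello) readFiltered(serialize(new Hello("hi")))).text);
        try {
            readFiltered(serialize(new ArrayList<String>()));
        } catch (InvalidClassException e) {
            System.out.println("rejected: " + e.classname);
        }
    }
}
```

The allow-list has to be kept as narrow as the protocol permits, since every permitted class is still deserialized with its own hooks. On Java 9 and later, java.io.ObjectInputFilter offers a built-in mechanism for the same kind of filtering.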

          apachespark Apache Spark added a comment -

          User 'vanzin' has created a pull request for this issue:
          https://github.com/apache/spark/pull/18166

          apachespark Apache Spark added a comment -

          User 'vanzin' has created a pull request for this issue:
          https://github.com/apache/spark/pull/18178

          adityasharad Aditya Sharad added a comment -

          I appreciate your quick response to this issue. I believe it would be appropriate to register a CVE ID – is that something one of you would be willing to do?

          srowen Sean Owen added a comment -

          This is already publicly disclosed, which isn't a big deal because it's pretty limited in scope. Normally you'd discuss a CVE and disclosure after the fix on private@.

          adityasharad Aditya Sharad added a comment -

          Apologies for the delay in getting back to you. I believe we first got in touch privately to report this, but in future we'll discuss the details and fix on private@ first if that fits better into your workflow.

          The scope is indeed limited to attacks from local users and the issue is now publicly disclosed. However, I would argue neither of these points disqualifies the vulnerability reported here for the purposes of getting a CVE assigned.

          Depending on the configuration and the intentions of an attacker, the repercussions of this vulnerability are potentially extremely severe despite the limited scope:

          • The worst case is obviously when Spark runs as an administrative user.
          • In the more common case where Spark runs under a user account that is also responsible for other services (like Hadoop, HDFS), the repercussions can be very severe. This is the case in the default Cloudera setup, for example. In that particular scenario, an attacker can cause a widespread outage by simply wiping all data that belongs to the 'hdfs' user. The repercussions reach far beyond Spark itself.
          • In the 'best' case, Spark is set up to use a dedicated user account. Here we're looking at a DoS to Spark specifically, with a severe risk for data loss. An attacker can stop the service and wipe all of Spark's data.

          We have seen CVEs assigned for significantly less severe vulnerabilities. The prime reasons for assigning one are to advise users and to maintain a visible record of the issue that isn't project-specific, which I think would be appropriate in this case.

          Please let me know if there's anything I can help with. I am willing to file separately for the CVE if that is easier, but I do not wish to do so without first having your agreement and finding out whether Spark has a preferred CVE route. If you'd like to discuss this further off-list, please feel free to contact me at aditya@semmle.com.

          srowen Sean Owen added a comment -

          If you'd email a suggested CVE description to private@spark.apache.org, we can go through the motions of reporting it as one. The ASF process is described at https://www.apache.org/security/ and https://www.apache.org/security/projects.html

          srowen Sean Owen added a comment -

          This came up again today and our security folks also suggested this should be a CVE. I can work on this but feel free to supply text as a summary.


            People

            • Assignee: Marcelo Vanzin (vanzin)
            • Reporter: Aditya Sharad (adityasharad)
            • Shepherd: Sean Owen
            • Votes: 0
            • Watchers: 4