Details

      Description

      In HADOOP-10603, we have an implementation of the AES-CTR CryptoCodec using a Java JCE provider.
      For high performance, the configured JCE provider should use native code and AES-NI, but in JDK 6 and 7 the embedded Java provider does not support this.

      Since not all Hadoop users will use a provider like Diceros, or will be able to obtain a signed certificate from Oracle to develop a custom provider, this JIRA adds an implementation of the AES-CTR CryptoCodec that calls OpenSSL directly through JNI.

      1. HADOOP-10693.patch
        41 kB
        Yi Liu
      2. HADOOP-10693.1.patch
        41 kB
        Yi Liu
      3. HADOOP-10693.2.patch
        45 kB
        Yi Liu
      4. HADOOP-10693.3.patch
        58 kB
        Yi Liu
      5. HADOOP-10693.4.patch
        58 kB
        Yi Liu
      6. HADOOP-10693.5.patch
        59 kB
        Yi Liu
      7. HADOOP-10693.6.patch
        78 kB
        Yi Liu
      8. HADOOP-10693.7.patch
        78 kB
        Yi Liu
      9. HADOOP-10693.8.patch
        78 kB
        Yi Liu

        Activity

        Colin Patrick McCabe added a comment -

        committed to fs-encryption branch

        Colin Patrick McCabe added a comment -

        +1. Thanks, Yi.

        Yi Liu added a comment -

        Thanks Colin Patrick McCabe, I have updated the patch.

        Colin Patrick McCabe added a comment -
        +static int loadAesCtr(JNIEnv *env)
        +{
        +#ifdef UNIX
        +  dlerror(); // Clear any existing error
        +  dlsym_EVP_aes_256_ctr = dlsym(openssl, "EVP_aes_256_ctr");
        +  dlsym_EVP_aes_128_ctr = dlsym(openssl, "EVP_aes_128_ctr");
        +  if (dlerror() != NULL) {
        +    return -1;
        +  }
        +#endif
        +
        +#ifdef WINDOWS
        +  dlsym_EVP_aes_256_ctr = (__dlsym_EVP_aes_256_ctr) GetProcAddress(openssl,  \
        +      "EVP_aes_256_ctr");
        +  dlsym_EVP_aes_128_ctr = (__dlsym_EVP_aes_128_ctr) GetProcAddress(openssl,  \
        +      "EVP_aes_128_ctr");
        +  if (dlsym_EVP_aes_256_ctr == NULL || dlsym_EVP_aes_128_ctr == NULL) {
        +    return -1;
        +  }
        +#endif
        +  
        +  return 0;
        +}
        

        If the first call to dlsym fails, the second call will clear the dlerror state. So this isn't quite going to work, I think.
        I think it would be easier to just use the LOAD_DYNAMIC_SYMBOL macro, and then check for the exception afterwards. You'd need something like this:

        void loadAes(void)
        {
            LOAD_DYNAMIC_SYMBOL(1...)
            LOAD_DYNAMIC_SYMBOL(2...)
        }
        
        JNIEXPORT void JNICALL Java_org_apache_hadoop_crypto_OpensslCipher_initIDs
            (JNIEnv *env, jclass clazz)
        {
            loadAes();
            jthrowable jthr = (*env)->ExceptionOccurred();
            if (jthr) {
                (*env)->DeleteLocalRef(env, jthr);
                THROW(...)
                return;
            }
            ...
        }
        

        Or something like that. +1 once this is addressed
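        The dlerror point above generalizes beyond JNI. Below is a minimal, self-contained C sketch of the per-symbol discipline (the symbol names and the use of the program's own handle are illustrative, not Hadoop's code): clear dlerror() before each dlsym() and check it immediately afterwards, so a failure on the first lookup is not masked by a success on the second.

        ```c
        #include <dlfcn.h>
        #include <stdio.h>

        /* Look up one symbol, checking dlerror() immediately so that a failed
         * lookup is reported before any later dlsym() call can clear the state. */
        static void *checked_dlsym(void *handle, const char *name) {
          dlerror();                      /* clear any stale error */
          void *sym = dlsym(handle, name);
          const char *err = dlerror();    /* must be read before the next dlsym */
          if (err != NULL) {
            fprintf(stderr, "failed to resolve %s: %s\n", name, err);
            return NULL;
          }
          return sym;
        }

        int main(void) {
          /* dlopen(NULL) returns a handle to the running program, whose
           * dependencies (e.g. libc, providing printf) are searchable. */
          void *self = dlopen(NULL, RTLD_NOW);
          if (self == NULL) return 1;
          void *ok  = checked_dlsym(self, "printf");           /* resolves */
          void *bad = checked_dlsym(self, "no_such_symbol_");  /* fails    */
          printf("printf=%s no_such_symbol_=%s\n",
                 ok ? "found" : "missing", bad ? "found" : "missing");
          dlclose(self);
          return 0;
        }
        ```

        With this shape, each lookup reports its own failure; the LOAD_DYNAMIC_SYMBOL macro suggested above wraps the same pattern and, per the discussion, raises a Java exception on failure.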

        Yi Liu added a comment -

        Colin Patrick McCabe, thank you for the review. I updated the patch based on your new comments and our offline discussion.

        I guess my question here is, if I compile against openssl 1.0.0 and run against 1.0.1, does AES-CTR work? My understanding is that it does. So we should not fail the build just because the compiler has version 1.0.0.

        In the new patch, I removed the openssl version (1.0.0 or 1.0.1) check in CMakeLists.txt; now it can be compiled against openssl 1.0.0 and run against 1.0.1.

        As we talked about, we should fail the tests when buildSupportsOpenssl is true but openssl is not working (that way, we will know we have a configuration problem on Jenkins or any other build system.)

        Yes, this has been included.

        Specifically, we should call const char *SSLeay_version(int t); here and throw an exception if the number is too low. We should not use the #define, since that is the version we compiled with, which may not be the same as the version we're running with. (In fact, it rarely will be the same, due to the security and export control difficulties associated with bundling openssl.)

        Basically, dlsym for the AES-CTR related symbols will fail if the Openssl version is not new enough, so we don’t need to check the version specifically. And I refined the error message to:
        Cannot find AES-CTR support, is your version of Openssl new enough?

        Colin Patrick McCabe added a comment -

        Right? (We can't make a precondition on buildSupportsOpenssl directly; it will fail if we build without Openssl.) Now I have added it in the new patch.

        Sounds good.

        Furthermore, on Linux the shared library for both OpenSSL 1.0.0x and 1.0.1x is “libcrypto.so.1.0.0”, but AES-CTR is only supported in 1.0.1x. So I added the following in the new patch to check the detailed version (we cannot tell whether it is 1.0.0x or 1.0.1x from the suffix):

        I guess my question here is, if I compile against openssl 1.0.0 and run against 1.0.1, does AES-CTR work? My understanding is that it does. So we should not fail the build just because the compiler has version 1.0.0.

        As we talked about, we should fail the tests when buildSupportsOpenssl is true but openssl is not working (that way, we will know we have a configuration problem on Jenkins or any other build system.)

        +JNIEXPORT void JNICALL Java_org_apache_hadoop_crypto_OpensslCipher_initIDs
        +    (JNIEnv *env, jclass clazz)
        

        Specifically, we should call const char *SSLeay_version(int t); here and throw an exception if the number is too low. We should not use the #define, since that is the version we compiled with, which may not be the same as the version we're running with. (In fact, it rarely will be the same, due to the security and export control difficulties associated with bundling openssl.)

        Yi Liu added a comment -

        Thanks Alejandro Abdelnur. At runtime, we already check whether the loaded openssl library supports AES-CTR; if it does not, OpensslCipher.isNativeCodeLoaded() returns false.

        Now that snippet of CMake script makes sure the correct Openssl version is chosen for compiling.

        Alejandro Abdelnur added a comment -

        If openssl exists but the version is not sufficient, OpensslCipher.c will not be compiled. If require.openssl is specified but the version is not sufficient, the build will fail with the error message “Openssl version is too low, should be at least 1.0.1”.

        Can we also do this check at runtime, or will binding simply fail?

        Yi Liu added a comment -

        Colin Patrick McCabe, Thanks for the review.

        I was just asking you to change the type of the variables to unsigned, and then you won't need any casts. It seems that you dropped the cast, but didn't change the type to unsigned.

        My bad, I misunderstood your meaning. I updated it.

        It's good that you changed the name of the test, but there are still some other class names we should fix. Since the other stuff shows up in config XMLs, it's probably even more important to use something readable.

        Oh, you mean this; now I updated all of them in the new patch. I thought they were already checked in and that changing the names was a big thing, so I had only changed the name in the test.
        AESCTRCryptoCodec -> AesCtrCryptoCodec
        JCEAESCTRCryptoCodec -> JceAesCtrCryptoCodec
        OpensslAesCtrCryptoCodec
        OpensslCipher
        TestCryptoStreamsWithOpensslAesCtrCryptoCodec

        I added the "require" flags since previously we were having so much trouble ensuring our builds contained what they were supposed to contain.
        There is a way to pass java properties on to Surefire, but it's finicky. A simpler approach might just be to do something like this:

        Preconditions.checkState(NativeCodeLoader.buildSupportsOpenssl());
        Assert.assertTrue(OpenSSLCipher.isNativeCodeLoaded());
        

        We need to make sure that the build fails when there is an issue loading libcrypto.so on Jenkins. Otherwise, we can never be sure about our test coverage.

        I get your point. And you mean

          if (NativeCodeLoader.buildSupportsOpenssl()) {
            Assert.assertTrue(OpensslCipher.isNativeCodeLoaded());
          }
        

        Right? (We can't make a precondition on buildSupportsOpenssl directly; it will fail if we build without Openssl.) Now I have added it in the new patch.

        Furthermore, on Linux the shared library for both OpenSSL 1.0.0x and 1.0.1x is “libcrypto.so.1.0.0”, but AES-CTR is only supported in 1.0.1x. So I added the following in the new patch to check the detailed version (we cannot tell whether it is 1.0.0x or 1.0.1x from the suffix):

          if (OPENSSL_LIBRARY AND OPENSSL_INCLUDE_DIR)
            file(STRINGS "${OPENSSL_INCLUDE_DIR}/openssl/opensslv.h" OPENSSL_VERSION_STR REGEX "^#define[\t ]+OPENSSL_VERSION_NUMBER[\t ]+0x[0-9][0-9][0-9][0-9][0-9][0-9].*")
            string(REGEX MATCH 0x[0-9][0-9][0-9][0-9][0-9][0-9]..L OPENSSL_VERSION "${OPENSSL_VERSION_STR}")
            IF(${OPENSSL_VERSION} LESS 0x1000100fL)
              SET(OPENSSL_VERSION_MATCHED FALSE)
            ELSE()
              SET(OPENSSL_VERSION_MATCHED TRUE)
            ENDIF()
          endif (OPENSSL_LIBRARY AND OPENSSL_INCLUDE_DIR)
        

        If openssl exists, but the version is not sufficient, OpensslCipher.c will not be compiled. If specifying require.openssl, but the version is not sufficient, build will fail with error message “Openssl version is too low, should be at least 1.0.1”.
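        The hex literals in the CMake snippet follow OpenSSL's OPENSSL_VERSION_NUMBER packing for 1.x releases, 0xMNNFFPPS (major, minor, fix, patch, status), so "at least 1.0.1" is the comparison against 0x1000100fL. A small C sketch of the decoding, using 1.0.1f (0x1000106fL) as an assumed example value:

        ```c
        #include <stdio.h>

        int main(void) {
          /* OpenSSL 1.x packs its version as 0xMNNFFPPS:
           * M = major, NN = minor, FF = fix, PP = patch letter, S = status. */
          long v = 0x1000106fL;              /* example: OpenSSL 1.0.1f */
          int major = (int)((v >> 28) & 0xf);
          int minor = (int)((v >> 20) & 0xff);
          int fix   = (int)((v >> 12) & 0xff);
          printf("%d.%d.%d\n", major, minor, fix);
          /* The CMake snippet performs this comparison at configure time. */
          printf("at least 1.0.1: %s\n", v >= 0x1000100fL ? "yes" : "no");
          return 0;
        }
        ```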

        Colin Patrick McCabe added a comment -

        Right, I also noticed them, actually [the warnings] were caused by the fix of one previous comment

        I was just asking you to change the type of the variables to unsigned, and then you won't need any casts. It seems that you dropped the cast, but didn't change the type to unsigned.

        If you just apply this patch to v5, you will see what I mean.

        diff --git a/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/crypto/OpenSSLCipher.c b/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/crypto/OpenSSLCipher.c
        index 08cb5e8..39a71c6 100644
        --- a/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/crypto/OpenSSLCipher.c
        +++ b/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/crypto/OpenSSLCipher.c
        @@ -263,8 +263,8 @@ JNIEXPORT jint JNICALL Java_org_apache_hadoop_crypto_OpenSSLCipher_update
                 "Output buffer is not sufficient.");
             return 0;
           }
        -  char *input_bytes = (*env)->GetDirectBufferAddress(env, input);
        -  char *output_bytes = (*env)->GetDirectBufferAddress(env, output);
        +  unsigned char *input_bytes = (*env)->GetDirectBufferAddress(env, input);
        +  unsigned char *output_bytes = (*env)->GetDirectBufferAddress(env, output);
           if (input_bytes == NULL || output_bytes == NULL) {
             THROW(env, "java/lang/InternalError", "Cannot get buffer address.");
             return 0;
        @@ -273,8 +273,8 @@ JNIEXPORT jint JNICALL Java_org_apache_hadoop_crypto_OpenSSLCipher_update
           output_bytes = output_bytes + output_offset;
           
           int output_len = 0;
        -  if (!dlsym_EVP_CipherUpdate(context, (unsigned char *)output_bytes,  \
        -      &output_len, (unsigned char *)input_bytes, input_len)) {
        +  if (!dlsym_EVP_CipherUpdate(context, output_bytes,  \
        +      &output_len, input_bytes, input_len)) {
             dlsym_EVP_CIPHER_CTX_cleanup(context);
             THROW(env, "java/lang/InternalError", "Error in EVP_CipherUpdate.");
             return 0;
        @@ -308,7 +308,7 @@ JNIEXPORT jint JNICALL Java_org_apache_hadoop_crypto_OpenSSLCipher_doFinal
                 "Output buffer is not sufficient.");
             return 0;
           }
        -  char *output_bytes = (*env)->GetDirectBufferAddress(env, output);
        +  unsigned char *output_bytes = (*env)->GetDirectBufferAddress(env, output);
           if (output_bytes == NULL) {
             THROW(env, "java/lang/InternalError", "Cannot get buffer address.");
             return 0;
        @@ -316,8 +316,7 @@ JNIEXPORT jint JNICALL Java_org_apache_hadoop_crypto_OpenSSLCipher_doFinal
           output_bytes = output_bytes + offset;
           
           int output_len = 0;
        -  if (!dlsym_EVP_CipherFinal_ex(context, (unsigned char *)output_bytes,  \
        -      &output_len)) {
        +  if (!dlsym_EVP_CipherFinal_ex(context, output_bytes, &output_len)) {
             dlsym_EVP_CIPHER_CTX_cleanup(context);
             THROW(env, "java/lang/InternalError", "Error in EVP_CipherFinal_ex.");
             return 0;
        

        There is no need for casts here.
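        The cast-versus-declaration point can be shown with a toy stand-alone program; consume below is a hypothetical stand-in for dlsym_EVP_CipherUpdate, which takes unsigned char *. Declaring the buffers unsigned char * from the start removes both the casts and the -Wpointer-sign warnings quoted elsewhere in the thread.

        ```c
        #include <stdio.h>

        /* Hypothetical stand-in for an OpenSSL-style callee that takes
         * unsigned char * (as dlsym_EVP_CipherUpdate does). */
        static void consume(unsigned char *buf, int len) {
          for (int i = 0; i < len; i++) {
            buf[i] ^= 0xff;   /* dummy transform */
          }
        }

        int main(void) {
          /* Declared unsigned from the start: no cast at the call site
           * and no -Wpointer-sign warning. */
          unsigned char bytes[4] = {0, 1, 2, 3};
          consume(bytes, 4);
          printf("%u %u %u %u\n", bytes[0], bytes[1], bytes[2], bytes[3]);
          return 0;
        }
        ```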

        Agree, let’s change the name [of the test].

        It's good that you changed the name of the test, but there are still some other class names we should fix. Since the other stuff shows up in config XMLs, it's probably even more important to use something readable.

        Example:

        cmccabe@keter:~/hadoop3> find | grep -i aesctrcrypto
        ./hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/crypto/TestCryptoStreamsWithOpenSSLAESCTRCryptoCodec.java
        ./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/JCEAESCTRCryptoCodec.java
        ./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/AESCTRCryptoCodec.java
        ./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/OpenSSLAESCTRCryptoCodec.java
        

        Probably want JceAesCtrCryptoCodec, AesCtrCryptoCodec, etc. etc.

        I found it’s a bit inconvenient to do this. 1. -Drequire.openssl is only for the build; we can’t get this in Hadoop code. 2. If a user specifies -Drequire.openssl but there is no openssl available, then the build will fail.

        The intended behavior of -Drequire.openssl is that it will fail the build if openssl.so is not there. This is by design, since otherwise it is very hard to get reliable builds. By the way, I can see that you got this right in your patch (I just tested it)... specifying -Drequire.openssl makes the build fail on a system without openssl. I added the "require" flags since previously we were having so much trouble ensuring our builds contained what they were supposed to contain.

        There is a way to pass java properties on to Surefire, but it's finicky. A simpler approach might just be to do something like this:

            Preconditions.checkState(NativeCodeLoader.buildSupportsOpenssl());
            Assert.assertTrue(OpenSSLCipher.isNativeCodeLoaded());
        

        We need to make sure that the build fails when there is an issue loading libcrypto.so on Jenkins. Otherwise, we can never be sure about our test coverage.

        Yi Liu added a comment -

        Colin Patrick McCabe, I updated the patch for your comments.

        I ran TestCryptoCodec and it succeeded whether or not openssl.so was installed. Can you make the test fail when -Drequire.openssl is specified and OpenSSLAESCTRCryptoCodec is not available? You should be able to check System#getProperty or something like that. That way we could be sure our test was covering this on upstream Jenkins.

        I found it’s a bit inconvenient to do this. 1. -Drequire.openssl is only for the build; we can’t get this in Hadoop code. 2. If a user specifies -Drequire.openssl but there is no openssl available, then the build will fail.
        Thoughts? Maybe the existing logic is acceptable: if native openssl is available, we test it; if not, we skip it. Just like we do for TestCodec.

        Yi Liu added a comment -

        Thanks Colin Patrick McCabe for the review. I will update the patch later.

        There are a bunch of warnings that this code generates that we should fix:

        Right, I also noticed them; actually they were caused by the fix for one previous comment:

        +  if (!dlsym_EVP_CipherUpdate(context, (unsigned char *)output_bytes,  \
        +      &output_len, (unsigned char *)input_bytes, input_len)) {
        

        These casts could be avoided if you just made the type of output_bytes and input_bytes unsigned.


        OK, I will change it back.

        I ran TestCryptoCodec and it succeeded whether or not openssl.so was installed. Can you make the test fail when -Drequire.openssl is specified and OpenSSLAESCTRCryptoCodec is not available? You should be able to check System#getProperty or something like that. That way we could be sure our test was covering this on upstream Jenkins.

        Right, TestCryptoCodec checks OpenSSLCipher.isNativeCodeLoaded() to decide whether to test OpenSSLAESCTRCryptoCodec; this is the same as in TestCodec.
        Your suggestion is good, let’s assert the exception.

        A comment about naming. I find names like testJCEAESCTRCryptoCodec awfully hard to read. Once you get 10 capital letters in a row, it just kind of mashes together. I would prefer something like testJceAesCtrCryptoCodec, where we capitalize only the first letter of each acronym. I think it would make sense to rename some of this stuff in that way... what do you think?

        Agree, let’s change the name.

        Think about what happens if we manage to get jKey, but getting jlv fails. Then we'll throw, but never call ReleaseByteArrayElements on jKey.

        Yes, there is a potential issue; let’s check them separately.

        Colin Patrick McCabe added a comment -

        Actually this is already included in the latest patch HADOOP-10693.2.patch.

        Yeah, you're right. I was looking at v1 of the patch when I made that comment.

        I get your point; I will handle the fallback in HADOOP-10735.... I’m thinking we can do it more simply: if the OpenSSL native library is not available, OpenSSLAESCTRCryptoCodec uses javax.crypto.Cipher instead of org.apache.hadoop.crypto.OpenSSLCipher, if fallback is allowed by configuration. JCE is always available, so we don’t need a comma-separated list; the fallback codec should use the same algorithm and padding, so it’s better not to let users configure it. Otherwise, if users configure several codecs, a different codec may be chosen in different environments and decryption may fail.

        OK. Let's deal with this in HADOOP-10735, then.
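The fallback Yi describes can be sketched roughly as follows. All class and method names here are illustrative stand-ins, not the actual change that landed in HADOOP-10735:

```java
// Sketch: prefer the native OpenSSL cipher; fall back to JCE only when
// natives are missing AND fallback is enabled by configuration.
public class CipherSelection {
  static String chooseCipher(boolean nativeLoaded, boolean fallbackAllowed) {
    if (nativeLoaded) {
      return "org.apache.hadoop.crypto.OpenSSLCipher";
    }
    if (fallbackAllowed) {
      // Per the review discussion, a BIG warning would be logged here.
      return "javax.crypto.Cipher";
    }
    throw new RuntimeException(
        "OpenSSL native code not loaded and fallback disabled");
  }

  public static void main(String[] args) {
    System.out.println(chooseCipher(false, true));
  }
}
```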

        There are a bunch of warnings that this code generates that we should fix:

        /home/cmccabe/hadoop3/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/crypto/OpenSSLCipher.c: In function ‘Java_org_apache_hadoop_cry
        pto_OpenSSLCipher_update’:
        /home/cmccabe/hadoop3/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/crypto/OpenSSLCipher.c:271:3: warning: pointer targets in passi
        ng argument 2 of ‘dlsym_EVP_CipherUpdate’ differ in signedness [-Wpointer-sign]
        /home/cmccabe/hadoop3/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/crypto/OpenSSLCipher.c:271:3: note: expected ‘unsigned char *’ 
        but argument is of type ‘char *’
        /home/cmccabe/hadoop3/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/crypto/OpenSSLCipher.c:271:3: warning: pointer targets in passi
        ng argument 4 of ‘dlsym_EVP_CipherUpdate’ differ in signedness [-Wpointer-sign]
        /home/cmccabe/hadoop3/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/crypto/OpenSSLCipher.c:271:3: note: expected ‘const unsigned ch
        ar *’ but argument is of type ‘char *’
        /home/cmccabe/hadoop3/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/crypto/OpenSSLCipher.c: In function ‘Java_org_apache_hadoop_cry
        pto_OpenSSLCipher_doFinal’:
        /home/cmccabe/hadoop3/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/crypto/OpenSSLCipher.c:314:3: warning: pointer targets in passi
        ng argument 2 of ‘dlsym_EVP_CipherFinal_ex’ differ in signedness [-Wpointer-sign]
        /home/cmccabe/hadoop3/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/crypto/OpenSSLCipher.c:314:3: note: expected ‘unsigned char *’ 
        but argument is of type ‘char *’
        

        Seems like they are related to using char* instead of unsigned char*

        Add test cases TestCryptoCodec#testJCEAESCTRCryptoCodec and TestCryptoCodec#testOpenSSLAESCTRCryptoCodec. Add TestOpenSSLCipher. We originally had TestCryptoStreamsWithOpenSSLAESCTRCryptoCodec

        Thanks.

        I ran TestCryptoCodec and it succeeded whether or not openssl.so was installed. Can you make the test fail when -Drequire.openssl is specified and OpenSSLAESCTRCryptoCodec is not available? You should be able to check System#getProperty or something like that. That way we could be sure our test was covering this on upstream Jenkins.
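The check Colin asks for amounts to failing instead of silently skipping when the build demanded OpenSSL. A sketch of that decision; the property name `require.openssl` mirrors the build flag, and how it is surfaced to the test JVM is an assumption here:

```java
// Sketch: decide whether to run, fail, or skip the OpenSSL codec test.
// In the real test, "fail" would be an Assert.fail(...) call.
public class RequireOpensslCheck {
  static String decide(boolean nativeLoaded, String requireProp) {
    if (nativeLoaded) {
      return "run";
    }
    if (requireProp != null) {
      return "fail";   // build required openssl, but the codec is unavailable
    }
    return "skip";     // openssl optional and unavailable: skip quietly
  }

  public static void main(String[] args) {
    System.out.println(decide(false, System.getProperty("require.openssl")));
  }
}
```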

        A comment about naming. I find names like testJCEAESCTRCryptoCodec awfully hard to read. Once you get 10 capital letters in a row, it just kind of mashes together. I would prefer something like testJceAesCtrCryptoCodec, where we capitalize only the first letter of each acronym. I think it would make sense to rename some of this stuff in that way... what do you think?

          jbyte *jKey = (*env)->GetByteArrayElements(env, key, NULL);                  
          jbyte *jIv = (*env)->GetByteArrayElements(env, iv, NULL);                    
          if (jKey == NULL || jIv == NULL) {                                           
            THROW(env, "java/lang/InternalError", "Cannot get bytes array.");          
            return (jlong)0;                                                           
          }
        

        Think about what happens if we manage to get jKey, but getting jIv fails. Then we'll throw, but never call ReleaseByteArrayElements on jKey.

        Yi Liu added a comment -

        Colin Patrick McCabe, I updated the patch for your comments.
        Add test cases TestCryptoCodec#testJCEAESCTRCryptoCodec and TestCryptoCodec#testOpenSSLAESCTRCryptoCodec
        Add TestOpenSSLCipher
        We originally had TestCryptoStreamsWithOpenSSLAESCTRCryptoCodec

        Tucu is right here... rather than configuring a single codec class, why don't we make this configuration contain a comma-separated list of codecs to try? That way it would be pretty simple to configure the system to use openssl with fallback to JCE, etc. We would simply keep trying entries in the list until one was available.

        I get your point, I will handle the fallback in HADOOP-10735.
        I’m thinking we can do it more simply: if the OpenSSL native library is not available, OpenSSLAESCTRCryptoCodec uses javax.crypto.Cipher instead of org.apache.hadoop.crypto.OpenSSLCipher, if fallback is allowed by configuration. JCE is always available, so we don’t need a comma-separated list; the fallback codec should use the same algorithm and padding, so it’s better not to let users configure it. Otherwise, if users configure several codecs, a different codec may be chosen in different environments and decryption may fail.

        These casts could be avoided if you just made the type of output_bytes and input_bytes unsigned.

        Right, I updated it.

        We need to do error checking here in case the GetByteArrayElements calls fail.

        I updated it.

        Rather than having stuff like this, let's just change the CMakeLists.txt file to not include this file unless openssl is found. #ifdefing out a whole file is just weird, and has led to bugs in the past.

        I updated it. Actually we just need to remove the #if defined; we already had that logic in CMakeLists.txt.

        Yi Liu added a comment -

        Thanks Colin Patrick McCabe for the review and good comments.

        Are you familiar with checknative? It prints out a bunch of information about the native libraries which are available. For example, this is what it prints for me:
        It would be great to include openssl.so in here as well.

        Actually this is already included in the latest patch HADOOP-10693.2.patch.

        What's the best way to test this JNI code? Perhaps running TestCryptoCodec with the correct configuration? Perhaps we ought to have a subclass of TestCryptoCodec that sets this configuration and then runs the parent class. If we don't have any unit test coverage on Jenkins, then I am afraid this might bitrot.

        Actually in the patches, we have test cases TestCryptoStreamsWithOpenSSLCipher to cover crypto functionality with correct configuration. It includes lots of tests. I will add more test cases for OpenSSLAESCTRCryptoCodec.

        public class TestCryptoStreamsWithOpenSSLCipher extends TestCryptoStreams {
        

        For other comments, I will update in next patch.

        Colin Patrick McCabe added a comment -

        Alejandro Abdelnur, Andrew Purtell: Yeah, you are right. It is nice to use the CPU-optimized implementation in openssl, and it does avoid some of the export issues.

        +    conf.set(HADOOP_SECURITY_CRYPTO_CODEC_CLASS_KEY, 
        +        OpenSSLAESCTRCryptoCodec.class.getName());
        

        Tucu is right here... rather than configuring a single codec class, why don't we make this configuration contain a comma-separated list of codecs to try? That way it would be pretty simple to configure the system to use openssl with fallback to JCE, etc. We would simply keep trying entries in the list until one was available.
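The comma-separated list Colin proposes could look roughly like this: split the configured value and return the first class that loads. This is a sketch; the JDK class names in the list are used purely so the example runs, and the real list would contain CryptoCodec implementations:

```java
// Sketch: try each class name in a comma-separated list until one loads.
public class CodecList {
  static Class<?> firstAvailable(String commaList) {
    for (String name : commaList.split(",")) {
      try {
        return Class.forName(name.trim());
      } catch (ClassNotFoundException e) {
        // not on the classpath; try the next entry in the list
      }
    }
    return null;
  }

  public static void main(String[] args) {
    Class<?> c = firstAvailable("com.example.MissingCodec, java.util.ArrayList");
    System.out.println(c == null ? "none" : c.getName());
  }
}
```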

        +  if (!dlsym_EVP_CipherUpdate(context, (unsigned char *)output_bytes,  \
        +      &output_len, (unsigned char *)input_bytes, input_len)) {
        

        These casts could be avoided if you just made the type of output_bytes and input_bytes unsigned.

        +  jbyte *jKey = (*env)->GetByteArrayElements(env, key, NULL);
        +  jbyte *jIv = (*env)->GetByteArrayElements(env, iv, NULL);
        +  
        +  dlsym_EVP_CipherInit_ex(context, getEvpCipher(alg, jKeyLen), NULL,  \ 
        +      (unsigned char *)jKey, (unsigned char *)jIv, mode == ENCRYPT_MODE);
        +  
        +  (*env)->ReleaseByteArrayElements(env, key, jKey, 0);
        +  (*env)->ReleaseByteArrayElements(env, iv, jIv, 0);
        

        We need to do error checking here in case the GetByteArrayElements calls fail.

        + 
        +#if defined HADOOP_OPENSSL_LIBRARY
        +
        

        Rather than having stuff like this, let's just change the CMakeLists.txt file to not include this file unless openssl is found. #ifdefing out a whole file is just weird, and has led to bugs in the past.

        Are you familiar with checknative? It prints out a bunch of information about the native libraries which are available. For example, this is what it prints for me:

        $ hadoop checknative
        Native library checking:
        hadoop: true /h/lib/native/libhadoop.so.1.0.0
        zlib:   true /lib64/libz.so.1
        snappy: true /usr/local/lib64/libsnappy.so.1
        lz4:    true revision:99
        bzip2:  true /usr/lib64/libbz2.so.1
        

        It would be great to include openssl.so in here as well.

        What's the best way to test this JNI code? Perhaps running TestCryptoCodec with the correct configuration? Perhaps we ought to have a subclass of TestCryptoCodec that sets this configuration and then runs the parent class. If we don't have any unit test coverage on Jenkins, then I am afraid this might bitrot.

        Very encouraging progress so far, looking forward to seeing this get in.

        Yi Liu added a comment -

        Thanks Alejandro Abdelnur and Andrew Purtell for the comments.

        In the OpenSSLAESCTRCryptoCodec#process(), shouldn't we check as precondition that the byte buffers are direct byte buffers (to avoid obscure error messages coming back from JNI/OpenSSL)?

        You are right; I added the precondition in OpenSSLCipher#update() and OpenSSLCipher#doFinal(), since those invoke JNI.
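The precondition amounts to checking `ByteBuffer#isDirect` before crossing into JNI, since `GetDirectBufferAddress` returns NULL for heap buffers. A minimal self-contained sketch (the class and method names are illustrative):

```java
import java.nio.ByteBuffer;

// Sketch: reject non-direct buffers up front with a clear message,
// instead of letting JNI/OpenSSL fail obscurely later.
public class DirectCheck {
  static void checkDirect(ByteBuffer input, ByteBuffer output) {
    if (!input.isDirect() || !output.isDirect()) {
      throw new IllegalArgumentException("Direct buffers are required.");
    }
  }

  public static void main(String[] args) {
    ByteBuffer direct = ByteBuffer.allocateDirect(16);
    ByteBuffer heap = ByteBuffer.allocate(16);
    checkDirect(direct, direct);
    System.out.println("direct ok");
    try {
      checkDirect(direct, heap);
    } catch (IllegalArgumentException e) {
      System.out.println("heap rejected");
    }
  }
}
```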

        Are you planning to fallback to Java impl if JNI/OpenSSL is not avail? If so, it should be possible to disable the fallback via configuration to avoid accidental fallback because of misconfiguration and user not noticing that. Else a BIG warning in the logs that the fallback is happening.

        Right. I will add the configuration and also a BIG warning when handling fallback in HADOOP-10735.

        Yi Liu added a comment -

        Thanks Colin Patrick McCabe for your nice review. The new patch includes updates for your comments.

        It's not really "crypto" we're locating with this, but the openssl library. So the options should be named -Dopenssl.prefix, etc. Similar with a lot of the constants and other code options.

        Right, I updated it.

        General formatting: we have an 80-column limit in Hadoop.

        I updated it. My bad, I thought we didn’t need this limit in JNI code.

        Formatting: should be char *input_bytes. Also, you need to check these against null, since they may not succeed on some JVMs or if a non-direct buffer was passed.

        I updated it.

        You can just write dlsym_EVP_CipherUpdate(...). Same with all the other calls to function pointers.

        I updated it.

        Andrew Purtell added a comment -

        Is it possible to use something like this? http://www.literatecode.com/aes256 This implementation is just two files, a .c and a .h, and doesn't require any libraries to be installed.

        What is really of value in the OpenSSL crypto implementation are the highly optimized (assembler) implementations of the AES algorithm. I took a quick look at the provided URL and the author himself says:

        It is a straightforward and rather naïve byte-oriented portable C implementation, where all the lookup tables replaced with on-the-fly calculations. Certainly it is slower and more subjective to side-channel attacks by nature.

        If there are concerns about OpenSSL, there is always NSS. Using NSS instead won't isolate the project from potential security advisories involving the crypto library, you've just chosen a different library with a different pedigree, and perhaps not quite the same level of scrutiny (after heartbleed).

        Alejandro Abdelnur added a comment -

        Yi Liu, a couple of comments on the Java side of things:

        In the OpenSSLAESCTRCryptoCodec#process(), shouldn't we check as precondition that the byte buffers are direct byte buffers (to avoid obscure error messages coming back from JNI/OpenSSL)?

        Are you planning to fallback to Java impl if JNI/OpenSSL is not avail? If so, it should be possible to disable the fallback via configuration to avoid accidental fallback because of misconfiguration and user not noticing that. Else a BIG warning in the logs that the fallback is happening.

        Alejandro Abdelnur added a comment -

        Colin Patrick McCabe, regarding your suggestion on using the aes256 library you pointed out. The issues I see with that are:

        • Hadoop should not be in the business of writing/shipping/supporting crypto implementations but simply using them.
        • If we ship cryptographic code, Apache Hadoop needs to seek US export approval. By using OpenSSL there is no need for that.
        • OpenSSL is being kept up to date with the latest AES-NI instruction sets, and Linux distros are picking up those changes quite fast. With aes256 we would have to do that ourselves.
        • One thing is to say you are fully compatible with FIPS; another is to be FIPS certified. OpenSSL versions get FIPS certification; I doubt that is the case for aes256.
        Colin Patrick McCabe added a comment -

        Thanks for looking at this.

        I find myself wishing we could avoid using OpenSSL here. We've been relatively insulated from all the security advisories on that library up until this point. Although I understand that the code path we're using here is just AES, and is not affected by Heartbleed or anything like that, it will mean more headaches for us if we have to monitor the OpenSSL vulnerability scene. Especially for users who choose to bundle the library.

        Is it possible to use something like this? http://www.literatecode.com/aes256 This implementation is just two files, a .c and a .h, and doesn't require any libraries to be installed. (I am just kicking around ideas here... maybe it isn't practical.) But it sure would be way, way easier...

        The following build options are supported:
        -Dcrypto.prefix=
        -Dcrypto.lib=
        -Dcrypto.include=
        -Drequire.crypto
        -Dbundle.crypto
        

        It's not really "crypto" we're locating with this, but the openssl library. So the options should be named -Dopenssl.prefix, etc. Similar with a lot of the constants and other code options.

        General formatting: we have an 80-column limit in Hadoop.

        +  char * input_bytes = (*env)->GetDirectBufferAddress(env, input);
        +  char * output_bytes = (*env)->GetDirectBufferAddress(env, output);
        +  input_bytes = input_bytes + input_offset;
        +  output_bytes = output_bytes + output_offset;
        

        Formatting: should be char *input_bytes. Also, you need to check these against null, since they may not succeed on some JVMs or if a non-direct buffer was passed.

        +  if (!(*dlsym_EVP_CipherUpdate)(context, (unsigned char *)output_bytes, &output_len, (unsigned char *)input_bytes, input_len)) {
        

        You can just write dlsym_EVP_CipherUpdate(...). Same with all the other calls to function pointers.

        Yi Liu added a comment -

        Rebased the patch on the latest branch.
        If using a custom OpenSSL path, please configure LD_LIBRARY_PATH.

        Yi Liu added a comment -

        The patch includes an OpenSSLCipher and an implementation of the AES-CTR CryptoCodec using OpenSSLCipher. The implementation requires the libcrypto shared library.

        We recommend using the latest OpenSSL version.
        The following build options are supported:

        • -Dcrypto.prefix=
        • -Dcrypto.lib=
        • -Dcrypto.include=
        • -Drequire.crypto
        • -Dbundle.crypto

        If a custom path is specified, the shared library in that path will be used instead of the one on the system path.
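As a hedged example of how these options might be combined on the command line (the `-Pnative` profile and the paths are assumptions for illustration; only the `-D` flag names come from the list above):

```shell
# Hypothetical build against a custom OpenSSL install.
mvn package -Pnative -DskipTests \
    -Dcrypto.prefix=/opt/openssl \
    -Drequire.crypto
```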


          People

          • Assignee: Yi Liu
          • Reporter: Yi Liu