Hadoop Map/Reduce / MAPREDUCE-5890

Support for encrypting Intermediate data and spills in local filesystem

    Details

    • Hadoop Flags:
      Reviewed

      Description

      For some sensitive data, encryption while in flight (over the network) is not sufficient; it is required that the data also be encrypted while at rest. HADOOP-10150 & HDFS-6134 bring encryption at rest for data in the filesystem using the Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest.

      Attachments

      1. MAPREDUCE-5890.15.patch
        58 kB
        Arun Suresh
      2. MAPREDUCE-5890.14.patch
        58 kB
        Arun Suresh
      3. MAPREDUCE-5890.13.patch
        58 kB
        Arun Suresh
      4. MAPREDUCE-5890.12.patch
        58 kB
        Arun Suresh
      5. MAPREDUCE-5890.11.patch
        58 kB
        Arun Suresh
      6. MAPREDUCE-5890.10.patch
        59 kB
        Arun Suresh
      7. MAPREDUCE-5890.9.patch
        74 kB
        Arun Suresh
      8. MAPREDUCE-5890.8.patch
        158 kB
        Arun Suresh
      9. MAPREDUCE-5890.7.patch
        158 kB
        Arun Suresh
      10. MAPREDUCE-5890.6.patch
        71 kB
        Arun Suresh
      11. MAPREDUCE-5890.5.patch
        72 kB
        Arun Suresh
      12. MAPREDUCE-5890.4.patch
        68 kB
        Arun Suresh
      13. org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt
        586 kB
        Arun Suresh
      14. syslog.tar.gz
        463 kB
        Arun Suresh
      15. MAPREDUCE-5890.3.patch
        66 kB
        Arun Suresh


          Activity

          Alejandro Abdelnur added a comment -

          HADOOP-10603 introduces crypto streams to be used for filesystem encryption. We could leverage them for encrypting map output data; the Reducer shuffle would decrypt it (no need for network encryption, as the data would already be encrypted in transit). The reducer, when writing spills to disk, would encrypt them and would decrypt while reading the spills back.

          It may make sense to do this JIRA as part of the fs-encryption branch.

          Arun Suresh added a comment -

          Attaching initial patch that encrypts intermediate MapReduce spill files.
          NOTE: This is to be applied to the 'fs-encryption' branch.

          The following are the locations in the code path modified by the patch:

          1) In the Map phase, when any on-disk file is created: when the Merger writes segments to disk, as well as when spill files are written to disk in the MapTask, an IV (Initialization Vector) is initialized for the file and written to the same directory (like the index file) with an appropriate suffix. No encryption happens for in-memory data (when segments are sorted in memory). At the end of the Map phase, each Map task will have written a single spill file to disk, which is encrypted, and an associated IV file will also be present in the directory.

          2) The Shuffle Handler: when a request for a partition comes in from the Fetcher, the ShuffleHandler checks whether the spill file for the map attempt is encrypted (which is the case if it finds an associated 'crypto-iv' suffixed file). It then adds the IV to the ShuffleHeader for that spill file and sends the encrypted stream as-is to the Fetcher.

          3) In the Reduce phase: the Fetcher receives the ShuffleHeader for the HTTP stream and, if it finds the IV, uses it to wrap the InputStream with a CryptoInputStream and passes it on to the Reduce-stage Mergers, before the Merger writes to disk (either the OnDiskMerger or the InMemoryMerger).

          Other notes:

          • There is no need for over-the-network encryption of shuffle data, as it is already encrypted.
          • The encryption keys are set into the TokenCache in the JobSubmitter.
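
          For readers unfamiliar with the HADOOP-10603 crypto streams, a minimal write-side sketch of the wrapping described above; the class name and key handling are illustrative assumptions, not the patch's actual CryptoUtils API:

            import java.io.IOException;
            import java.io.OutputStream;

            import org.apache.hadoop.conf.Configuration;
            import org.apache.hadoop.crypto.CryptoCodec;
            import org.apache.hadoop.crypto.CryptoOutputStream;

            // Illustrative sketch: wrap a spill file's OutputStream so that
            // everything written through it lands on disk encrypted.
            // Key handling is simplified for illustration.
            public final class SpillEncryptionSketch {
              static OutputStream wrapForSpill(Configuration conf, OutputStream rawOut,
                  byte[] key) throws IOException {
                CryptoCodec codec = CryptoCodec.getInstance(conf);
                byte[] iv = new byte[codec.getCipherSuite().getAlgorithmBlockSize()];
                codec.generateSecureRandom(iv); // fresh IV per file
                // The first patch persisted this IV next to the spill file (a
                // 'crypto-iv' suffixed sibling) so the ShuffleHandler could find it.
                return new CryptoOutputStream(rawOut, codec, key, iv);
              }
            }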
          Arun Suresh added a comment -

          Adding an end-to-end TestCase

          Alejandro Abdelnur added a comment -

          Suresh, any special reason why the test is not included in the main patch?

          I’m not quite happy with the IF blocks scattered around:

                if (CryptoUtils.isShuffleEncrypted(conf)) {
                  byte[] iv = CryptoUtils.createIVFile(conf, fs, file);
                  out = CryptoUtils.wrap(conf, iv, out);
                }
          

          Given that the current abstraction does not provide a clean cut to hide this within the IFile without a significant refactoring throughout the code, I think it is the least evil.

          Nice job.

          Could you try running test-patch locally on the fs-encryption branch with this patch?

          Arun Suresh added a comment -

          Updating Patch combined with Test Case

          Arun Suresh added a comment -

          Alejandro Abdelnur,

          Yup, I did not find any better way of doing this..
          I have combined the code and test case into a single patch. I have also run the test case locally varying the number of Maps and Reduces.. It works fine..

          Alejandro Abdelnur added a comment -

          LGTM. Arun Suresh, can you run test-patch locally on the patch and paste the result in the JIRA? After that, I think we are good to go.

          Arun Suresh added a comment -

          Attaching the output of the test case and a tar of the syslog files generated by the containers.

          Screen dump:

          -------------------------------------------------------
           T E S T S
          -------------------------------------------------------
          Running org.apache.hadoop.mapred.TestMRIntermediateDataEncryption
          Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 79.342 sec - in org.apache.hadoop.mapred.TestMRIntermediateDataEncryption
          
          Results :
          
          Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
          
          [INFO] ------------------------------------------------------------------------
          [INFO] BUILD SUCCESS
          [INFO] ------------------------------------------------------------------------
          [INFO] Total time: 01:28 min
          [INFO] Finished at: 2014-06-20T20:49:58-08:00
          [INFO] Final Memory: 39M/710M
          [INFO] ------------------------------------------------------------------------
          
          Chris Douglas added a comment -

          Given that the current abstraction does not provide a clean cut to hide this within the IFile without a significant refactoring throughout the code, I think it is the least evil.

          It's expedient, but this code is already difficult to follow. Arun, would you mind making an attempt at refactoring? The current code doesn't have an existing abstraction for this, but writing a separate file for every spill just to store a few bytes of IV doesn't seem like a reasonable tradeoff in either performance or complexity. Adding a metadata block to the IFile segment or adding the IV to the spill index (to be added in the header, as in the current patch) would both work.

          A couple nits:

          • In OnDiskMapOutput, the disk field can stay final, since the only assignment is in the constructor
          • Minor indentation/braces issue in MapTask:
            + if (CryptoUtils.isShuffleEncrypted(job))
            +     CryptoUtils.deleteIVFile(rfs, filename[i]);

          Minor nit: please leave old patches attached to avoid orphaning the discussion around them.

          Arun Suresh added a comment -

          Chris Douglas,
          Updating patch..
          As per your suggestion, I've refactored out the need for another file to store the IV.

          But I still could not find a consistent way of handling the IV seamlessly:

          • The IV for the spill file created at the end of the Map phase is now added to the spill IndexRecord and is transmitted to the Fetcher via the ShuffleHeader in the ShuffleHandler.
          • Unfortunately, the intermediate files created by the on-disk mergers do not have an index file associated with them. I was thus forced to write the IV as a prefix into the stream; this happens when I "wrap" the outputStream before passing it to the IFile writer.
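
          For illustration, a sketch of that prefix approach, assuming the raw stream is wrapped with the HADOOP-10603 crypto streams (the class and method names are hypothetical; the actual patch routes this through CryptoUtils):

            import java.io.DataInputStream;
            import java.io.DataOutputStream;
            import java.io.IOException;
            import java.io.InputStream;
            import java.io.OutputStream;

            import org.apache.hadoop.conf.Configuration;
            import org.apache.hadoop.crypto.CryptoCodec;
            import org.apache.hadoop.crypto.CryptoInputStream;
            import org.apache.hadoop.crypto.CryptoOutputStream;

            // Sketch: for merger outputs with no index file, write the IV as a
            // plain (unencrypted) prefix of the stream, then encrypt the rest.
            final class IvPrefixSketch {
              static OutputStream wrapOut(Configuration conf, OutputStream raw,
                  byte[] key) throws IOException {
                CryptoCodec codec = CryptoCodec.getInstance(conf);
                byte[] iv = new byte[codec.getCipherSuite().getAlgorithmBlockSize()];
                codec.generateSecureRandom(iv);
                new DataOutputStream(raw).write(iv);     // unencrypted IV prefix
                return new CryptoOutputStream(raw, codec, key, iv);
              }

              static InputStream wrapIn(Configuration conf, InputStream raw,
                  byte[] key) throws IOException {
                CryptoCodec codec = CryptoCodec.getInstance(conf);
                byte[] iv = new byte[codec.getCipherSuite().getAlgorithmBlockSize()];
                new DataInputStream(raw).readFully(iv);  // read the IV prefix back
                return new CryptoInputStream(raw, codec, key, iv);
              }
            }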
          Chris Douglas added a comment -

          Thanks for updating the patch, Arun. Adding seeks for serving map output would be regrettable.

          Few nits:

          • unused, private static field counter added to Fetcher
          • unit test should use JUnit4 annotations rather than extending TestCase
          •     InputStream is = input;
                is = CryptoUtils.wrap(jobConf, iv, is, offset, compressedLength);

            is equivalent to InputStream is = CryptoUtils.wrap(jobConf, iv, input, offset, compressedLength);

          • While not terribly expensive, there are a lot of redundant lookups for the encrypted shuffle config parameter.
          • There are many counterexamples, but running a MR job is a heavy way to test this.
          • To be sure I understand the IV logic, it's injected in the stream as a prefix to the segment during a merge, but is part of the index record during a spill. Is that accurate? Adding a few comments calling this out would be appreciated, particularly since it's hard to spot in the merge.
          • Has this been tested on spills with intermediate merges? With more than a single reduce? Looking at the patch, it looks like it creates the stream with the IV but doesn't reset the IV for each segment (apologies, I haven't tried applying it, so I might just be misreading the context).
          • Since the IV size is hard-coded in CryptoUtils to 16 bytes (and part of the IndexRecord format), it should probably fail if the CryptoCodec::getAlgorithmBlockSize returns anything else.

          Much of the logic in here is internal to MapReduce, so it would be unfair to ask that this create better abstractions than what exists, but the IV handling is pretty ad hoc. Other improvements under consideration (particularly native implementations and other frameworks building on the ShuffleHandler) may rely on this code, as well as older versions of MapReduce that will fail unless two versions of the ShuffleHandler are deployed.

          To make it backwards compatible, the IV can be part of each IFile segment (requiring no changes to the ShuffleHandler or the SpillRecord/IndexRecord format), or the IVs can be added to the end of the SpillRecord. In the latter case, the Fetcher will need to request the alternate interpretation by including a header; old versions will get the existing interpretation of the SpillRecord.
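
          The block-size check suggested in the list above could be a simple guard along these lines (a sketch; the class and constant names are hypothetical):

            import java.io.IOException;

            import org.apache.hadoop.crypto.CryptoCodec;

            // Sketch: refuse to proceed if the negotiated cipher's block size
            // does not match the 16 bytes hard-coded into the IV handling.
            final class IvSizeGuard {
              static final int EXPECTED_IV_LENGTH = 16;

              static void checkBlockSize(CryptoCodec codec) throws IOException {
                int blockSize = codec.getCipherSuite().getAlgorithmBlockSize();
                if (blockSize != EXPECTED_IV_LENGTH) {
                  throw new IOException("Unsupported cipher block size: " + blockSize
                      + " (expected " + EXPECTED_IV_LENGTH + ")");
                }
              }
            }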

          Alejandro Abdelnur added a comment -

          Hi Chris Douglas,

          I would prefer to keep the current MR job test because it tests spills/merges on both sides of the MR job, making sure no edge cases are left uncovered.

          The ShuffleHandler is a private class of MapReduce; if other frameworks use it, they do so at their own risk.

          Regarding adding new abstractions, I’m OK if they are small and non-intrusive. I just don’t want to send Arun chasing a wild goose, only to backtrack when he finally catches it because the changes are too pervasive in the core of MapReduce (this happened in MAPREDUCE-2454).

          Chris Douglas added a comment -

          The repeated config lookup and unit test are not blockers, but they're places where the patch could be improved.

          The ShuffleHandler is a private class of MapReduce; if other frameworks use it, they do so at their own risk.

          Every version of the patch has broken compatibility with existing versions of MapReduce. Other frameworks may rely on functionality we don't guarantee, but breaking them is avoidable.

          Regarding adding new abstractions, I’m OK if they are small and non-intrusive. I just don’t want to send Arun chasing a wild goose, only to backtrack when he finally catches it because the changes are too pervasive in the core of MapReduce

          Adding a new file just to pass 16 bytes to the ShuffleHandler will harm performance; breaking backwards compatibility is not OK, and not necessary for this feature. Aside from those, I've asked for some formatting fixes and that the code not return an IV that doesn't match the hard-coded 16-byte size. These are reasonable, limited requests and bug fixes, and I've suggested two possible implementations that would address them. These would be blockers during the merge, too.

          Alejandro Abdelnur added a comment -

          Chris Douglas, on the last section of the previous comment: I didn't mean to say your refactoring asks are a wild goose chase, I just wanted to say I don't want to end up in that situation. My apologies if I've given the wrong impression with my comment. I've talked with Arun and he is already exploring along the lines of your suggestions to see their feasibility.

          Arun Suresh added a comment -

          Hi Chris Douglas,
          Thank you for the feedback. Updating the patch to address most of your nits:

          There are many counterexamples, but running a MR job is a heavy way to test this

          Agreed. But an MR job will ensure all code paths are exercised. I will be adding test cases to existing classes (e.g. TestMerger) to validate that the merging works fine with shuffle encryption turned on. But I don't see many existing test cases that validate that map output spill files are correctly partitioned and sent to the correct reducers.

          Has this been tested on spills with intermediate merges? With more than a single reduce? Looking at the patch, it looks like it creates the stream with the IV but doesn't reset the IV for each segment (apologies, I haven't tried applying it, so I might just be misreading the context).

          Modifying the TestMerger class to use the crypto shuffle will ensure the former. The current test case included with the patch tests with multiple reducers; I will refactor it a bit to explicitly test these scenarios.

          To make it backwards compatible, the IV can be part of each IFile segment (requiring no changes to the ShuffleHandler or the SpillRecord/IndexRecord format), or the IVs can be added to the end of the SpillRecord. In the latter case, the Fetcher will need to request the alternate interpretation by including a header; old versions will get the existing interpretation of the SpillRecord.

          As per your suggestion, I was actually able to get the end-to-end flow working without having to touch ShuffleHandler, ShuffleHeader or IndexRecord. What I did was add the IV as a prefix of an IFile before it is written, and read it back during Segment::init() when it is read from disk. The only nit is that I have to do some amount of bookkeeping in the MapTask and Fetcher to add/remove the 16 bytes.

          Since the IV size is hard-coded in CryptoUtils to 16 bytes (and part of the IndexRecord format), it should probably fail if the CryptoCodec::getAlgorithmBlockSize returns anything else.

          Yup.. this would have been an issue had I needed to modify the IndexRecord/ShuffleHeader. But now we don't, so this is not an issue anymore.
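
          As a tiny illustration of that bookkeeping, adjusting a segment's on-disk length for the IV prefix might look like this (class, method and constant are hypothetical):

            // Sketch: when an IFile segment is prefixed with a 16-byte IV, the raw
            // on-disk length no longer equals the IFile data length; callers that
            // size buffers or validate lengths must subtract the prefix.
            final class SegmentLengthSketch {
              static final int IV_PREFIX_LENGTH = 16;

              static long dataLength(long onDiskLength, boolean encrypted) {
                return encrypted ? onDiskLength - IV_PREFIX_LENGTH : onDiskLength;
              }
            }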

          Arun Suresh added a comment -

          Updating Patch..
          Added an additional testcase TestMRIntermediateDataEncryption::testMapSideMerge

          Arun Suresh added a comment -

          Looks like the latest patch was not applying... updating with new patch..

          Vinod Kumar Vavilapalli added a comment -

          Sorry for jumping in late on this one.

          For some sensitive data, encryption while in flight (over the network) is not sufficient; it is required that the data also be encrypted while at rest.

          Not sure why this is a requirement. I can understand the requirement for encrypting it on the wire, but on disk the intermediate files are already secure, readable only by users who need to read them, and more importantly all this intermediate data is very transitory.

          .. writing a separate file for every spill just to store a few bytes of IV doesn't seem like a reasonable tradeoff in either performance or complexity.

          I too echo the same sentiment.

          If this really is a requirement, aren't we better off asking cluster admins to either install disks with local file-systems that support encryption specifically for intermediate data, or just create some partitions that support encryption? That seems like the right layer to handle something like this, instead of adding a whole lot of complexity into the software, whose only effect beyond that is a performance penalty.

          Wearing my YARN hat, it is not enough to do this just for MapReduce. Every other framework running on YARN will need to add this complexity - this is asking for too much complexity. We are better off handling it at the file-system/partition/disk level.

          Alejandro Abdelnur added a comment -

          Fetcher.java
          MapTask.java
          MergerManagerImpl.java
          Merger.java
          ShuffleHandler.java
          ShuffleHeader.java

          • several space changes (configure your editor not to trim unmodified lines)

          CryptoUtils.java

          • createIV(): javadocs have invalid params
          • wrap() OUT/IN methods: any chance to consolidate all/most signatures to delegate to a single one doing the repetitive logic?
          • a couple of wrap() methods have a funny LOG message ####
          • wrap() OUT methods use cc.getAlgorithmBlockSize(), but wrap() IN methods use 16; for the IN methods you can use the cc already available in the method.
          • the wrap() methods wrap only if necessary (the IF ENCRYPTED check has been moved inside), so the name should reflect that, maybe something like 'wrapIfNecessary()'

          Fetcher.java

          • copyMapOutput() unconditionally corrects the offset; this seems wrong.
          • No need to define out2, just reuse out
          Arun Suresh added a comment -

          Rebasing due to API changes caused by HADOOP-10713

          Alejandro Abdelnur added a comment -

          If this really is a requirement, aren't we better off asking cluster admins to either install disks with local file-systems that support encryption specifically for intermediate data, or just create some partitions that support encryption? That seems like the right layer to handle something like this, instead of adding a whole lot of complexity into the software, whose only effect beyond that is a performance penalty.

          Asking to install additional software to encrypt the local FS means installing kernel modules.

          Also, this would mean that ALL MR jobs are going to pay the penalty of encrypting intermediate data. That is not reasonable.

          I don't agree on the statement that this is "adding a lot of complexity", it is simply wrapping the streams where necessary.

          Wearing my YARN hat, it is not enough to do this just for MapReduce. Every other framework running on YARN will need to add this complexity - this is asking for too much complexity. We are better off handling it at the file-system/partition/disk level.

          This patch is not touching anything in YARN, only MapReduce, and only private/evolving classes of it.

          Vinod Kumar Vavilapalli added a comment -

          Asking to install additional soft to encrypt local FS means installing Kernel modules.

          Agreed, I wasn't trying to make it look like a simple step.

          Also, this would mean that ALL MR jobs are going to pay the penalty of encrypting intermediate data. That is not reasonable.

          That's a fair argument, but see below..

          I don't agree on the statement that this is "adding a lot of complexity", it is simply wrapping the streams where necessary.

          This patch is not touching anything in Yarn, but in MapReduce, private/evolving classes of it.

          I didn't mean that we are touching YARN here. I meant that we will have to keep implementing this for every framework that runs on YARN - MR, YARN, Storm, Spark, HBase and the list goes on and on. I am trying to trade off that complexity in software against an admin prerequisite to install one or a few disks/partitions that selected users can choose to use via their job-configuration.

          Arun Suresh added a comment -

          Alejandro Abdelnur
          Attaching a 'scrub' patch (removing the new-line issues) for review..
          Will be updating with a final patch once I address all your feedback comments.

          Chris Douglas added a comment -

          I am trying to trade off that complexity in software against an admin prerequisite to install one or a few disks/partitions that selected users can choose to use via their job-configuration.

          This would work also, but (Alejandro/Arun, correct me if this is mistaken) encrypted intermediate data is probably motivated by compliance regimes that require it. An audit would need to verify that every job used the encrypted local dirs, that those mounts were configured to encrypt when the job ran, etc. One would also need to do capacity planning for encrypted vs unencrypted space across nodes, possibly even federating jobs. It's workable, but kind of ad hoc. In contrast, verifying that the MR job set this switch is straightforward and has no ops overhead. I have no idea whether it's common to combine these workloads, but this would make it easier.

          It's not so inconsistent to add this to MapReduce... frameworks are currently responsible for intra-application security, particularly RPC. If there's a general mechanism then this should use it. If that layer were developed, we'd want MapReduce to use it instead of its own, custom encryption. Today, the alternative is to develop that general-purpose layer.

          To reduce the overhead, this could use the plugin mechanism in MAPREDUCE-2454 because this no longer requires any changes to the ShuffleHandler or index formats. I haven't looked at the latest patch, but if the IFile format omits the 16 byte IV for each spill, then the only overhead it's adding is for the checks in the config (most of which can be pulled into the buffer init and cached).

          Has this been tested in a cluster? Would the perf hit be simple to measure?

          Arun Suresh added a comment -

          Updating patch to address all the feedback. Thanks !!

          Alejandro Abdelnur,

          copyMapOutput() unconditionally corrects the offset; this seems wrong

          W.r.t. the Fetcher::copyMapOutput() unconditionally setting the offset: it shouldn't be a problem now, since in the latest patch there is no offset sent anymore.

          No need to define out2, just reuse out

          We still do require out2 (I have since renamed the variable), since I still need a reference to the original 'out'. I can then wrap the original 'out' for each partition of the SpillFile (and thereby prefix the IV and offset at the beginning of EACH partition of the spill file). This ensures that I won't have to send either the stream offset or the IV via the ShuffleHandler/ShuffleHeader.

          Chris Douglas
          To address your concerns about backward compatibility: since we don't touch the ShuffleHandler in the latest patch (in my previous patch I still had the offset sent in the shuffle header; that's been moved out now), it should be fine.

          but if the IFile format omits the 16 byte IV for each spill, then the only overhead it's adding is for the checks in the config (most of which can be pulled into the buffer init and cached)

          Just to clarify, we are now sending 24 bytes (16 for the IV and 8 for the long offset).
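
          A sketch of the 24-byte prefix just described (the helper name and field order are illustrative assumptions; the actual layout is defined by the patch):

            import java.io.DataOutputStream;
            import java.io.IOException;
            import java.io.OutputStream;

            // Sketch: each encrypted partition carries an 8-byte stream offset
            // and a 16-byte IV, giving the 24 bytes of per-partition overhead.
            // The order shown here is illustrative.
            final class PartitionPrefixSketch {
              static void writePrefix(OutputStream raw, long offset, byte[] iv)
                  throws IOException {
                DataOutputStream dos = new DataOutputStream(raw);
                dos.writeLong(offset);  // 8 bytes: partition offset in the stream
                dos.write(iv);          // 16 bytes: IV used to encrypt what follows
                dos.flush();
              }
            }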

          Arun Suresh added a comment -

          Updating patch..
          Made some formatting / variable name changes.

          Performed some backward-compatibility testing:
          Tested on a cluster with half the nodes running the patch and half without. Set the encryption flag to OFF and ran some jobs that utilized the whole cluster. Ensured that there weren't any failures.

          Chris Douglas,

          Has this been tested in a cluster? Would the perf hit be simple to measure?

          So given that the extra bytes (IV and offset) are sent only if encryption is turned ON, I ran some basic terasort tests and could not find any perceptible difference in performance. But there are various variables that can be tuned during testing; for example, I can play around with mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent to vary the number of spills/on-disk merges. So, to answer your question, I don't think it is easy to measure the performance hit.
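
          For context, a sketch of a job configuration exercising those knobs; mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent are standard MapReduce settings, while the encryption flag name here is an assumption:

            import org.apache.hadoop.conf.Configuration;

            // Sketch: drive more spills and on-disk merges while intermediate-data
            // encryption is on, to stress the encrypted code paths.
            final class PerfTestConfSketch {
              static Configuration build() {
                Configuration conf = new Configuration();
                // Hypothetical flag name; the actual property is defined by the patch.
                conf.setBoolean("mapreduce.job.encrypted-intermediate-data", true);
                conf.setInt("mapreduce.task.io.sort.mb", 10);            // small sort buffer
                conf.setFloat("mapreduce.map.sort.spill.percent", 0.5f); // spill early
                return conf;
              }
            }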

          Alejandro Abdelnur added a comment -

          On the performance hit: if encryption is OFF, I would say it is nil (the only extra thing being done is resolving a boolean config to check whether encryption is ON or OFF). If encryption is ON, you are hitting the encryption/decryption overhead. Doing preliminary encryption benchmarks with the crypto streams using Diceros (CryptoCodec->JCE->JNI->OpenSSL), I've got >1000MB/sec both on encrypt and decrypt on my laptop. Once we have HADOOP-10693 and this JIRA, we will be able to do some end-to-end benchmarks.

          Arun Suresh added a comment -

          Ran test-patch locally.
          Results:

          -1 overall.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 5 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 javadoc. The javadoc tool appears to have generated 6 warning messages.
          See /artifact/trunk/patchprocess/diffJavadocWarnings.txt for details.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings.

          -1 release audit. The applied patch generated 3 release audit warnings.

          Arun Suresh added a comment -

          Updating patch to fix javadoc warnings..

          Arun Suresh added a comment -

          Re-ran test-patch locally.
          Results:

          -1 overall.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 5 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings.

          -1 release audit. The applied patch generated 1 release audit warnings.

          Arun Suresh added a comment -

          The -1 on the release audit is expected and is caused by the CHANGES-fs-encryption.txt file having no license header. This file is not related to this patch and should go away once merged.

          Output of patchReleaseAuditProblems.txt:

           !????? /home/asuresh/hadoop-common/hadoop-hdfs-project/hadoop-hdfs/CHANGES-fs-encryption.txt
          Lines that start with ????? in the release audit report indicate files that do not have an Apache license header.
          
          Alejandro Abdelnur added a comment (edited) -

          LGTM.

          One minor nit (I can take care of it when committing): in the JobSubmitter.java#copyAndConfigureFiles() javadoc, line 295 has an unneeded change.

          Chris Douglas, I believe all your suggestions/concerns have been addressed. Do you want to do a new pass on the patch?

          I'll wait a few days to commit.

          Chris Douglas added a comment -

          The current patch still injects the IV and length into the stream, then fixes up the offsets. If the IV were part of the IFile format, then this would not be necessary. If this format were ever changed, then someone would need to go back and fix all this arithmetic or take its framing as a requirement for any intermediate data format.

          Am I missing why it's easier to wrap/unwrap streams?

          Arun Suresh added a comment (edited) -

          Chris Douglas,
          I had initially tried to directly modify the IFile format to handle the IV. The reason I felt this would not be such a clean solution is:

          • The IFile currently does not have a notion of an explicit header/metadata.
          • While it is possible to use the IFile.Writer constructor to write the IV (and thus make it transparent to the rest of the code base), the reading code path is not so straightforward. There are two classes that extend IFile.Reader (InMemoryReader and RawKVIteratorReader). The InMemoryReader totally ignores the inputStream that is initialized in the base class constructor, and there are places in the codebase where the input stream is initialized not in the Reader but in the Segment::init() method (which in my opinion makes the IFile abstraction a bit leaky, since the underlying stream should be handled in its entirety in the IFile Writer/Reader; the Segment class, which is part of the Merger framework, should avoid dealing with the internals of the IFile).
          • Also, I was not able to do away with a lot of if-then checks in the Shuffle phase (another instance of the leaky abstraction mentioned in the previous point); the implementations of the MapOutput::shuffle method create IFileInputStreams directly without an associated IFile.Reader.
          Chris Douglas added a comment -

          OK... untangling the abstractions can be deferred. The current patch spreads the feature across the code in a way that's not ideal to maintain, but it addresses all the functional feedback by moving the IV inline.

          Thanks Arun Suresh for all the iterations on this.

          Alejandro Abdelnur added a comment -

          Chris Douglas, thanks for the detailed feedback/review iterations on this. Does this means you are OK with committing the current patch?

          Chris Douglas added a comment -

          Yes, I'm OK with the current patch. This approach won't scale to another feature, but it can be preserved in a refactoring.

          My only remaining ask (fine to add during commit) is that CryptoUtils be annotated with @Private and @Unstable, so it's clearly marked as an implementation detail. If it could be package-private that would be even better, though I haven't checked to see if there's anything else in the o.a.h.mapreduce.task.crypto package.

          Arun Suresh added a comment -

          Uploaded updated patch. Thanks Chris Douglas for all the feedback!!
          I've marked the class @Private and @Unstable, but can't make the class itself package-private since it exposes public static methods used in a number of places.

          Chris Douglas added a comment -

          Sorry, I meant that if o.a.h.mapreduce.task.crypto only has CryptoUtils in it, then maybe the new package isn't necessary.

          Arun Suresh added a comment -

          Chris Douglas,
          So, if our objective is to keep CryptoUtils contained, and since it is currently used only by the MapReduce framework, I'd prefer it remains where it is: in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/o/a/h/mapreduce/task/crypto/.
          Otherwise, the only other place it seems to fit is ./hadoop-common-project/hadoop-common/src/main/java/o/a/h/crypto/, in which case it will look like a more generic utility and we will invite more people to use it before it becomes stable.

          I'd rather move it over once it stabilizes.

          Chris Douglas added a comment -

          I was thinking o.a.h.mapred, with other internal classes.

          Arun Suresh added a comment -

          Ok.. Uploaded patch with CryptoUtils moved. I still can't make it package-private though, since some code that uses it is in o.a.h.mapreduce and some in o.a.h.mapred.

          Arun Suresh added a comment -

          Updated with testcase fix

          Alejandro Abdelnur added a comment -

          I've just committed this JIRA to fs-encryption branch.

          Chris Douglas, thanks for all the review cycles you spent on this.

          Arun Suresh, thanks for persevering until done, nice job.

          Chris Douglas added a comment -

          Yes; thanks Arun Suresh for your patience in seeing this through.

          Vinod Kumar Vavilapalli added a comment -

          Closing old tickets that are already part of a release.


  People

  • Assignee:
    Arun Suresh
  • Reporter:
    Alejandro Abdelnur