ORC / ORC-14

Add column level encryption to ORC files

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      It would be useful to support column level encryption in ORC files. Since each column and its associated index is stored separately, encrypting a column separately isn't difficult. In terms of key distribution, it would make sense to use an external server like the one in HADOOP-9331.
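To illustrate the idea of encrypting each column's bytes independently, here is a minimal sketch using only the JDK's `javax.crypto` classes. The class `ColumnCrypto` and its method names are hypothetical and are not part of ORC or of HADOOP-9331. The sketch assumes the per-column key has already been fetched from some key server, and it uses AES in CTR mode, which gives confidentiality only (no integrity check).

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper (not an ORC API): encrypts one column's bytes
// on their own, so other columns stay readable without this key.
final class ColumnCrypto {
    private static final String TRANSFORM = "AES/CTR/NoPadding";
    private static final int IV_LEN = 16; // AES block size in bytes
    private static final SecureRandom RANDOM = new SecureRandom();

    private ColumnCrypto() {}

    // Returns a fresh random IV followed by the ciphertext.
    static byte[] encryptColumn(byte[] columnKey, byte[] columnBytes) {
        try {
            byte[] iv = new byte[IV_LEN];
            RANDOM.nextBytes(iv);
            Cipher cipher = Cipher.getInstance(TRANSFORM);
            cipher.init(Cipher.ENCRYPT_MODE,
                    new SecretKeySpec(columnKey, "AES"),
                    new IvParameterSpec(iv));
            byte[] body = cipher.doFinal(columnBytes);
            byte[] out = new byte[IV_LEN + body.length];
            System.arraycopy(iv, 0, out, 0, IV_LEN);
            System.arraycopy(body, 0, out, IV_LEN, body.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("column encryption failed", e);
        }
    }

    // Reverses encryptColumn: reads the IV prefix, then decrypts the rest.
    static byte[] decryptColumn(byte[] columnKey, byte[] stored) {
        try {
            byte[] iv = Arrays.copyOfRange(stored, 0, IV_LEN);
            Cipher cipher = Cipher.getInstance(TRANSFORM);
            cipher.init(Cipher.DECRYPT_MODE,
                    new SecretKeySpec(columnKey, "AES"),
                    new IvParameterSpec(iv));
            return cipher.doFinal(stored, IV_LEN, stored.length - IV_LEN);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("column decryption failed", e);
        }
    }
}
```

A writer would call `encryptColumn` once per protected column with that column's key, and a reader holding the key would call `decryptColumn` on the stored bytes. How keys and IVs would actually be recorded in the ORC file format is left open here.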

          Activity

          supun Supun Kamburugamuva added a comment -

          I'm a computer science graduate student at Indiana University, Bloomington, and my research is in distributed computing. I'm also a committer on a few Apache projects. I'm new to Hadoop and Hive, and I would like to learn and contribute to these projects. It would be great if you could let me know which areas I should look at to get started.

          Regards,
          Supun Kamburugamuva

          owen.omalley Owen O'Malley added a comment -

          Supun,
          I've tagged this for Google Summer of Code. Take a look at:
          http://www.google-melange.com/gsoc/homepage/google/gsoc2013

          apurtell Andrew Purtell added a comment -

          So do you envision this as using the facilities provided by HADOOP-9331?

          owen.omalley Owen O'Malley added a comment -

          Andrew,
          Yes, if the code is available and provides the right API.

          apurtell Andrew Purtell added a comment -

          "Yes if the code is available and provides the right API."

          Owen O'Malley, HADOOP-9331 proposes and provides an API, and HADOOP-9332 provides a codec implementing support for AES with optional hardware acceleration. This seems like an ideal use case for both. Should you have any proposed improvements to the API, please don’t hesitate to raise them on HADOOP-9331, where they will be promptly addressed. Likewise, please don’t hesitate to raise any improvements to the AES codec on HADOOP-9332.

          lmccay Larry McCay added a comment -

          I am in the process of reworking the patch for HADOOP-9534 (Credential Management Framework) in order to support accessing keying material for this issue. Current thinking is that CMF can abstract the source of keys and be leveraged across a number of different crypto and password protection use cases in the Hadoop ecosystem. This is why it is being done in Hadoop rather than Hive. We will also want to align its use with HADOOP-9331, since 9331 will be leveraged here as well as for the cryptoFS, etc.

          I will provide a description of the DDL/metastore and column store changes needed to support column level encryption once I have it written up.
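As an illustration of what "abstracting the source of keys" could look like, here is a minimal sketch. The `KeySource` interface and the `InMemoryKeySource` class are hypothetical names invented for this example. They are not the HADOOP-9534 API, whose actual shape is not described in this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical abstraction (not the HADOOP-9534 API): callers ask for
// a key by alias and do not care where the key material is stored.
interface KeySource {
    byte[] getKey(String alias);
}

// Assumed toy backend that keeps keys in memory. A real backend might
// talk to a keystore file or a remote key server instead.
final class InMemoryKeySource implements KeySource {
    private final Map<String, byte[]> keys = new HashMap<>();

    void putKey(String alias, byte[] key) {
        keys.put(alias, key.clone()); // defensive copy on the way in
    }

    @Override
    public byte[] getKey(String alias) {
        byte[] key = keys.get(alias);
        if (key == null) {
            throw new IllegalArgumentException("No key for alias: " + alias);
        }
        return key.clone(); // defensive copy on the way out
    }
}
```

With an interface like this, code that encrypts a column would depend only on `KeySource`, so the backing store could be swapped without touching the file format code.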

          lmccay Larry McCay added a comment -

          CMF will be used to access keying material for column level encryption.

          owen.omalley Owen O'Malley added a comment -

          I've started working on this. I'll post a patch this week.


            People

            • Assignee:
              owen.omalley Owen O'Malley
              Reporter:
              owen.omalley Owen O'Malley
            • Votes:
              0
              Watchers:
              18
