Pig: PIG-987

[zebra] Zebra Column Group Access Control

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.6.0
    • Fix Version/s: 0.6.0
    • Component/s: None
    • Labels:
      None

      Description

      Access control: when processes try to read from column groups, Zebra should distinguish allowed from disallowed user/application accesses. Security is ultimately enforced by the HDFS permissions of the stored data.

      Expected behavior when column group permissions are set:

      When a user selects only columns that they do not have permission to access, Zebra should return an error with the message "Error #: Permission denied for accessing column <column name or names>".

      Access control applies to an entire column group, so all columns in a column group have the same permissions.
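The expected behavior above can be sketched in Java. This is a minimal illustration under stated assumptions, not Zebra's implementation: the class `ColumnAccessCheck`, its column-to-group map, and the set of readable groups are hypothetical, and the `#` error-number placeholder from the description is kept as-is. Because permissions belong to the whole column group, each selected column is first resolved to its group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch (not Zebra's code): decides read access per column group
// and rejects a projection whose selected columns are all unreadable.
class ColumnAccessCheck {

    /** Raised when every selected column lives in an unreadable column group. */
    static class PermissionDeniedException extends RuntimeException {
        PermissionDeniedException(String message) {
            super(message);
        }
    }

    private final Map<String, String> columnToGroup; // column name -> column group name
    private final Set<String> readableGroups;         // column groups the caller may read

    ColumnAccessCheck(Map<String, String> columnToGroup, Set<String> readableGroups) {
        this.columnToGroup = columnToGroup;
        this.readableGroups = readableGroups;
    }

    /**
     * Returns the selected columns the caller may read, in selection order.
     * Throws PermissionDeniedException if none of them is readable. The issue
     * does not specify mixed selections; this sketch returns the readable subset.
     */
    List<String> check(List<String> selected) {
        List<String> readable = new ArrayList<>();
        List<String> denied = new ArrayList<>();
        for (String column : selected) {
            String group = columnToGroup.get(column);
            if (group == null) {
                throw new IllegalArgumentException("Unknown column: " + column);
            }
            // Permissions belong to the whole column group, not to a single column.
            if (readableGroups.contains(group)) {
                readable.add(column);
            } else {
                denied.add(column);
            }
        }
        if (readable.isEmpty() && !denied.isEmpty()) {
            throw new PermissionDeniedException(
                "Error #: Permission denied for accessing column " + String.join(", ", denied));
        }
        return readable;
    }

    public static void main(String[] args) {
        ColumnAccessCheck check = new ColumnAccessCheck(
            Map.of("c1", "cg1", "c2", "cg1", "c3", "cg2"), Set.of("cg2"));
        System.out.println(check.check(List.of("c1", "c3"))); // prints [c3]
        try {
            check.check(List.of("c1", "c2"));
        } catch (PermissionDeniedException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In the design described here, the real enforcement still comes from HDFS permissions on the column group's files; a check like this would only turn that failure into the friendlier message.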

      1. ColumnGroupSecurity.patch
        104 kB
        Yan Zhou
      2. ColumnGroupSecurity.patch
        104 kB
        Yan Zhou
      3. ColumnGroupSecurity.patch
        104 kB
        Yan Zhou
      4. TEST-org.apache.hadoop.zebra.io.TestCheckin.txt
        30 kB
        Raghu Angadi
      5. TEST-org.apache.hadoop.zebra.mapred.TestCheckin.txt
        2 kB
        Raghu Angadi
      6. tmp-987-plus-991.patch
        114 kB
        Raghu Angadi

          Activity
          Yan Zhou added a comment -

          A29_ColumnGroupSecurity.patch is the patch file name.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12421038/A29_ColumnGroupSecurity.patch
          against trunk revision 820394.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 38 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          -1 release audit. The applied patch generated 288 release audit warnings (more than the trunk's current 281 warnings).

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/54/testReport/
          Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/54/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/54/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/54/console

          This message is automatically generated.

          Yan Zhou added a comment -

          During STORE, the storage hint is enhanced to take a new "secure by" section, e.g.,

          [c1,c2] secure by group:secure perm:640

          meaning the column group of columns "c1" and "c2" will belong to group "secure" with the octal file permission 0640, which in turn means read+write for the user, read for the group, and none for others.

          After the Zebra table is created, all files and directories inside the secured column group will have the same permission and group membership within the table.

          If a column group is not secured, the default behavior is determined by the default Hadoop Map/Reduce permission and group membership applied to new files and directories.
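A minimal, regex-based Java sketch of how one "secure by" clause breaks down into its columns, group, and octal permission, and why `perm:640` means rw-r-----. The class name `SecureByHint` and the regex are assumptions for illustration only; Zebra's own storage-hint parsing is not shown in this thread and may accept a different grammar.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, regex-based sketch of one "secure by" clause such as
// "[c1,c2] secure by group:secure perm:640". Illustrative only; this is not
// Zebra's storage-hint parser.
class SecureByHint {
    private static final Pattern CLAUSE = Pattern.compile(
        "\\[([^\\]]*)\\]\\s+secure\\s+by\\s+group:(\\S+)\\s+perm:([0-7]{3})");

    final List<String> columns; // columns of the column group
    final String group;         // group that should own the column group's files
    final int perm;             // permission bits, parsed as octal

    private SecureByHint(List<String> columns, String group, int perm) {
        this.columns = columns;
        this.group = group;
        this.perm = perm;
    }

    static SecureByHint parse(String clause) {
        Matcher m = CLAUSE.matcher(clause.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a secure-by clause: " + clause);
        }
        List<String> cols = Arrays.asList(m.group(1).trim().split("\\s*,\\s*"));
        // "640" is octal: 6*64 + 4*8 + 0 = 416 = 0640.
        return new SecureByHint(cols, m.group(2), Integer.parseInt(m.group(3), 8));
    }

    /** Renders permission bits like `ls -l`, e.g. 0640 -> "rw-r-----". */
    static String toRwx(int perm) {
        String flags = "rwx";
        StringBuilder sb = new StringBuilder();
        for (int shift = 6; shift >= 0; shift -= 3) { // user, group, others
            int bits = (perm >> shift) & 7;
            for (int i = 0; i < 3; i++) {
                sb.append((bits & (4 >> i)) != 0 ? flags.charAt(i) : '-');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        SecureByHint h = parse("[c1,c2] secure by group:secure perm:640");
        System.out.println(h.columns + " " + h.group + " " + toRwx(h.perm)); // [c1, c2] secure rw-r-----
    }
}
```

Each octal digit maps to one rwx triplet (user, group, others), which is why 6 gives read+write, 4 gives read, and 0 gives nothing.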

          Yan Zhou added a comment -

          The extra release audit warnings (288 vs. 281 on trunk) were generated on 7 modified Java files produced by the JavaCC code generator. They should be ignored.

          Gaurav Jain added a comment -

          Patch Reviewed

          +1

          Raghu Angadi added a comment -

          I tried to commit this patch. 'ant test' reports that all the tests fail, whereas only one or two tests fail without the patch.

          Does Hudson actually run the Zebra tests?

          Yan Zhou added a comment -

          I checked the Hudson test results, and they do not seem to include Zebra.

          But I ran "ant test" in the contrib/zebra directory and the tests passed. What errors did you get? I suspect an environment issue on your end.

          Raghu Angadi added a comment -

          I am attaching mapred.TestCheckin.txt, which passes without the patch.

          btw, not all tests pass even without the patch. What environment is required? I did a fresh checkout and ran 'ant test'.

          I guess the test failures on trunk are related to lzo. But I didn't expect more failures with the patch.

          It looks like PIG-991 removes the lzo dependency. I will try with that patch included.

          Chao Wang added a comment -

          I ran into the same issue as well.

          I did a fresh checkout from the Apache trunk and ran "ant test"; 14 test cases failed.

          They are actually caused by an incompatible exception type between Pig and Zebra. It seems Pig has already moved on with the change (IOException changed to IndexOutOfBoundsException), but Zebra is a bit behind on this.

          Raghu Angadi added a comment -

          Not sure if this is related to PIG. When I applied PIG-991 over this, the tests passed (except the ones that fail on trunk).

          Yan Zhou added a comment -

          It's because this patch exposes the environment problem of using lzo as compression, which PIG-991 eventually fixes.

          Can you commit PIG-991's patch along with this one? What are the failures on trunk? What are the error messages?

          Raghu Angadi added a comment -

          Even with PIG-991 included, I am seeing lzo related failures. Could you run tests on a clean checkout? If you didn't see the errors before then you probably have lzo set up in your environment, which is not a requirement.

          Yan Zhou added a comment -

          This patch has additional test scripts that do not use the non-default lzo compression. It should be followed by the patch in PIG-991 for all Zebra-related tests to pass.

          Yan Zhou added a comment -

          I have attached a new patch that removes the use of lzo in 6 test scripts. Accordingly, the patches of 2 "downstream" Jiras, PIG-986 and PIG-992, will also be updated, while the other three "downstream" patches, PIG-991, PIG-993 and PIG-944, need not be changed.

          Raghu Angadi added a comment -

          Attachments :

          1. tmp-987-plus-991.patch : latest patch here + patch for PIG-991
          2. TEST-org.apache.hadoop.zebra.io.TestCheckin.txt : output of the failed tests.

          Yan, it looks like the lzo-related errors are fixed with the combined patch. But there are still some failures, and I think some of them exist on trunk as well.

          Yan Zhou added a comment -

          I see the following errors in your attached log:

          chgrp: changing group of `/home/raghu/h/pig-commit/build/contrib/zebra/test/data/TestColumnGroupNullSplits': Operation not permitted

          So I believe your tests have encountered disk permission problems. Note that we are testing the "column group security" feature, so proper permission settings are necessary for the tests to pass.

          Raghu Angadi added a comment -

          I finally got some time to look into this. Yes, I think it should be fixed in the tests. TestColumnGroup.java says:

              ColumnGroup.Writer writer = new ColumnGroup.Writer(path, strSchema, sorted,
                  "pig", "gz", "gauravj", "users", (short) Short.parseShort("755", 8), false, conf);
          

          This uses the local FS. How can we expect users to have a user named "gauravj" on their machines and to run as superusers? That just cannot be done.

          If the test wants to run with these permissions, we should:
          a) use HDFS (MiniDFSCluster) rather than the local filesystem. The tester has all the permissions on a MiniDFS.
          b) minor: use a generic name rather than gauravj.
          Show
          Raghu Angadi added a comment - I finally got some time look into this. Yes. I think the it should be fixed in the tests. TestColumnGroup.java says : ColumnGroup.Writer writer = new ColumnGroup.Writer(path, strSchema, sorted, "pig", "gz", "gauravj", "users", (short) Short.parseShort("755", 8), false, conf); using local FS. How can we expect users to have a user name "gauravj" on their machines and run as superusers ? just can not be done. If the test wants to run with these permissions we should do : a) use HDFS (MiniDFSCluster) rather than local filesystem. The tester has all the permissions on a MiniDFS. b) minor : use a generic name than gauravj.
          Hide
          Yan Zhou added a comment -

          I don't think the owner name is a problem because in this release it has no effect at all.

          The log complains about "chgrp changing group ... is not permitted". Can you chgrp a local FS file to a group called "users" on your box?

          Raghu Angadi added a comment -

          > Can you chgrp a local FS file to a group called "users" on your box?
          No.

          It's the same problem. I don't have a group called "users", and I don't think we can require others to have it.

          I didn't know the owner is ignored. Is it still allowed by the storage hint?
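One way to avoid hardcoding a group such as "users" in tests is to ask the local filesystem which group a newly created file receives. This is a hypothetical helper (`TestGroups` and `defaultGroupOfCurrentUser` are illustrative names), not the change that was committed; the patch later simply removed the hardcoded group name. On Linux a new file normally gets the process's primary group, while on BSD-derived systems or under setgid directories it inherits the parent directory's group; either way it is usually, though not guaranteed to be, a group the tester may chgrp to. It requires a POSIX filesystem.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFileAttributes;

// Hypothetical test helper: discover a group the test runner can usually
// chgrp to, instead of hardcoding "users". Requires a POSIX filesystem;
// readAttributes throws UnsupportedOperationException elsewhere (e.g. Windows).
class TestGroups {

    /** Returns the group assigned to a freshly created temporary file. */
    static String defaultGroupOfCurrentUser() throws IOException {
        Path probe = Files.createTempFile("zebra-group-probe", ".tmp");
        try {
            return Files.readAttributes(probe, PosixFileAttributes.class).group().getName();
        } finally {
            // Always clean up the probe file, even if reading attributes fails.
            Files.deleteIfExists(probe);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Use group: " + defaultGroupOfCurrentUser());
    }
}
```

A test could pass this value wherever it currently passes "users", so it no longer depends on a group that may not exist on the tester's machine.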

          Yan Zhou added a comment -

          Removed the hardcoded group name from a few test scripts. This patch and the ones in PIG-991 and PIG-986 are ready to be committed. But please hold off on committing PIG-992 and the patches after it.

          Raghu Angadi added a comment -

          Thanks Yan. It might be better to remove gauravj also since it is ignored anyway.

          This implies column access control is not tested in this patch, right?

          Raghu Angadi added a comment -

          I just committed this. Thanks Yan!

          Yan Zhou added a comment -

          Will remove gauravj.

          There is a TestColumnSecurity test script, but it only works on a real cluster and is not part of the checkin/nightly tests. I will add it in the next patch for PIG-986. Thanks.


            People

            • Assignee:
              Yan Zhou
              Reporter:
              Yan Zhou
            • Votes:
              0
              Watchers:
              4
