|
> Is the part about specifying users/groups and allowing access only to them similar to
Thanks for pointing it out. They are meant to complement each other. Service-level authorization is done at the outset when a client connection is first set up. It only verifies if the client is a user of the cluster and nothing more. Once the connection is set up, e.g. in the case of RPC, this checking is bypassed and each subsequent RPC call is only subject to downstream authorization checking like > Having service-level authorization may save some resources by avoiding elaborate downstream checking.
Just to clarify, I meant resources may be saved in the cases of dealing with non-users. I don't mean replacing downstream checking with service-level authorization. Kan, is this understanding correct:
I think it would be a requirement that we use similar interfaces (say, to the adminstrator and for any APIs we propose) for these different, but clearly related, types of authorization. Does this make sense ? Per
<property> <name>mapred.queue.default.acl-administer-jobs</name> <value>alice,bob group1,group2</value> </property> '*' is used to specify 'all allowed'. Blank is used to specify none allowed. I meant these kind of interfaces - to the administrator, and think they should be consistent between the ACLs feature and what we choose to do here in this JIRA. We have two grids. We only want data to flow one way. If a user has an account on both grids, we would need to put IP-level protections in place in order to enforce the data flow policy (since only user-level protection would let the data flow both ways). The question comes down to whether Hadoop should do this or should there be a firewall rule in place.
In my mind, this is better handled by Hadoop, since there is a very good chance that you may only want certain users to have these restrictions. For example, normal users have to abide by the policy, but an automated task that copies data could be free to go any direction. Additionally, in a homogeneous set of nodes, it may be that the hadoop-site.xml file is the same for everything but the namenode configuration option such that it can be easily templated by a configuration management system. I'm highly concerned that putting something like ACLs in the hadoop-site.xml file. It would greatly increase the amount of work an ops team of many grids would have to do. If there is a GUI put on top of ACLs (as mentioned by the scheduling team), then I definitely do not want it mucking with hadoop-site.xml. Corruption of that file would likely mean my entire grid goes down vs. users just losing access. Here is a proposal for adding authorization to Hadoop:
Thoughts? This sounds like a reasonable design to me. We might call it "ServiceAuthorizationManager" or somesuch, since it's scope is limited to permitting or denying access to an entire service, rather than more fine-grained method or data-based authorization (e.g., permission to kill jobs launched by folks in the same group).
Preliminary patch for review while I continue testing. Things to do:
1. Test cases 2. Command line utilities for reloading the auth config etc. Looking at the implementation I'm reminded of HADOOP-4049.
Hadoop's RPC is implemented as two layers: Server & Client implement a simple transport that sends a stream of serialized instances to a server, where they're processed and then serialized instances are streamed back to the client. RPC layers methods, parameters, etc. on top of this. The layering isn't perfect, but it's still worth preserving. If we wish to replace the transport or the RPC logic someday, then keeping the layers distinct should simplify things. When a change requires changes to both layers, as this patch does, that raises a red flag, and makes me wonder if it might better be done at one level or the other, rather than spread across both. HADOOP-4049 started out modifying both layers, but eventually wound up only modifying the RPC layer, and it became a simpler patch for it. So I wonder if this might also be implemented by adding fields to Invocation, so that each call passes the protocol name and the invoking ugi. Then the Invoker can check these. Wouldn't that be simpler & contain the implementation to a single layer? Doug do you mean the authorization will be checked on each method invokation?
The problem is that the service authorization should be checked at the ipc layer rather then the rpc layer. > The problem is that the service authorization should be checked at the ipc layer rather then the rpc layer.
Why? I think authorization should be per-request, not per connection. That will permit us transport flexibility. We can and should multiplex requests and responses over shared pipes. It would be a feature if each request is self-contained, not relying on connection state. Our RPC system should be atop a message-based API, not a connection-based API. Connection management is an implementation detail of that layer. Authorization is protocol-specific and eventually method and parameter specific and thus in the domain of RPC. The transport should know nothing of protocols, methods and parameters. The transport should serialize requests and deliver them to a request processor. Strictly, rpc does not imply any notion of a connection. One could have a connection per rpc method.
The sharing of connection is merely an optimization. I think the question you have raised is deeper than at which layer to implement service authorization. We already have method level authorization in HDFS - each method implements its own authorization against the file object being accessed. So Doug, I see your argument to be equivalent to arguing against service level authorization and that method level authorization is sufficient. Does the above help or further confuses the matter? Sanjay> The best way to represent that service access is when a service proxy object is created - e.g when the connection is established.
A proxy is not bound to a single connection. Connections are retrieved from a cache each time a call is made. Different proxies may share the same connection, and a single proxy my use different connections for different calls. Sanjay> We could share multiple service sessions in a single connection but that complexity is not worth it. It would be simpler to implement this way, not more complex. In HADOOP-4049 it was considerably simpler to pass extra data by modifying the RPC code than Client/Server. That's my primary motivation here: to keep the code simple. So unless there's a reason why we must authorize per connection rather than per request, it would be easier to authorize requests and would better compartmentalize the code. There are some performance implications. Authorizing per request will use fewer connections but perform more authorizations. I don't know whether this is significant. I expect that ACLs will be cached, and that authorization will not be too expensive, but that remains to be seen. So performance may provide a motivation to authorize per connection. But let's not prematurely optimize. Sanjay> I see your argument to be equivalent to arguing against service level authorization and that method level authorization is sufficient. No, but we will eventually probably need method-level authorization too, and it would be nice if whatever support we add now also helps then. If we do this in RPC, then we can examine only the protocol name for now, and subsequently add method-level authorization at the same place. So implementing service-level-authentication this way better prepares us for method-level authentication. Sanjay> Would you be happier if we created an intermediate layer, say rpc-session, in between. I am not seriously suggesting we do that. We have two layers today. We could add this at either layer. It would be cleaner to add it only at one layer, not mixed between the two, as in the current patch. It would be simpler to add it to the RPC layer, and I have yet to hear a strong reason why that would be wrong. That's all I'm saying. > It would be simpler to add it to the RPC layer, and I have yet to hear a strong reason why that would be wrong.
There is nothing wrong about it, but to have effective authorization we can't rely on ugi's sent by the clients because clients may lie. They may authenticate as one user at the beginning of a connection and subsequently use a different ugi when making calls. So at the server side, you either authenticate the user for each call (which may not always be possible since many authentication protocols require more than one message passing to authenticate a user) or authenticate the user at the beginning of a connection and associate an authenticated user ID with the connection, such that all subsequent authorization checking on calls will use that authenticated user ID. The latter implies one user per connection. Sorry for coming late, but I am not in favor of implementing a custom authorization solution. I have been looking into JAAS lately and it seems a perfect fit for our needs. I guess incremental updates to JAAS based authorization and authentication can solve
this issue, For authorization we can plug current Unix based implementation for now, but once implemented And for access control, we can implement permission implementations for any type of operation. For example we can have ConnectionPermission to check to IPC connections, and RPCProtocolPermission for RPC protocol name permissions. With JAAS we can implement user based access control easily by running the code with the user's privileges, we can check for user's not killing each others job, etc. We can achieve authentication/authorization independency. We can just continue to work on access control with unixugi, and introduce kerberos based authentication w/o changing any authentication code at all. The blocker from my point of view is that if we put it just in the RPC layer, it is impossible to use any secure transport. In particular, I don't want to block the capability of have kerberos encrypted rpc channels. This is very important. Furthermore, it would be much much better if the kerberos authentication just had to happen once per a connection rather than once per a rpc.
Doug says:
Sanjay> The best way to represent that service access is when a service proxy object is created - e.g when the connection is established. >A proxy is not bound to a single connection. Connections are retrieved from a cache each time a call is made. Different proxies may share the same connection, and a single proxy my use different connections for different calls. Good point, I missed that. Sanjay> I see your argument to be equivalent to arguing against service level authorization and that method level authorization is sufficient. We already have method level authorization in HDFS (ie permissions checking). Doing it inside the rpc method invocation would not work since it is very specific to the method in question - one has to check the permissions along the path. Thus in general method level authorization is best done inside the implementation of each of the methods and not the rpc layer. I completely agree with you that the session layer (if we choose to create one) should be done in one of the two layer (ipc or rpc) but not both. Service-level authorization can be done in one of 3 places:
But if this Jira is indeed service-level semantics rather than method-level semantics we should strongly consider creating the session layer since authentication will also need the session layer; doing authentication on a per-rpc is very very expensive - not incorrect but expensive. Now here is where the clean abstractions get messy: Attached is a patch to illustrate my point. It is in very early state.
The patch introduces UserPrincipal and GroupPrincipal defining user name and groups. There is a solid JAASUUGI implementation that can be used in hdfs and mapred. The patch does not work for now, but I guess it may raise the discussion. Thoughts ? Owen> if we put it just in the RPC layer, it is impossible to use any secure transport
That's an argument about authentication, not authorization, no? Also, does Kerberos really require a separate TCP connection per client, or rather does it require separate handshakes and encryptions per client? We could establish a separate session per client over a shared connection... A useful transport abstraction is reliable binary message exchange. UDP is unreliable and limits the message size, and TCP is stream-based. We wish to exchange binary messages with a server as efficiently as possible. It will be simpler to focus on this problem if we can keep it separate from others. Handshakes, encryption, sessions, etc, can all be layered on top of this abstraction, just as SSL and HTTP are layered on TCP. Perhaps this abstraction will prove impractical... We need to agree on what features our transport layer provides. Do we want the lowest level to provide authenticated, authorized message exchange, or simply message channels, with authentication and authorization layered above. Or do we want to build on SSL-like standards (authenticated byte streams) and layer per-user message exchange on that? Even in the last case, authorization can be layered above message exchange, and I think the lower we can push message exchange the simpler things will be. We don't want application authorization logic mixed in with async i/o concerns if we can possibly avoid it. Sanjay, I'm fine adding a session layer someday. That would probably be useful.
Much of what you speak of is authentication, which may have to be session based rather than per-message. But, so far as I can see, service-level authorization belongs somewhere between our two existing layers, so we need to choose one, rather arbitrarily. My initial observation was merely, with the two layers we have now, it would be more simply implemented at the RPC layer. Looking at the patch, i think would take fewer lines of easier-to-read code to implement this feature there. (Compare earlier with later versions of HADOOP-4049.) Authorization, as specified in this issue, requires just a lookup of the username and group in a hashtable, so performance should not be an issue. So if we want to add service-level authorization now, without first re-architecting things, and we don't see a strong performance or functionality reason against checking service-level authorization per-request, then we should put it at the RPC layer. If we later re-architect things into more layers, as we probably should, then service-level authorization will end up in some new, intermediate layer. My initial comments were a response to the patch, not an architectural treaty. Let's keep things grounded in code. If someone wants to re-architect RPC in this patch, that's fine. But until then, we need to decide how to fit the desired feature into the existing code while making the least mess. If someone can contribute a patch which implements this feature as cleanly and simply at the IPC level alone as I think it can be done at the RPC level that'd be great. > Also, does Kerberos really require a separate TCP connection per client, or rather does it require separate handshakes and encryptions per client? We could establish a separate session per client over a shared connection...
Yes. we could. Kerberos itself has no concept of transport. However, it does require separate handshakes and encryptions per client. > Or do we want to build on SSL-like standards (authenticated byte streams) and layer per-user message exchange on that? This has the nice property of hiding authentication and encryption services from the upper layer. To the rpc layer it's the same as using a normal socket connection. > To the rpc layer it's the same as using a normal socket connection.
But SSL may not be useful to us. Does Java's SSLServerSocket implement non-blocking i/o? If it doesn't, then we may gain little by using SSL, since we'd have to re-implement it. It might be easier to encrypt and decrypt individual messages, implement handshakes, etc. on top of a binary message exchange protocol than to intermix that logic with the non-blocking client and server message exchange implementation. Eris,
I think your point is:
When we did permissions/UGI we did look at JAAS briefly but due to time pressures we went with our own impl. Our feeling was to reexamine this later. I think the rest of the debate in this jira about which layer (ipc or rpc) to use still stands. If we use Jaas then
I believe JMX used JAAS and it does not turn on the security manager and further puts the ACL in a separate file ( I will verify if JMX used JAAS). > It might be easier to encrypt and decrypt individual messages, implement handshakes, etc. on top of a binary message exchange protocol than to intermix that logic with the non-blocking client and server message exchange implementation.
The implementation should still be layered. You can think of it as a wrapper object that internally wraps Kerberos layer on top of a normal non-blocking socket and externally exposes a non-blocking socket interface. This will make Kerberos transparent to whatever message exchange protocol you want to run on top of it, be it RPC or not. > The implementation should still be layered.
I agree. But if we have to implement an SSL-like layer ourselves, it might be simpler to implement it as a session layered on a binary message-exchange API than as a non-blocking socket. Non-blocking i/o is an awkward coding paradigm, since you have to assemble messages (including headers) incrementally and maintain control state explicitly. > rather then invent new authorization layers use JAAS
+1 on using JAAS It seems that only IPC/RPC traffics are discussed. How about other channels, like Servlets and plain socket connections? These channels should be considered in the design. The corresponding implementations could be done in separated issues. > But if we have to implement an SSL-like layer ourselves, it might be simpler to implement it as a session layered on a binary message-exchange API than as a non-blocking socket.
Yes, you could be right. But the socket interface is well understood and fits in existing code nicely. > Non-blocking i/o is an awkward coding paradigm, since you have to assemble messages (including headers) incrementally and maintain control state explicitly. Yes, we do that explicitly in Server.Connection. With encryption, we'll have another layer of headers and buffers to manage. If we can encapsulate the encryption layer into a socket, we don't have to clog the Server code further. Doug> My initial comments were a response to the patch, not an architectural treaty. Let's keep things grounded in code. If someone wants to re-architect RPC in this patch, that's fine. But until then, we need to decide how to fit the desired feature into the existing code while making the least mess. If someone can contribute a patch which implements this feature as cleanly and simply at the IPC level alone as I think it can be done at the RPC level that'd be great.
Sorry I misunderstood you earlier. I though you were opposed to putting the service level authorization in the ipc layer even if it is doable. @Sanjay/Doug - It's fairly easy to push it down a level from RPC to IPC by passing in the AuthorizationManager, so I agree. However I'm a tad reluctant since I believe it's fair-game to have a hook on connection-setup, rather than leak the AuthorizationManager into the IPC layer... shrug.
@Enis - Thanks for the JAAS patch, I spent a fair bit of time researching - my take: 1. The 'policy' file is extremely ungainly and very hard to use. It also is 'principal-centric' and we'd really like it to be resource-centric i.e. the 'protocol' in this case (as in: who can access JobSubmissionProtocol?). I cannot seem to find a way around using the policy file at all i.e. say plugin our own ACL list via a config file and get AccessController.checkPermission to use it. I do see java.security.Policy, but I don't see it having the abstractions we need? I'd love your take on this - maybe I'm missing other features of JAAS? However, I think the 2 points we need to keep in mind are the same: do not use the java policy file and turn off the SecurityManager. Thoughts? @Sanjay
Exactly. @Arun, @Sanjay I am attaching a patch for review which does the job, w/o enabling security manager. The actual security manager's implementation is to delegate all the work to AccessController. The patch uses accesscontroller directly so SM is not explicitly enabled. With this schema, security checks only occur as in the patch. AFAIK, all the core java code use SM, and since it is not enabled, performance does not suffer. Of course we should check this before patch is committed. Patch introduces a HaddopPolicy implementation, which reads its configuration from an XML file. The schema of the file is relatively simple, and extensible for further JAAS based authorization checks. Unfortunately, the permissions/principals model does not fit perfectly to be configured in the property based configuration files, so I guess we have to include a new format. Attaching the patch for review.
works, but lacks test cases / javadoc. Attaching updated patch as a candidate for submission. Differences from previous include improved javadoc, test cases.
Here are some points from the patch :
Reviews are more than welcome. -1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12393307/jaas_service_v3.patch against trunk revision 711350. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 38 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 1 new Findbugs warnings. +1 Eclipse classpath. The patch retains Eclipse classpath integrity. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3529/testReport/ This message is automatically generated. Here is a proposal to take this forward so as to use JAAS more extensively for authorization and authentication in Hadoop:
1. Using JAAS subject can make it much easier for kerberos based authentication. However UGI is heavily used in HDFS. I guess in the short term we should use both and gradually switch to subject. Instead of passing UGI in connection header, and generating subject from UGI, we may pass subject in header and generate UGI from it.
2-3. Agree 4. +1 on making UGI internal. 5. The core to authentication in JAAS is LoginModule, which is responsible for authentication implementations. We can have text file, configuration, database, or LDAP based login modules. Activemq has classes for several LoginModules : http://svn.apache.org/viewvc/activemq/trunk/activemq-jaas/src/main/java/org/apache/activemq/jaas/ I think user-group mapping can be done in for example LDAPLoginModule.
I propose we switch to passing Subject in the connection header in-lieu of UGI via HADOOP-4656. Also, we'll keep UGI as an internal detail so that we do not need to change HDFS for now... maybe at some point we should completely do away with it and change all of HDFS to use Subject too. Arun, will this feature have a design doc and test plan?
I'm working on the design doc and the test plan, I'll put it up shortly.
Design doc and test plan attached.
Preliminary patch for review while I add test-cases etc.
I have taken Enis's patch further along... highlights:
Here is my take on the patch :
minor issues :
Questions after reading the design doc:
Additional test cases that come to mind after reading the (thin) design doc:
All for now. Thanks for the review Enis, responses inline:
AuthorizationException is used to hide the stack trace from the user, a security feature.
The default is set in SecurityUtil, it's set in various places with the appropriate PolicyProvider. We need different PolicyProviders since there are non-public protocols which need to be accessed e.g. JobSubmissionProtocol.
I agree, however I propose we do that separately once we change all of NN to use User/Group principals - for now this will keep the change smaller.
This patch is just using the ACLs already in place for Queues introduced via I fix all the other minor nits (logging, default values) in the next patch. Thanks! Nigel - responses inline:
Illegal schema is handled by Configuration itself and the ACL parser handles incorrect user/groups.
Configuration will fail if it cannot 'load' the resource.
Hmm... it's assumed that the cluster configs are handled correctly by the admin.
I'll update the spec - the proposal is to add a '-refresh-auth-policy' switch to dfsadmin, and a similar one to a new command 'mradmin'. I'll update the test plan too. Update document incorporating Nigel's feedback. Thanks Nigel!
Updated patch with command-line utilities support for 'hup' once hadoop-policy.xml is changed by the admin...
I have a few questions regarding the patch :
Minor things :
Thanks for the review Enis.
There are implementations of IPC which do not need authorization (e.g. test cases), which is why pulled the check into RPC.Server. I made it a concrete no-op implementation in ipc.Server to ensure that it isn't a in-compatible change...
I'm uncomfortable with this approach because it exposes cluster admins to the actual Java protocols (many of which are package-private e.g. JobSubmissionProtocol) and as you've pointed out it leads to more complicated handling (Configuration.getKeys followed by a search for keys starting with a specific prefix). I went the route of PolicyProvider to avoid enshrining code (actual protocols) in config files and exposing admins to them. It got a bit more complicated (HDFSPolicyProvider and MapReducePolicyProvider) because some protocols aren't public (JobSubmissionProtocol, TaskUmbilicalProtocol etc.).
Good catch - I'll fix this! (and the logging of course!) Some comments on the patch:
Updated patch with test cases, passes all local tests on my machine.
Thanks for the review Nicholas! I've incorporated most of your comments in this patch. Wrt the refresh command, I'm not sure it makes sense to require super-user privs. To my mind, any operation performed on the file-system (quotas, upgrade, safemode etc.) require super-user privs; however, changing ACLs on services is more of an administrative job... also there really isn't a notion of super-user for the map-reduce cluster. Maybe we could keep it as is for now and change it in the future? Fixed a couple of minor findbugs warnings...
Correct patch.
All unit tests pass... [exec] -1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 28 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] -1 Eclipse classpath. The patch causes the Eclipse classpath to differ from the contents of the lib directories.
I'm not sure why test-patch.sh is complaining about Eclipse classpath - this patch certainly doesn't introduce any new jars. [exec] ======================================================================
[exec] ======================================================================
[exec] Running Eclipse classpath verification.
[exec] ======================================================================
[exec] ======================================================================
[exec]
[exec]
[exec]
[exec] FAILED. Some jars are not declared in the Eclipse project.
[exec] Declared jars: lib/commons-cli-2.0-SNAPSHOT.jar
[exec] lib/commons-codec-1.3.jar
[exec] lib/commons-httpclient-3.0.1.jar
[exec] lib/commons-logging-1.0.4.jar
[exec] lib/commons-logging-api-1.0.4.jar
[exec] lib/commons-net-1.4.1.jar
[exec] lib/hsqldb-1.8.0.10.jar
[exec] lib/jets3t-0.6.1.jar
[exec] lib/jetty-6.1.14.jar
[exec] lib/jetty-util-6.1.14.jar
[exec] lib/jsp-2.1/core-3.1.1.jar
[exec] lib/jsp-2.1/jsp-2.1.jar
[exec] lib/jsp-2.1/jsp-api-2.1.jar
[exec] lib/junit-3.8.1.jar
[exec] lib/kfs-0.2.2.jar
[exec] lib/log4j-1.2.15.jar
[exec] lib/oro-2.0.8.jar
[exec] lib/servlet-api-2.5-6.1.14.jar
[exec] lib/slf4j-api-1.4.3.jar
[exec] lib/slf4j-log4j12-1.4.3.jar
[exec] lib/xmlenc-0.52.jar
[exec] src/test/lib/ftplet-api-1.0.0-SNAPSHOT.jar
[exec] src/test/lib/ftpserver-core-1.0.0-SNAPSHOT.jar
[exec] src/test/lib/ftpserver-server-1.0.0-SNAPSHOT.jar
[exec] src/test/lib/mina-core-2.0.0-M2-20080407.124109-12.jar
[exec] Present jars: lib//commons-cli-2.0-SNAPSHOT.jar
[exec] lib//commons-codec-1.3.jar
[exec] lib//commons-httpclient-3.0.1.jar
[exec] lib//commons-logging-1.0.4.jar
[exec] lib//commons-logging-api-1.0.4.jar
[exec] lib//commons-net-1.4.1.jar
[exec] lib//hsqldb-1.8.0.10.jar
[exec] lib//jets3t-0.6.1.jar
[exec] lib//jetty-6.1.14.jar
[exec] lib//jetty-util-6.1.14.jar
[exec] lib//jsp-2.1/core-3.1.1.jar
[exec] lib//jsp-2.1/jsp-2.1.jar
[exec] lib//jsp-2.1/jsp-api-2.1.jar
[exec] lib//junit-3.8.1.jar
[exec] lib//kfs-0.2.2.jar
[exec] lib//log4j-1.2.15.jar
[exec] lib//oro-2.0.8.jar
[exec] lib//servlet-api-2.5-6.1.14.jar
[exec] lib//slf4j-api-1.4.3.jar
[exec] lib//slf4j-log4j12-1.4.3.jar
[exec] lib//xmlenc-0.52.jar
[exec] src/test/lib//ftplet-api-1.0.0-SNAPSHOT.jar
[exec] src/test/lib//ftpserver-core-1.0.0-SNAPSHOT.jar
[exec] src/test/lib//ftpserver-server-1.0.0-SNAPSHOT.jar
[exec] src/test/lib//mina-core-2.0.0-M2-20080407.124109-12.jar
We can run the tests disabling authorization, no?
Well, in order for the admin to configure per protocol, we must somehow expose the private protocols after all. From my point of view exposing "inter.datanode.protocol" or "org.apache.hadoop.server.protocol.InterDatanodeProtocol" does not differ. Having said that, I'm Ok with current way. Updated patch, on Sanjay's feedback I changed ConnectionHeader to not be a 'Configurable' class...
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12395517/HADOOP-4348_5_20081206.patch against trunk revision 724578. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 28 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 Eclipse classpath. The patch retains Eclipse classpath integrity. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3693/testReport/ This message is automatically generated. Updated patch with a few more test cases...
Went over your patch. Some feedback:
1) ConnectionHeader should have as javadoc since it is part of the protocol. 2) It is bizarre for ConnectionHeader to be configurable!! 3) The Phantom "authorize" call. 3b) You use call id of -1 for authorize call. 3c) Given that FATAL only occurs for specific speail Phantom calls, if (state == Status.FATAL.state) { if (id == AUTHORIZE_CONNECTION_CALLID) { // authorization failed } else If (id == PROTOCOL_CHECK_CALLID ) { // protocol does not match ( - see my Item-10 below) .. .other phantom calls } 3d) Somewhere in the connection establishment code or receiveResponse, Consider the following: 4) Coding Style: ipc.server.java 5) Following comment in code is incorrect. /// Reads the connection header following version and /// authorizes the connection private void processHeader() throws IOException { The authorization is not done in this processHeader(); it is done by the 6. ipc.server.authorize()
7. Add java doc for the params for the ipc.server.call() 8. You deprecated two call() methods in ipc.server. 9. Move your new method setupResponse to the Call class. 10. On server side check the class name (protocol) against any base class of
11. On the IPC/RPC server side you eliminated the need for UGI by using 11a) File jira remove UGI on rpc/ipc client side (ie use JASS instead)
11b) File jira for fixing the calls to UGI in our various servers (NN, DN, JT etc)
12. Add javadoc in securityUtil to say that it is for service level authorization. 13. service.java - what is the servicekey (is it serviceName?). 14. HDFSPolicyProvider and MRPolicyProvider. 14a) Add java doc which says
14b) I think we can do without these (you can do this in a new jira if you like)
15) Move the doAs in securityUtil.authorize() to the IPC/RPC layer In the Item-14b jira when we change the rpc.getServer() to add the rpc interfaces list, 16) SecurityUtil.getSubject()
17) You added a method to dfsAdmin called refreshAuthorizationPolicy().
18) General comment: make sure the that all the classes/interfaces you added to Updated patch to incorporate the last few bits of Sanjay's feedback... I'm opening separate jiras for some of his comments as indicated there.
Previous patch had a minor error, fixed.
Fixed javadocs for a couple of classes...
Integrated in Hadoop-trunk #685 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/685/
. Add service-level authorization for Hadoop. |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
HADOOP-3698, which was fixed for Hadoop 0.19 ?