Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5.0
    • Component/s: None
    • Labels: None

      Description

      Add a thrift proxy server to make integration with other languages besides Java a bit easier. This should work like http://wiki.apache.org/hadoop/Hbase/ThriftApi.

      1. accumulo482patch.diff
        2.74 MB
        Chris McCubbin
      2. accumulo482patch-A.diff
        2.96 MB
        Chris McCubbin

        Issue Links

          Activity

          jv added a comment -

           We've (the team) discussed this quite a bit offline. While one can simply write a translation API, there is something else we would like to accomplish with a proxy. With the column visibilities, we contend that any part of the Key/Value pair could be the piece that is worth restricting. And we attempt to uphold this to the highest degree. The one place where this falters is in the !METADATA table. Whenever we split, we base the split point off of an existing row ID in the table. And if the reason a column/value pair is put in a particular visibility is because of the row part of the key, then we're potentially leaking it. All clients must be able to read the !METADATA table in order to determine what tablet (and tserver) they need to hit to get their next piece of information.

           That said, we would actually like to make server side proxies. We've never really concluded on tserver thread vs. independent process, I think that's still up in the air. But regardless, what we should do is have all clients come in through this proxy service, and it will handle all of the !METADATA table lookups and funnel all of the data through itself back to the client. This can prevent a possible attack to guesstimate where the tablets split (Adam can talk more about this). We should have it utilize the current client API, as we still want to use it for some operations (MR InputFormat off the top of my head).

           However, aside from this, it should work as a proxy for all other purposes as this is intended. We can simplify the API for the client user, letting all of the !METADATA lookups occur on the proxy, we can do all of the error translation, etc. This means we could have a much, MUCH lighter client API that does easily transcend languages.

           As for the language, I don't really care. We use thrift, so it would allow for some object reuse (but then we will have to reimplement some objects, like Key, Value, and Mutation, in the various languages I think). But I wouldn't object to seeing something else used like Avro or any of those other similar projects. However, I don't really see the need for the flexibility of Avro, but I think it offers more than just that. Just my two cents.

          I'll get off my soap box now.

          Aaron Cordova added a comment -

          It was a little hard to understand the problem the way you stated it, but I think I understand. You mean any part of the keyvalue pair might be sensitive and need to be protected and the !METADATA table might expose some of those.

          So you're saying if a proxy is created you don't want !METADATA entries to be sent to it, because the way most instances of Accumulo are deployed involves the Accumulo client being run on a protected machine, i.e. a machine on the security perimeter that the user can't just examine the memory of.

          But I don't think METADATA entries need to be exposed outside the trusted accumulo client in order for a proxy to work ... one could simply ferry scan parameters and the resulting keyvalue pairs to and from the client (for queries).

          The proxy could also be configured to simply disallow reading directly from the !METADATA table. Would that still leave the possibility of sensitive elements of KeyValue pairs being leaked?
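           A proxy-side guard of that kind could be as simple as a table-name check before any scan is set up. The sketch below is purely illustrative; the function and the table set are hypothetical names, not part of any Accumulo API.

           ```python
           # Minimal sketch (hypothetical names): a proxy-side guard that
           # refuses scans against the !METADATA table, so its entries
           # never leave the trusted proxy process.
           RESTRICTED_TABLES = {"!METADATA"}

           def check_scan_allowed(table_name):
               """Raise if a client tries to scan a restricted internal table."""
               if table_name in RESTRICTED_TABLES:
                   raise PermissionError(
                       "scanning %s is not permitted via the proxy" % table_name)
               return True
           ```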

          Sapan Shah added a comment -

           A colleague Jim Zombek and I have started work on this, and we hope to have a FIRST version of a design document discussing authentication, creating a table, putting data into the table, and getting data from the table by the end of next week. I will post it to the ticket.
           I am mostly posting this so that people know work is being done on this ticket.

          Jason Trost added a comment -

          Have there been any updates on this (Sapan/Jim)? Talking to devs from other start-ups makes me think this feature is more important than I originally thought. Not having a thrift proxy API or a built-in REST API is a real barrier to non-java/non-JVM devs using/adopting Accumulo. This barrier is not there for Cassandra and HBase since they both have Thrift APIs.

          I think many devs in our community discount this since Java is so dominant, but outside the government, esp. in smaller start-ups there is a real aversion to Java, instead favoring python, Ruby, Node JS, etc.

          Chris McCubbin added a comment -

          I can take a stab at this. We have been working through a lot of similar issues recently.

          Chris McCubbin added a comment -

          I'm submitting a first cut at this. It uses thrift and the proxy pretty much mirrors the capabilities of the usual java connector, with a few caveats:

          -Some of the methods aren't implemented, for various reasons.
          -There's only one way to obtain a scanner, and one way to insert cells. Scanners still support iterators, ranges, etc.

          The thrift API file includes the accumulo data.thrift API and that file must be available as a library when compiling the thrift. The compiled versions of the classes are included in the patch, the same way that accumulo itself does.

           The server is a standalone server for now, see the README. Example clients (with compiled thrift classes) are included for Java, Python, and Ruby. For Ruby and Python I believe you will have to install Thrift for them to work. The Java client (included in the Maven project) uses a Maven Thrift dependency, so a Thrift install is not needed. Another thing is that I had a few dependency conflict issues with the main project, so the proxy pom file does not currently use the accumulo pom as a parent. The accumulo pom file still has proxy as a subproject (e.g. it will install proxy when it is installed). I'm working to fix this.

          Please test it out, let me know what you think. Thanks. --C

          Chris McCubbin added a comment -

          Oh and by the way, thanks to Phil E. for the Ruby part and Adam F. for code reviews and suggestions.

          Keith Turner added a comment -

           I threw the patch on review board because I wanted to make some comments.

          MT added a comment -

           Hi all, I'm currently looking at accessing Accumulo from a Python Twisted application (http://twistedmatrix.com/trac/). To get myself started I've created a Twisted client that essentially re-implements the Java client, and talks directly to the TabletClientService using the Thrift API. I'm compiling the source from trunk and am relying on the thrift files under /core/src/main/thrift/*.thrift and ./trace/src/main/thrift/cloudtrace.thrift. I have a basic use case up and running where I'm able to connect to the TabletClientService and pull data using startScan/closeScan. I've made it through a fair amount of the Accumulo source code to get to this point, and hit a few stumbling blocks along the way (had to figure out I need to use TCompactProtocol the hard way, and I was initially running into PermissionDenied errors as I was passing the actual table name instead of the table id in the TKeyExtent), but everything is looking good now.

           I'm guessing that re-implementing the Java client in Python is less than ideal from a maintainability standpoint, so I'll take a look at Chris's code at some point in the near future and figure out how to make it play nicely with Twisted so I can switch over to using it. So really this is just a heads up to fellow Twisted users that there's someone out there who is looking to add Twisted support to Accumulo.

          MT added a comment -

           Err, make that Chris's code, sorry about that. Also another reason I posted was to hopefully get some feedback on how bad talking directly to the TabletClientService is from an Accumulo dev point of view.

          Eric Newton added a comment -

           I'm a big Twisted fan... it would be nice to see Accumulo with an easy-to-use async client library.

          Jason Trost added a comment -

          Some comments.

          1. First off, I think this is an excellent start.

          2. If possible, I would recommend separating the thrift generated java code into a "proxy/src/main/gen-java/" to ensure that this code is not hand modified. I didn't notice any directories with similar names anywhere in the accumulo codebase so this may not be a convention with this project, but we've found that separating this code makes test coverage reporting easier.

           3. I didn't see a method like this, but I think you should add a close_scanner(String cookie) or scanner_close(String cookie) method so the scanner can be explicitly closed when not being used. You may also want to add some expiration mechanism as well so bad client code doesn't keep adding scanners and iterators to their respective Maps. Google Guava has a Cache class that makes this really easy.

           4. In your scanner_next_k() function, I would recommend calling ret.setResults(new ArrayList<TKeyValue>()) right after "ret" is initialized. This makes the client code not have to check whether results are null before attempting to iterate through them. This makes the client code easier to write, IMO.

          more comments to come as I play with this more... Thanks.
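           The expiration idea in point 3 can be sketched without Guava using a plain dict with lazy eviction. Everything here (ScannerRegistry, close_scanner, the cookie scheme) is a hypothetical illustration of the suggested behavior, not code from the patch.

           ```python
           import time

           # Sketch: a scanner registry supporting explicit close plus lazy
           # TTL eviction, so misbehaving clients that never close their
           # scanners cannot grow the map without bound.
           class ScannerRegistry:
               def __init__(self, ttl_seconds=600):
                   self.ttl = ttl_seconds
                   self._scanners = {}  # cookie -> (scanner, last_access_time)

               def put(self, cookie, scanner):
                   self._evict_expired()
                   self._scanners[cookie] = (scanner, time.monotonic())

               def get(self, cookie):
                   self._evict_expired()
                   scanner, _ = self._scanners[cookie]  # KeyError if closed/expired
                   self._scanners[cookie] = (scanner, time.monotonic())
                   return scanner

               def close_scanner(self, cookie):
                   self._scanners.pop(cookie, None)

               def _evict_expired(self):
                   now = time.monotonic()
                   stale = [c for c, (_, t) in self._scanners.items()
                            if now - t > self.ttl]
                   for c in stale:
                       del self._scanners[c]
           ```

           Guava's Cache gives the same LRU/TTL behavior on the Java side with far less code.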

          Keith Turner added a comment -

          As I have been researching ways to make Accumulo accessible to other languages, I ran across JCC[1] used by PyLucene[2] and I thought it was interesting. Anyone have experience with this? Would it be an option for exposing Accumulo API to Python?

          [1] : http://lucene.apache.org/pylucene/jcc/index.html
          [2] : http://lucene.apache.org/pylucene/

          Eric Newton added a comment -
          1. It's surprising that you are using the internal thrift types. They also have the awkward names "TKeyValue" instead of "KeyValue". I would go ahead and create separate thrift definitions.
          2. I think it's important to maintain the mutation type for ingest. TKeyValue doesn't have an overt way to represent delete, for example.
           3. It might be an interesting exercise to port the Shell over the thrift API; it helped me find the corners of the API when I wrote MockAccumulo
          Keith Turner added a comment -

          I would go ahead and create separate thrift definitions.

           I agree. We want the proxy API to be as stable as possible, so make it user-facing and independent of internal thrift APIs.

          Chris McCubbin added a comment -

           I'll change the proxy thrift to have its own data types. However, I may keep a convention that separates the naming from the associated Accumulo java types. If not, the code in the server becomes nasty looking, as you have to fully qualify one of the types with the same simple names.

          Eric Newton added a comment -

           I feel your pain, but you only have to do it on the server side once. You can use Eclipse as a crutch to make it easier to type (though, it's still ugly to look at). You could get creative and create trivial sub-classes of the Accumulo client code.

          Chris McCubbin added a comment -

          I just uploaded a patch that addresses many of the issues raised here and on the review board. Major changes include:

          -Scanners can be closed and the map they are stored in is now a Guava cache set for LRU kickout and time-based kickout.
          -Writers can be one-shot as before, or you can create one and re-use it.
           -The data classes are now proxy-specific and the thrift does not depend on Accumulo's internal thrift. Most of the data classes are named P* when there is an identically named Accumulo class (PKey, etc). I know that's not so attractive, but it makes the code much less error-prone when converting between Accumulo Keys and PKeys.
          -PMutations have been added for writing. Adam and I discussed the fact that column updates can be confusing to create directly so this class has a set of key-values to add and a set to delete. An exception is thrown if the key's row doesn't match the mutation's row.
          -Bugfixes and additional unit tests

          Please review and let me know if you have more suggestions. Thanks.
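           The row-consistency check described for PMutation can be sketched as follows. This is an illustrative Python model under assumed names, not the actual patch code.

           ```python
           # Sketch: a mutation holds cells to add and cells to delete,
           # and rejects any key whose row differs from the mutation's row,
           # mirroring the PMutation behavior described above.
           class Mutation:
               def __init__(self, row):
                   self.row = row
                   self.cells = []          # (key, value) pairs to write
                   self.deleted_cells = []  # keys to delete

               def _check_row(self, key_row):
                   if key_row != self.row:
                       raise ValueError(
                           "key row %r does not match mutation row %r"
                           % (key_row, self.row))

               def put(self, key_row, column, value):
                   self._check_row(key_row)
                   self.cells.append(((key_row, column), value))

               def delete(self, key_row, column):
                   self._check_row(key_row)
                   self.deleted_cells.append((key_row, column))
           ```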

          Chris McCubbin added a comment -

           Here's a review board version if you prefer that: https://reviews.apache.org/r/8039/

          Eric Newton added a comment -

          I think the continued use of PKeyValue obfuscates what is going on here:

           struct PMutation {
             1: binary row
             2: list<PKeyValue> cells;
             3: list<PKeyValue> deletedCells;
           }
          

          Seems that I would expect

           struct PMutation {
             1: binary row
             2: map<PColumnUpdate, binary> cells;
             3: set<PColumnUpdate> deleteCells;
           }
          

           Why would a PColumnUpdate be confusing to create directly? It would be confusing to me to get a runtime error because I specified the row as something different in the PKeyValue.

          Adam Fuchs added a comment -

          We're trying to simultaneously optimize for a number of factors in this API. The three that matter for picking which write methods to support are:

          1. Make it as easy to use as possible
            1. Keep the objects intuitive
            2. Expose all the functionality via a minimal set of methods
            3. Also use a minimal set of data objects
          2. Make it fast
            1. Support batching of writes
            2. Minimize data transmission via natural hierarchical encoding, etc.
            3. Drive users towards the most efficient use by avoiding adding inefficient methods
          3. Make it maintainable
            1. Use a minimal set of methods and objects

          On the one hand, introducing a PColumnUpdate object violates 1.3, but using a PMutation that could throw an exception because the PKeyValue objects are not in the same row violates 1.1. We could support both the PMutation with PColumnUpdate objects, and also support a write method that takes a PKeyValue or a list of PKeyValue. That would seem to get around 1.3 by not requiring that users learn part of the API, but it doesn't really help with 1.2 or 3.1.

          An additional alternative would be to support writes via only the following method:

          oneway void write(1:binary writerToken, 2:list<PKeyValue> inserts, 3:list<PKey> deletes);
          

          The semantics of this write function would be such that all inserts and deletes within the same row will be applied via the same mutation. Keys across multiple rows will be applied in multiple mutations (one per row). While this does nothing for 2.2, I think this might be the best we can do to optimize across all of the factors listed above.
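           The grouping semantics described for this write method can be sketched as follows (illustrative Python with hypothetical names; the actual proxy would apply each per-row group as a single Accumulo mutation):

           ```python
           from collections import defaultdict

           # Sketch: all inserts and deletes sharing a row are grouped into
           # one per-row mutation; keys spanning multiple rows yield
           # multiple mutations, one per row.
           def group_into_mutations(inserts, deletes):
               """inserts: list of (row, column, value); deletes: list of (row, column)."""
               mutations = defaultdict(lambda: {"puts": [], "deletes": []})
               for row, col, val in inserts:
                   mutations[row]["puts"].append((col, val))
               for row, col in deletes:
                   mutations[row]["deletes"].append(col)
               return dict(mutations)
           ```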

          Keith Turner added a comment - - edited

          On the one hand, introducing a PColumnUpdate object violates 1.3

          If the set of data objects is too small, it will make the API harder to use. How does introducing PColumnUpdate make it harder to use? Can you give an example?

          Adam Fuchs added a comment -

          Having multiple structures that can represent the same data means that the user has to understand a more complex type system. The idea of using a minimal set (versus minimum) is that if you have two structs, either of which could be used just as effectively (as defined by other constraints) in the same methods, then you have too many structs.

          In this particular case, if users never have to understand the concept of a PColumnUpdate and how it differs from a PKeyValue, then we have a simpler and easier to use API (assuming the cost of making that change to other factors is small).

          Keith Turner added a comment -

          Make it maintainable

          I am thinking the maintenance headaches in the future will come from the many table, instance, and security operation methods exposed via the thrift API and less so from the read and write methods. But maybe not.

          Does the thrift API only provide a way to create a batch scanner? No way to create a regular scanner?

          Hide
          Chris McCubbin added a comment -

          Does the thrift API only provide a way to create a batch scanner? No way to create a regular scanner?

          There's currently only a way to create a batch scanner.

          I am thinking the maintenance headaches in the future will come from the many table, instance, and security operation methods exposed via the thrift API ...

          I can easily remove methods from the API if we think they are not needed for this interface. Let's discuss if any of the ones I implemented won't be needed on a regular basis.

          Hide
          Keith Turner added a comment -

          More considerations

          • How do we let users specify that they want to let system set timestamp?
          • make it easy to write what you scan... writing PKey supports this
          • make it easy to make column updates to a row atomic... writing list<PKeyValue> really works against this
          • how easy is it to write code against the API and how does code that uses the API look ... I think this is really important... being forced to repeat the same row for a mutation works against this

          Thinking about this, we could really minimize structure overlap by making PKeyValue that was composed of PKey, Value. And then PKey composed of a row and PColumn. I am thinking that trying to minimize the structs like this makes it more cumbersome to write the code.

            updates = [new PColumnValue(new PColumn("cf","cq1","cv",7),"val1"), new PColumnValue(new PColumn("cf","cq2","cv",7),"val2")]
            deletes = [new PColumn("cf","cq3","cv",8)]
            Mutation m = new Mutation("row", updates, deletes);
          

          If we had PColumnValue that was flat, I think the following code is easier to write than the code above. But this makes it more cumbersome to write code that scans data and writes it to another table.

            updates = [new PColumnValue("cf","cq1","cv",7,"val1"), new PColumnValue("cf","cq2","cv",7,"val2")]
            deletes = [new PColumn("cf","cq3","cv",8)]
            Mutation m = new Mutation("row", updates, deletes);
          

          I do not think there will be any confusion when a user looks at these flat data structs. I suppose I think users will start with the functions and not the data structs. At least that's what I would do. For example, I would look at the write method and see that it takes a mutation, then I would look at mutation to see what it needs.

          Hide
          Keith Turner added a comment -

          There's currently only a way to create a batch scanner.

          Seems like we would need the ability to create a scanner. The batch scanner does not return results in sorted order; the scanner does. The user may want results in sorted order.

          Hide
          Chris McCubbin added a comment -

          How do we let users specify that they want to let system set timestamp?

          I just modified PKey to make timestamp an optional value. If it is unset then the system will set the timestamp. I believe this is the best way to specify this so we don't have any "magic timestamp values" (0, -1, etc) that users may actually want to use for some reason.
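
          The "unset means system-assigned" rule can be sketched in plain Python. This is a hypothetical helper for illustration, not the proxy's code; the real proxy would consult the thrift-generated isSet flag on PKey:

          ```python
          import time

          def resolve_timestamp(key_timestamp=None):
              """Return the timestamp to write with a column update.

              If the client set the optional timestamp field, honor it,
              including "unusual" values like 0 or -1; only when it is
              unset (None here) fall back to a system-assigned time in
              milliseconds. Hypothetical sketch of the rule above."""
              if key_timestamp is not None:
                  return key_timestamp
              return int(time.time() * 1000)
          ```

          Note that because the check is "was the field set" rather than a sentinel comparison, values like 0 and -1 remain usable as real timestamps.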

          Hide
          Chris McCubbin added a comment -

          Seems like we would need the ability to create a scanner. The batch scanner does not return results in sorted order; the scanner does. The user may want results in sorted order.

          I'll add that capability after we work out the data structures for reading and writing. The API should be similar to the batch scanner methods.

          Hide
          Keith Turner added a comment -

          I just modified PKey to make timestamp an optional value

          Interesting, not a thrift feature I am familiar with. In a few of the popular languages, is the generated code for dealing with an optional field easy to write? Does this complicate reading data in any way? When reading data, we would always expect the timestamp to be set, so it would be odd to have to constantly check whether it's set when reading.

          Hide
          Chris McCubbin added a comment -

          In a few of the popular languages, is the generated code for dealing with an optional field easy to write?

          Not exactly sure about all languages. In Java optional fields aren't in the constructor and need to be set via a method call.

          Does this complicate reading data in any way? When reading data, we would always expect the timestamp to be set, so it would be odd to have to constantly check whether it's set when reading.

          Well, I copy whatever is in the key coming out of accumulo into the response value. That value cannot be null since it is a primitive, so on a scan the timestamp will always be set. (This happens in Util.toThrift(Key).) In Java the timestamp is still a primitive, so it won't throw an exception if you try to get an unset timestamp value; I suspect it will be 0, since it appears to be an uninitialized long primitive. I think if you try to access an unset optional non-primitive, it will return null. There are also isSet* methods generated for optional values that tell you if the value is unset (as opposed to set to null).
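
          The generated-code behavior described above can be mimicked in a few lines of plain Python. This is a hypothetical stand-in for the thrift-generated struct, not the actual generated code:

          ```python
          class SketchPKey:
              """Stand-in for a thrift-generated struct with an optional primitive.

              The primitive defaults to 0 when unset, and a separate flag backs
              the generated isSet* accessor, so "unset" and "set to 0" stay
              distinct."""

              def __init__(self, row, cf, cq, cv, timestamp=None):
                  self.row, self.cf, self.cq, self.cv = row, cf, cq, cv
                  self._timestamp_set = timestamp is not None
                  self.timestamp = timestamp if timestamp is not None else 0

              def is_set_timestamp(self):
                  return self._timestamp_set
          ```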

          Hide
          Chris McCubbin added a comment -

          I uploaded another patch to the review board. https://reviews.apache.org/r/8039/

          This patch includes:

          • Bugfixes and some class name normalization
          • Key timestamp is now an optional field
          • Ability to create Scanners in addition to BatchScanners. Reading from either scanner uses the same method.
          • Updated the python and ruby examples

          As far as the way writing is done: I think writing sets of KeyValues (and deleting sets of keys) is currently a clean interface. The thrift file now states explicitly in a comment that any keys with identical rows will be grouped into one atomic write for each call to update. Check out this python source:

          keyvalue1 = PKeyValue(PKey("a","a","a",""),"a")
          keyvalue2 = PKeyValue(PKey("a","b","b",""),"b")
          client.updateAndFlush(userpass,table,[keyvalue1,keyvalue2],[])
          

          If you guys really prefer having separate classes for Mutations and ColumnUpdates, I could change it.
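
          The row-grouping rule stated in that thrift comment can be sketched in plain Python, operating on bare tuples rather than the generated PKeyValue objects. This is a hypothetical illustration, not the proxy's actual code:

          ```python
          from collections import OrderedDict

          def group_into_mutations(key_values):
              """Group (row, cf, cq, cv, value) tuples by row.

              Each resulting entry corresponds to one atomic mutation: all
              updates that share a row are applied together, while updates
              to different rows land in separate mutations (one per row)."""
              mutations = OrderedDict()
              for row, cf, cq, cv, value in key_values:
                  mutations.setdefault(row, []).append((cf, cq, cv, value))
              return mutations
          ```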

          Hide
          Keith Turner added a comment -

          If you guys really prefer having separate classes for Mutations and ColumnUpdates, I could change it.

          The interface is clean. I do not care too much. A few thoughts:

          • The thrift interface is significantly different from Java interface because of lack of mutation. I think users could quickly adapt though.
          • When sending a lot of data for one row, this interface results in a lot more overhead because row is repeated.
          • Have to expend CPU on proxy to group, but I doubt this is an issue relative to the overhead of IPC.

          Still think the flatter data structs are something to consider, i.e.

          keyvalue1 = PKeyValue("a","a","a","","a")
          keyvalue2 = PKeyValue("a","b","b","","b")
          client.updateAndFlush(userpass,table,[keyvalue1,keyvalue2],[])
          

          As I said this makes it easier to write the code for writing data, but harder for rewriting data from a scan.

          Hide
          Chris McCubbin added a comment -

          I added a few performance improvements and pushed them to the review board.

          Hide
          Jason Trost added a comment -

          I would recommend adding 2 convenience methods (and possibly removing the existing writer_update method):

          public void writer_update(String writer, List<PKeyValue> cells) throws TException {...}
          public void writer_delete(String writer, List<PKey> deletedCells) throws TException {...}
          

          While the API should be minimal, I would recommend also taking into account how often devs use the delete functionality. In my experience, deleting cells is used far less than updating cells.

          Hide
          Josh Elser added a comment -

          Is there anything else that we're still expecting to see accomplished for the 1.5.0 release in regards to the thrift proxy?

          It looks ready to go from where I'm standing!

          Hide
          Christopher Tubbs added a comment -

          I think there might still be some questions that need to be resolved, and code that needs polishing for the authentication stuff, which might still affect the proxy; but those issues are more general issues, and not necessarily proxy-specific.

          Hide
          John Vines added a comment -

          It looks like all of the necessary subtasks are in place and I've played with the proxy a bit, so it seems like this is done.

          Hide
          Hudson added a comment -

          Integrated in Accumulo-1.5-Hadoop-2.0 #87 (See https://builds.apache.org/job/Accumulo-1.5-Hadoop-2.0/87/)
          ACCUMULO-482 added some unit test for the proxy and fixed some issues (Revision 1471082)

          Result = FAILURE
          kturner :
          Files :

          • /accumulo/branches/1.5/proxy/src/main/java/org/apache/accumulo/proxy/ProxyServer.java
          • /accumulo/branches/1.5/proxy/src/test/java/org/apache/accumulo/proxy/SimpleTest.java
          Hide
          Hudson added a comment -

          Integrated in Accumulo-1.5 #88 (See https://builds.apache.org/job/Accumulo-1.5/88/)
          ACCUMULO-482 added some unit test for the proxy and fixed some issues (Revision 1471082)

          Result = SUCCESS
          kturner :
          Files :

          • /accumulo/branches/1.5/proxy/src/main/java/org/apache/accumulo/proxy/ProxyServer.java
          • /accumulo/branches/1.5/proxy/src/test/java/org/apache/accumulo/proxy/SimpleTest.java
          Hide
          Hudson added a comment -

          Integrated in Accumulo-Trunk #842 (See https://builds.apache.org/job/Accumulo-Trunk/842/)
          ACCUMULO-1305 shortened proxy property names
          ACCUMULO-482 added some unit test for the proxy and fixed some issues
          ACCUMULO-1237 minimized 1.4 and 1.5 proxy diffs and fixed a few proxy issues
          ACCUMULO-1237 Applying patch from Corey Nolet that minimizes diffs between the proxy in 1.4 and 1.5. (Revision 1471132)

          Result = SUCCESS
          kturner :
          Files :

          • /accumulo/trunk
          • /accumulo/trunk/assemble
          • /accumulo/trunk/core
          • /accumulo/trunk/examples
          • /accumulo/trunk/fate/src/main/java/org/apache/accumulo/fate/ZooStore.java
          • /accumulo/trunk/fate/src/main/java/org/apache/accumulo/fate/zookeeper/ZooSession.java
          • /accumulo/trunk/pom.xml
          • /accumulo/trunk/proxy/README
          • /accumulo/trunk/proxy/proxy.properties
          • /accumulo/trunk/proxy/src/main/java/org/apache/accumulo/proxy/Proxy.java
          • /accumulo/trunk/proxy/src/main/java/org/apache/accumulo/proxy/ProxyServer.java
          • /accumulo/trunk/proxy/src/main/java/org/apache/accumulo/proxy/thrift/AccumuloProxy.java
          • /accumulo/trunk/proxy/src/main/thrift/proxy.thrift
          • /accumulo/trunk/proxy/src/test/java/org/apache/accumulo/proxy/SimpleTest.java
          • /accumulo/trunk/proxy/src/test/java/org/apache/accumulo/proxy/TestProxyInstanceOperations.java
          • /accumulo/trunk/proxy/src/test/java/org/apache/accumulo/proxy/TestProxyReadWrite.java
          • /accumulo/trunk/proxy/src/test/java/org/apache/accumulo/proxy/TestProxySecurityOperations.java
          • /accumulo/trunk/proxy/src/test/java/org/apache/accumulo/proxy/TestProxyTableOperations.java
          • /accumulo/trunk/proxy/src/test/resources/log4j.properties
          • /accumulo/trunk/server
          • /accumulo/trunk/src
          • /accumulo/trunk/test/src/main/java/org/apache/accumulo/test/MiniAccumuloCluster.java
          Hide
          Hudson added a comment -

          Integrated in Accumulo-Trunk-Hadoop-2.0 #200 (See https://builds.apache.org/job/Accumulo-Trunk-Hadoop-2.0/200/)
          ACCUMULO-1305 shortened proxy property names
          ACCUMULO-482 added some unit test for the proxy and fixed some issues
          ACCUMULO-1237 minimized 1.4 and 1.5 proxy diffs and fixed a few proxy issues
          ACCUMULO-1237 Applying patch from Corey Nolet that minimizes diffs between the proxy in 1.4 and 1.5. (Revision 1471132)

          Result = SUCCESS
          kturner :
          Files :

          • /accumulo/trunk
          • /accumulo/trunk/assemble
          • /accumulo/trunk/core
          • /accumulo/trunk/examples
          • /accumulo/trunk/fate/src/main/java/org/apache/accumulo/fate/ZooStore.java
          • /accumulo/trunk/fate/src/main/java/org/apache/accumulo/fate/zookeeper/ZooSession.java
          • /accumulo/trunk/pom.xml
          • /accumulo/trunk/proxy/README
          • /accumulo/trunk/proxy/proxy.properties
          • /accumulo/trunk/proxy/src/main/java/org/apache/accumulo/proxy/Proxy.java
          • /accumulo/trunk/proxy/src/main/java/org/apache/accumulo/proxy/ProxyServer.java
          • /accumulo/trunk/proxy/src/main/java/org/apache/accumulo/proxy/thrift/AccumuloProxy.java
          • /accumulo/trunk/proxy/src/main/thrift/proxy.thrift
          • /accumulo/trunk/proxy/src/test/java/org/apache/accumulo/proxy/SimpleTest.java
          • /accumulo/trunk/proxy/src/test/java/org/apache/accumulo/proxy/TestProxyInstanceOperations.java
          • /accumulo/trunk/proxy/src/test/java/org/apache/accumulo/proxy/TestProxyReadWrite.java
          • /accumulo/trunk/proxy/src/test/java/org/apache/accumulo/proxy/TestProxySecurityOperations.java
          • /accumulo/trunk/proxy/src/test/java/org/apache/accumulo/proxy/TestProxyTableOperations.java
          • /accumulo/trunk/proxy/src/test/resources/log4j.properties
          • /accumulo/trunk/server
          • /accumulo/trunk/src
          • /accumulo/trunk/test/src/main/java/org/apache/accumulo/test/MiniAccumuloCluster.java
          Hide
          David Medinets added a comment -

          I looked at the proxy/README today and some things were unclear to me. When I compile Accumulo from the top level, isn't the proxy submodule included? Why would it need to be built separately? In order to build the proxy module from within the proxy directory, I needed to run 'mvn install' at the top level because of a dependency. The proxy.properties file which needs to be edited is not the one in the build directory, it's the one in the installation directory. Also, the 'accumulo proxy' command should be run from the installation directory.

          I also got a chance to look at the proxy/python/README file. Perhaps it's just that I know very little about python but "PYTHONPATH=path/to/generated/api:path/to/thrift/libs python TestClient.py" does not look right. Should there be a ';' before the python command? Will python people know what the 'path/to/generated/api' and 'path/to/thrift/libs' are?

          Hide
          Eric Newton added a comment -

          Yeah, there's some out-of-date information in there. Please make a ticket to clean it up (and do the cleanup, if you are up for it).

          That technique for setting an environment variable for running a command is an old one:

          # run with a different DISPLAY env var:
          $ DISPLAY=:1 xclock 
          
          # gettin' crazy:
          $ HOME=/tmp USER=nobody HOST=buildhost mvn package 
          

          It's a Bourne shell thing.


            People

            • Assignee:
              Chris McCubbin
              Reporter:
              Sapan Shah
            • Votes:
              0
              Watchers:
              12

              Dates

              • Created:
                Updated:
                Resolved:
