Norval Hope added a comment - 29/Mar/06 07:21 PM
Patch and test files. Still have to work out how to get Maven2 to replace the file:// URL with an absolute path when testing.
I need some help with preprocessing the unit test files in Maven2 so that the "file://" URL includes a sensible absolute path name by the time the unit test actually runs. Alternatively, the unit test could cover the "http://" case only (which would require an internet connection for the unit tests to pass). I'm not sure what the right answers to these conundrums are...

I have a new patch but can't remove the existing attachments (all should be removed and replaced with the single latest patch).
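One possible way to sidestep the preprocessing question is to have the test create the included file at runtime and build the LDIF line from its real location. The sketch below is only an illustration: the class and method names are hypothetical, and it is not how the attached patch or the project's test suite works.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical test helper; not part of the attached patch. */
final class LdifFixtureSketch {

    /**
     * Writes the content to a temporary file and returns one LDIF line of the
     * form "attribute:< file:///absolute/path", so no build-time path
     * substitution is needed.
     */
    static String ldifWithFileUrl(String attribute, byte[] content) throws IOException {
        Path included = Files.createTempFile("test_ldif_inclusion", ".txt");
        included.toFile().deleteOnExit();
        Files.write(included, content);
        // Path.toUri() yields an absolute file: URI on every platform.
        return attribute + ":< " + included.toUri() + "\n";
    }
}
```

The returned text can then be passed to whatever reader is under test in place of a checked-in .ldif file with a hard-coded path.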
I found my fix worked fine for binary attributes, which are stored in the Attribute object as a byte[], but my particular interest was in a string field. Hence the new patch extends the rfc2849 support so that one can specify a) that the value read from the URL should be treated as a String and b) if so, what its encoding is. For example, the following are all valid (contents of the attached test .ldif file):

    # attribute is stored as byte[]
    httpurl:< http://isis/public/hopno02/jx.txt

    # attribute is stored as byte[]
    fileurl:< file:///D:/src/ad/shared/ldap/src/test/resources/test_ldif_inclusion.txt

    # attribute is stored as UTF8 encoded String
    strfileurl:< string_encoding=UTF8;file:///D:/src/ad/shared/ldap/src/test/resources/test_ldif_inclusion.txt

where the "string_encoding=<encoding>;" bit is not part of the RFC. I can't see anywhere in the RFC where this sort of issue is dealt with. Access to the schema for each attribute would help in knowing what the directory's view of it is, but still wouldn't address the issue of the encoding of the file being read. Any guidance welcomed...

This patch supersedes all previous attachments.
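The proposed "string_encoding=<encoding>;" prefix could be handled roughly as follows. This is a minimal sketch, not the code in the patch: the class `UrlValueSketch` and its methods are hypothetical names, and it assumes Java 9+ for `InputStream.readAllBytes()`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.Charset;

/** Hypothetical helper illustrating the non-RFC "string_encoding=" extension. */
final class UrlValueSketch {
    private static final String PREFIX = "string_encoding=";

    /** Splits the text after ":<" into { encoding-or-null, url }. */
    static String[] split(String spec) {
        String s = spec.trim();
        if (!s.startsWith(PREFIX)) {
            return new String[] { null, s };   // plain RFC form: URL only
        }
        int semi = s.indexOf(';');
        if (semi < 0) {
            throw new IllegalArgumentException("missing ';' after " + PREFIX);
        }
        return new String[] {
            s.substring(PREFIX.length(), semi).trim(),
            s.substring(semi + 1).trim()
        };
    }

    /** Returns a byte[] when no encoding is given, a String otherwise. */
    static Object load(String spec) throws IOException {
        String[] parts = split(spec);
        byte[] raw;
        try (InputStream in = URI.create(parts[1]).toURL().openStream()) {
            raw = in.readAllBytes();
        }
        return parts[0] == null ? raw : new String(raw, Charset.forName(parts[0]));
    }
}
```

With this sketch, `load("string_encoding=UTF8;file:///...")` yields a String, while `load("file:///...")` yields the raw byte[].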
As stated in http://ftp.gnus.org/internet-drafts/draft-good-ldap-ldif-06.txt (Appendix A), UTF-8 is "the only character set that may be used in LDIF". So we can assume that when an attribute value is imported from a file, the file is encoded using the UTF-8 charset.

This note in the RFC and your associated assumption take care of my immediate use case (I'm concerned with a string, and I'm happy to accept that the referenced file must already be UTF-8 encoded).
Longer term, I think there is still the question of whether the URL contains binary or UTF-8 encoded string content. In particular I'm thinking about things like external biometrics (retinal scans etc.) which may initially be imported into the directory from files. In this case the code handling the ":<" would need either:

a) Access to the schema to know that an attribute is binary, if indeed it is possible to deduce this from the schema (excuse my ignorance). The code does not currently have this context available to it.

b) Some extra syntax, similar to my earlier addition, to denote whether binary or string data is to be read. Assuming a UTF-8 encoded string is the default, the binary case might look like:

    # attribute is stored as byte[]
    fileurl:< binary, file:///D:/retinal_scan/joe_blogs.dat

Well, (a) is possible, but a bit of overkill. We are supposed to send data to the server, which will check their sanity against the schema internally. Of course, as we are connected to the server, we could ask it for the schema.
(b) seems to be a better solution. However, keep in mind that when importing files, you are supposed to put them into binary attributes (like jpegPhoto). I don't know if it makes sense to inject text into an attribute like 'givenName' or 'description'. I guess that the initial intent of G. Good was to support a kind of LDAP BLOB, so the data are supposed to be binary. I must admit that I have suffered from this UTF-8/ISO8859-1 format issue when injecting data into an LDAP server, so your idea seems to be a good solution. I have added a method to deal with imported files that could take an encoding as a parameter (it was not committed), so we can easily extend the :< operation the way you suggested.

Last thing: remember that in any case, attributes are always stored as byte[], not as UTF-8 Strings, except if the schema says that the attributeType is not binary. Here are the transformations applied to an attribute value if you send it through an LDAP client:

    initial data                ASN.1 encoding          ASN.1 decoding        storage
    UTF-8   ------>  byte[]  ----------->  byte[]  --------->  byte[] if attributeType is binary, String(UTF-8) otherwise
    binary  ------>  byte[]  ----------->  byte[]  --------->  byte[] if attributeType is binary, String(UTF-8) otherwise

So if the file contains binary data, it will *not* be transformed in any case, unless you use it to fill a textual attributeType.

The LdifReader class now accepts ':<' imports, with a size limit of 1024 Kbytes.
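The storage rule and the 1024 Kbytes limit described above can be illustrated with a short sketch. The class `ImportedValueSketch`, its method names, and the choice to throw an IOException when the limit is exceeded are assumptions made for illustration; they are not the actual LdifReader implementation.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration; not the real LdifReader code. */
final class ImportedValueSketch {
    /** 1024 Kbytes, the limit mentioned in the comment above. */
    static final int MAX_BYTES = 1024 * 1024;

    /** Reads the whole stream, failing once more than MAX_BYTES have been seen. */
    static byte[] readLimited(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            if (out.size() + n > MAX_BYTES) {
                throw new IOException("imported value exceeds " + MAX_BYTES + " bytes");
            }
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    /** byte[] stays as-is for a binary attributeType, else it is decoded as UTF-8. */
    static Object toStoredValue(byte[] decoded, boolean binaryAttributeType) {
        return binaryAttributeType ? decoded : new String(decoded, StandardCharsets.UTF_8);
    }
}
```

Whether a real reader should reject an oversized file or truncate it is a design choice; this sketch rejects it so that a partial value is never stored silently.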
The apacheds-tools project can be used to import an LDIF file into any LDAP server. I think this issue was solved months ago.