Sling
  1. Sling
  2. SLING-2425

Incorrect and inconsistent escaping of property names used in JcrPropertyMap

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: JCR Resource 2.0.10
    • Fix Version/s: JCR Resource 2.1.0
    • Component/s: JCR
    • Labels:
      None

      Description

      The JcrPropertyMap uses the (wrong) ISO9075 encoding for property names, and this also behaves differently between the read() and readFully() variants.

      1) ISO9075 is needed for XML names, e.g. for mapping JCR names into Xpath queries. But the set of valid JCR names is much larger (for example "-" is valid, while it is not allowed in ISO9075 and becomes "x002d"). org.apache.jackrabbit.util.Text#escapeIllegalJcrChars() must be used instead to escape any string for use as JCR names. [0]

      2) Inconsistency:
      a) read() will take the key and use ISO9075#encodePath(), before looking up the jcr property using the encoded variant
      b) readFully() will go through all jcr properties and cache them with the key using ISO9075#decode()

      Hence for all valid JCR names, which are not valid under ISO9075 (like "1_prop", "-foo"), these can be looked up using the cached variant b) (as decode() won't touch them), while they cannot be looked up using read() at all due to the forced "arbitrary" escaping.

      I think there should be no auto-magically escaping at all (also not in the accompanying JcrModifiablePropertyMap). Incorrect naming errors should simply be passed through, it is the job of the application to handle that. The framework should not run an arbitrary & undocumented escaping, if it cannot enforce that anyway, since there are other ways to create properties with a different valid char set (using the JCR API).

      [0] http://wiki.apache.org/jackrabbit/EncodingAndEscaping

      1. SLING-2425-test-dots.patch
        1 kB
        Alexander Klimetschek

        Issue Links

          Activity

          Hide
          Alexander Klimetschek added a comment -

          BTW, this made me crazy while debugging: looking up an existing "-foo" property failed in my case, but when I stepped through the code, it worked.

          It turned out that since the ValueMap is a Map, it is specifically handled in my IDE's debugging inspector and it calls at least size() automatically - which triggers a readFully(), while the normal code path just yields an individual read() call.

          Show
          Alexander Klimetschek added a comment - BTW, this made me crazy while debugging: looking up an existing "-foo" property failed in my case, but when I stepped through the code, it worked. It turned out that since the ValueMap is a Map, it is specifically handled in my IDE's debugging inspector and it calls at least size() automatically - which triggers a readFully(), while the normal code path just yields an individual read() call.
          Hide
          Alexander Klimetschek added a comment -

          The ISO9075 encoding was introduced with SLING-1077.

          Show
          Alexander Klimetschek added a comment - The ISO9075 encoding was introduced with SLING-1077 .
          Hide
          Carsten Ziegeler added a comment -

          We explicitly had to add the encoding (therefore SLING-1077) as many apps where already relying on this behaviour and without it completely broke. So we can't change the escaping part

          I'll look into the inconsistency problem

          Show
          Carsten Ziegeler added a comment - We explicitly had to add the encoding (therefore SLING-1077 ) as many apps where already relying on this behaviour and without it completely broke. So we can't change the escaping part I'll look into the inconsistency problem
          Hide
          Alexander Klimetschek added a comment -

          How do you think the inconsistency can be resolved when using the JCR API as well?

          Show
          Alexander Klimetschek added a comment - How do you think the inconsistency can be resolved when using the JCR API as well?
          Hide
          Carsten Ziegeler added a comment -

          There is no solution to that - I was talking about the inconsistency between read() and readFully()

          Show
          Carsten Ziegeler added a comment - There is no solution to that - I was talking about the inconsistency between read() and readFully()
          Hide
          Carsten Ziegeler added a comment -

          And rethinking this, the escaping in general is good - the app should not have to care about limitations in the underlying storage. And as long as one is using a single API there is no problem with that approach. As soon as people start mixing APIs it gets messy.

          Show
          Carsten Ziegeler added a comment - And rethinking this, the escaping in general is good - the app should not have to care about limitations in the underlying storage. And as long as one is using a single API there is no problem with that approach. As soon as people start mixing APIs it gets messy.
          Hide
          Alexander Klimetschek added a comment -

          But mixing the Resource/ValueMap and JCR APIs is normal with Sling, especially considering the Sling POST servlet, which uses JCR directly and does not adhere to these escaping rules.

          Also, there is no limitation in JCR which this escaping fixes - as mentioned, this is the wrong escaping if any, as it reduces the allowed charset! ISO9075 is for xml (xpath queries), but not fir JCR paths.

          Furthermore the escaping is not documented in the API at all.

          What was the exact problem that lead to this?

          Show
          Alexander Klimetschek added a comment - But mixing the Resource/ValueMap and JCR APIs is normal with Sling, especially considering the Sling POST servlet, which uses JCR directly and does not adhere to these escaping rules. Also, there is no limitation in JCR which this escaping fixes - as mentioned, this is the wrong escaping if any, as it reduces the allowed charset! ISO9075 is for xml (xpath queries), but not fir JCR paths. Furthermore the escaping is not documented in the API at all. What was the exact problem that lead to this?
          Hide
          Tobias Bocanegra added a comment -

          +1 for keeping it consistent
          +1 for using org.apache.jackrabbit.util.Text#escapeIllegalJcrChars() as escaping, as the application does not know it the resource is backed by JCR or something else.

          Show
          Tobias Bocanegra added a comment - +1 for keeping it consistent +1 for using org.apache.jackrabbit.util.Text#escapeIllegalJcrChars() as escaping, as the application does not know it the resource is backed by JCR or something else.
          Hide
          Carsten Ziegeler added a comment -

          As Tobi points out, the app does not know what the backend is, so we need to escape internally and not in the app.

          The exact problem was that users simply used key names containing illegal jcr chars

          I see the point that we use the wrong escaping - however from an app perspective it doesn't matter. It's ugly if you access the content in a different way. The problem I see is that we can't change the encode without getting incompatible (or maybe I'm wrong)

          Show
          Carsten Ziegeler added a comment - As Tobi points out, the app does not know what the backend is, so we need to escape internally and not in the app. The exact problem was that users simply used key names containing illegal jcr chars I see the point that we use the wrong escaping - however from an app perspective it doesn't matter. It's ugly if you access the content in a different way. The problem I see is that we can't change the encode without getting incompatible (or maybe I'm wrong)
          Hide
          Alexander Klimetschek added a comment -

          > I see the point that we use the wrong escaping - however from an app perspective it doesn't matter. It's ugly if you access the content in a different way.

          It's impossible to access properties (or child node properties) that contain chars escaped by ISO9075, for example "some-property". This is the major issue - my current workaround is to force the readFully() variant, in which it works because of the inconsistency.

          IMHO fixing this escaping is more important than backwards compatibility. Code relying on that (undocumented feature) would need to be adapted, using the ISO9075 encoding themselves before accessing ValueMap.

          Show
          Alexander Klimetschek added a comment - > I see the point that we use the wrong escaping - however from an app perspective it doesn't matter. It's ugly if you access the content in a different way. It's impossible to access properties (or child node properties) that contain chars escaped by ISO9075, for example "some-property". This is the major issue - my current workaround is to force the readFully() variant, in which it works because of the inconsistency. IMHO fixing this escaping is more important than backwards compatibility. Code relying on that (undocumented feature) would need to be adapted, using the ISO9075 encoding themselves before accessing ValueMap.
          Hide
          Carsten Ziegeler added a comment -

          Maybe we're talking about different things or I got you wrong. Can you post/add a test which shows the problem?

          Show
          Carsten Ziegeler added a comment - Maybe we're talking about different things or I got you wrong. Can you post/add a test which shows the problem?
          Hide
          Felix Meschberger added a comment -

          I think encoding is absolutely required. The application does not have to care about whether and how the persistence layer supports the property names. The ValueMap implementation has to cope with this properly and transparently.

          However, if the Jackrabbit suggested way of encoding is not ISO 9075 but Jackrabbit's own encoding, then we should abide by this rule. Yet, there is another text on encoding: Section 3.2.5.4, Exposing Non-JCR Names, of the JCR 2 spec. Extrapolating from that, yet another encoding might be stipulated using private-use Unicode characters...

          Can we not handle this in a backwards compatible manner ?

          • When creating properties, the Jackrabbit encoding is used
          • When updating properties: either the Jackrabbit encoding isued (removing the property with the old encoding)
          • When deleting properties, we use the encoding of the original property
          • When reading with read all we check the encoding:
          • if the string contains /_x[0-9a-fA-F] {4}

            _/ then use ISO decoding

          • if the string contains /%[0-9a-fA-F] {2}

            / then use Jackrabbit decoding

          • else don't decode at all
          • When reading an uncached value both encodings should be tried

          Of course, the internal read and readall methods should behave the same - support whatever encoding is used in the repository but internally cache un-encoded. I do not understand where your problem really is here (you didn't fully disclose it).

          As for documentation: This behaviour should probably be documented in the JcrPropertyMap class exposed from the JCR Resource bundle.

          Show
          Felix Meschberger added a comment - I think encoding is absolutely required. The application does not have to care about whether and how the persistence layer supports the property names. The ValueMap implementation has to cope with this properly and transparently. However, if the Jackrabbit suggested way of encoding is not ISO 9075 but Jackrabbit's own encoding, then we should abide by this rule. Yet, there is another text on encoding: Section 3.2.5.4, Exposing Non-JCR Names, of the JCR 2 spec. Extrapolating from that, yet another encoding might be stipulated using private-use Unicode characters... Can we not handle this in a backwards compatible manner ? When creating properties, the Jackrabbit encoding is used When updating properties: either the Jackrabbit encoding isued (removing the property with the old encoding) When deleting properties, we use the encoding of the original property When reading with read all we check the encoding: if the string contains /_x [0-9a-fA-F] {4} _/ then use ISO decoding if the string contains /% [0-9a-fA-F] {2} / then use Jackrabbit decoding else don't decode at all When reading an uncached value both encodings should be tried Of course, the internal read and readall methods should behave the same - support whatever encoding is used in the repository but internally cache un-encoded. I do not understand where your problem really is here (you didn't fully disclose it). As for documentation: This behaviour should probably be documented in the JcrPropertyMap class exposed from the JCR Resource bundle.
          Hide
          Carsten Ziegeler added a comment -

          For the backwards compatible manner: I fear that this testing decreases performance for reading as each and every property has to be checked. But maybe this can be done in a fast manner.

          Show
          Carsten Ziegeler added a comment - For the backwards compatible manner: I fear that this testing decreases performance for reading as each and every property has to be checked. But maybe this can be done in a fast manner.
          Hide
          Alexander Klimetschek added a comment - - edited

          Sorry if my original description was unclear. The issue I have is that I can't read certain properties via the ValueMap, that can be created using the JCR API and the Sling POST servlet:

          1. create properties:

          curl -u admin:admin -F "-prop=value" -F "1prop=value" http://localhost:8080/tmp

          2. try to read them:

          ValueMap map = resolver.getResource("/tmp").adaptTo(ValueMap.class);
          map.get("-prop", String.class); // fails, returns null
          map.get("1prop", String.class); // fails, returns null

          My current workaround (due to the fact that the encoding is not done in the cached variant):

          // right after fetching the map
          map.size(); // triggers readFully() internally
          map.get("1prop", String.class); // returns "value"

          Property names starting with "-" or numbers are affected (and more). (Agree this is a rare case, but in our use case we need to resort to a specific naming to separate these properties from others. And these are perfectly valid in JCR.)

          Show
          Alexander Klimetschek added a comment - - edited Sorry if my original description was unclear. The issue I have is that I can't read certain properties via the ValueMap, that can be created using the JCR API and the Sling POST servlet: 1. create properties: curl -u admin:admin -F "-prop=value" -F "1prop=value" http://localhost:8080/tmp 2. try to read them: ValueMap map = resolver.getResource("/tmp").adaptTo(ValueMap.class); map.get("-prop", String.class); // fails, returns null map.get("1prop", String.class); // fails, returns null My current workaround (due to the fact that the encoding is not done in the cached variant): // right after fetching the map map.size(); // triggers readFully() internally map.get("1prop", String.class); // returns "value" Property names starting with "-" or numbers are affected (and more). (Agree this is a rare case, but in our use case we need to resort to a specific naming to separate these properties from others. And these are perfectly valid in JCR.)
          Hide
          Carsten Ziegeler added a comment -

          Thanks Alex,

          ok, it's clear that we should only encode if there is a need to encode and obviously the encoding should either be applied always or never and not depend on side effects.
          I guess/fear we have to do something like Felix suggests earlier on in order to a) get this right and b) be compatible

          Show
          Carsten Ziegeler added a comment - Thanks Alex, ok, it's clear that we should only encode if there is a need to encode and obviously the encoding should either be applied always or never and not depend on side effects. I guess/fear we have to do something like Felix suggests earlier on in order to a) get this right and b) be compatible
          Hide
          Carsten Ziegeler added a comment -

          I've committed a first implementation in revision 1325221 which basically works like Felix has outlined
          I'll leave this bug open for further comments / testing

          Show
          Carsten Ziegeler added a comment - I've committed a first implementation in revision 1325221 which basically works like Felix has outlined I'll leave this bug open for further comments / testing
          Hide
          Alexander Klimetschek added a comment -

          I had a quick look at the code. Generally it looks good wrt to the backwards compatibility and unit tests present. Only the case if the name is a path looks fragile with the custom parsing. For sure it does not catch relative path steps such as "." or "..", they will get escaped incorrectly.

          Looking at that I think that not doing any encodings in the value map layer is much simpler - this encoding logic can get endlessly complex and I guess there will always be a case where it fails if the application cannot decide what to do (e.g. escape "/" or not).

          Show
          Alexander Klimetschek added a comment - I had a quick look at the code. Generally it looks good wrt to the backwards compatibility and unit tests present. Only the case if the name is a path looks fragile with the custom parsing. For sure it does not catch relative path steps such as "." or "..", they will get escaped incorrectly. Looking at that I think that not doing any encodings in the value map layer is much simpler - this encoding logic can get endlessly complex and I guess there will always be a case where it fails if the application cannot decide what to do (e.g. escape "/" or not).
          Hide
          Carsten Ziegeler added a comment -

          For the paths, "." and ".." have never been handled before either - so I think we are fine

          Show
          Carsten Ziegeler added a comment - For the paths, "." and ".." have never been handled before either - so I think we are fine
          Hide
          Alexander Klimetschek added a comment - - edited

          Sorry, I thought all dots get escaped by Text.escapeIllegalJcrChars. But most work, only using "node/../prop" doesn't work.

          Added attachment SLING-2425-test-dots.patch which adds tests with "./prop" "node/prop" and "node/../prop" where the last one currently fails.

          Show
          Alexander Klimetschek added a comment - - edited Sorry, I thought all dots get escaped by Text.escapeIllegalJcrChars. But most work, only using "node/../prop" doesn't work. Added attachment SLING-2425 -test-dots.patch which adds tests with "./prop" "node/prop" and "node/../prop" where the last one currently fails.
          Hide
          Carsten Ziegeler added a comment -

          But the test failed before as well, right?

          Show
          Carsten Ziegeler added a comment - But the test failed before as well, right?
          Hide
          Carsten Ziegeler added a comment -

          Just checked the latest resource.jcr release. The test fail there as well, therefore I'm now closing this

          Show
          Carsten Ziegeler added a comment - Just checked the latest resource.jcr release. The test fail there as well, therefore I'm now closing this
          Hide
          Alex Parvulescu added a comment -

          The changes introduced a regression: see SLING-2502.

          Show
          Alex Parvulescu added a comment - The changes introduced a regression: see SLING-2502 .

            People

            • Assignee:
              Carsten Ziegeler
              Reporter:
              Alexander Klimetschek
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development