Details

    • Improvement
    • Status: Resolved
    • Minor
    • Resolution: Won't Fix
    • None
    • None
    • jcr
    • None

    Description

      At times users store large text values as String property which causes issues later in form of higher resource consumption of caches, higher repository size etc.

      It would be good to have a validator which can flag such strings at time of commit. The size limit can be the one we use for in lined binary like 1-2KB

      Attachments

        1. OAK-6071-v1.patch
          5 kB
          Chetan Mehrotra

        Issue Links

          Activity

            Another option can be that stores convert such property to binary in a transparent way

            chetanm Chetan Mehrotra added a comment - Another option can be that stores convert such property to binary in a transparent way

            patch for the same which logs a warning if the size of any string property is greater than 4k

            15:59:21.893 WARN  [pool-1-thread-1] NodeImpl.java:1441      Large string property [/a/foo] detected (4001 size).
            15:59:21.972 WARN  [pool-1-thread-1] NodeImpl.java:1441      Large string property [/b/foo] detected (4001 size).
            
            

            mduerig Please review the patch. Warning is similar to one we have for OAK-1454.

            chetanm Chetan Mehrotra added a comment - patch for the same which logs a warning if the size of any string property is greater than 4k 15:59:21.893 WARN [pool-1-thread-1] NodeImpl.java:1441 Large string property [/a/foo] detected (4001 size). 15:59:21.972 WARN [pool-1-thread-1] NodeImpl.java:1441 Large string property [/b/foo] detected (4001 size). mduerig Please review the patch. Warning is similar to one we have for OAK-1454 .

            I'm not sure whether his is actually helpful. In my experience Warnings in the logs tend to get mostly ignored. To that respect OAK-1454 didn't caused any useful action to my knowledge.

            Also the patch would create a lot of noise. In a recent analysis on a production repository I found

            • 32.0E+6 strings s with 128 chars < s < 1k chars,
            • 450.3E+3 strings s with 1k chars < s < 16k chars,
            • 8.3E+3 strings s with 16k chars s < 1M chars and
            • 3.0E+0 strings s with s > 1M chars.

            Can't we collect this kind of information on the fly (i.e. through a query or tooling)? Another approach would be to implement a commit hook that gathers such kind of information/statistics.

            mduerig Michael DĂĽrig added a comment - I'm not sure whether his is actually helpful. In my experience Warnings in the logs tend to get mostly ignored. To that respect OAK-1454 didn't caused any useful action to my knowledge. Also the patch would create a lot of noise. In a recent analysis on a production repository I found 32.0E+6 strings s with 128 chars < s < 1k chars, 450.3E+3 strings s with 1k chars < s < 16k chars, 8.3E+3 strings s with 16k chars s < 1M chars and 3.0E+0 strings s with s > 1M chars. Can't we collect this kind of information on the fly (i.e. through a query or tooling)? Another approach would be to implement a commit hook that gathers such kind of information/statistics.

            I'm not sure whether his is actually helpful. In my experience Warnings in the logs tend to get mostly ignored

            Thats true but at least the path would be logged which would be of some help.

            May be we enable the warning for now and later look into implicitly converting any such string to a binary property at NodeStore level.

            mreutegg catholicon Thoughts?

            chetanm Chetan Mehrotra added a comment - I'm not sure whether his is actually helpful. In my experience Warnings in the logs tend to get mostly ignored Thats true but at least the path would be logged which would be of some help. May be we enable the warning for now and later look into implicitly converting any such string to a binary property at NodeStore level. mreutegg catholicon Thoughts?
            catholicon Vikas Saurabh added a comment -

            chetanm, admittedly, I've been lazy with OAK-4322 - but, imo, we should invest on that issue. For threshold between string and binary - I think a size that warrants a binary to go into blob store should be a good candidate for virtual-string-backed-by-binary approach. But, as mduerig pointed above, shouting in logs might create a little too much noise for such cases!?

            catholicon Vikas Saurabh added a comment - chetanm , admittedly, I've been lazy with OAK-4322 - but, imo, we should invest on that issue. For threshold between string and binary - I think a size that warrants a binary to go into blob store should be a good candidate for virtual-string-backed-by-binary approach. But, as mduerig pointed above, shouting in logs might create a little too much noise for such cases!?

            May be we set the default limit (and make it configurable) to 1M. I agree logging is ignored but then at least some logging would help us in correlating repository paths which lead to issue due to such big properties

            chetanm Chetan Mehrotra added a comment - May be we set the default limit (and make it configurable) to 1M. I agree logging is ignored but then at least some logging would help us in correlating repository paths which lead to issue due to such big properties

            I agree with catholicon and mduerig and I would also prefer to rather invest time in e.g. OAK-4322. It is categorised as an improvement for DocumentMK, but may also be useful for SegmentMK.

            mreutegg Marcel Reutegger added a comment - I agree with catholicon and mduerig and I would also prefer to rather invest time in e.g. OAK-4322 . It is categorised as an improvement for DocumentMK, but may also be useful for SegmentMK.

            People

              chetanm Chetan Mehrotra
              chetanm Chetan Mehrotra
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: