  1. Jackrabbit Oak
  2. OAK-7279

segment-tar update from java 7 to java 8 may break persisted names using invalid characters

Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: segment-tar

    Description

      segment-tar relies on String.getBytes() when persisting strings such as item names.

      The problem is that the behavior of this method changed in Java 8 with respect to invalid strings (here: strings containing null characters or unpaired surrogates).

      In Java 7, these would round-trip, as Java was using the so-called "modified UTF-8" encoding (see https://docs.oracle.com/javase/6/docs/api/java/io/DataInput.html#modified-utf-8). This encoding produces byte sequences that are not valid UTF-8.

      Java 7 will read them back unchanged, but Java 8 will map the non-conforming byte sequences to the Unicode replacement character (U+FFFD). Note in particular that, as a consequence, multiple child entries might end up with identical names.
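
      The effect can be illustrated with a minimal sketch. It does not exercise segment-tar; as an assumption, it uses DataOutputStream.writeUTF as a stand-in to produce modified UTF-8 bytes and then decodes them with the standard UTF-8 decoder of a Java 8 or later runtime. The class and method names (ModifiedUtf8Demo, modifiedUtf8, decodeUtf8) are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of Oak.
final class ModifiedUtf8Demo {

    // Encodes s as modified UTF-8 using DataOutputStream.writeUTF and
    // drops the 2-byte length prefix that writeUTF puts in front.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] withPrefix = buffer.toByteArray();
        return Arrays.copyOfRange(withPrefix, 2, withPrefix.length);
    }

    // Decodes with the standard UTF-8 decoder, which substitutes the
    // replacement character for malformed input instead of failing.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A null character, an unpaired high surrogate, an unpaired low surrogate.
        String[] names = { "a\u0000", "a\uD800", "a\uDC00" };
        for (String name : names) {
            String back = decodeUtf8(modifiedUtf8(name));
            System.out.println(name.equals(back) ? "round-trips" : "changed on read");
        }
    }
}
```

      If the decoded forms of the two surrogate names compare equal, that is the name collision mentioned above: two distinct child names become indistinguishable once read back.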

      I'm not sure about the severity of this, or whether something needs to be done about it. As far as I'm concerned, this is another good reason to reject invalid strings as early as possible in the stack.
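
      As a rough sketch of what such an early check could look like (an assumption for illustration, not existing Oak code; the class NameCheck and the method isWellFormed are hypothetical), a name can be scanned for null characters and unpaired surrogates before it is accepted:

```java
// Hypothetical helper, not part of Oak.
final class NameCheck {

    // Returns true if s contains no null character and every surrogate
    // code unit is part of a correctly ordered high/low pair.
    static boolean isWellFormed(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == 0) {
                return false; // null character
            }
            if (Character.isHighSurrogate(c)) {
                boolean paired = i + 1 < s.length()
                        && Character.isLowSurrogate(s.charAt(i + 1));
                if (!paired) {
                    return false; // high surrogate without a following low surrogate
                }
                i++; // skip the low surrogate of a valid pair
            } else if (Character.isLowSurrogate(c)) {
                return false; // low surrogate without a preceding high surrogate
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("child"));
        System.out.println(isWellFormed("child\uD800"));
    }
}
```

      Where in the stack such a check would belong, and whether it should reject or escape the offending names, is left open here.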

      cc mduerig

            People

              Assignee: Unassigned
              Reporter: Julian Reschke (reschke)
              Votes: 0
              Watchers: 2