HBase
  1. HBase
  2. HBASE-6991

Escape "\" in Bytes.toStringBinary() and its counterpart Bytes.toBytesBinary()

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.95.2
    • Fix Version/s: 0.95.0
    • Component/s: util
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed
    • Release Note:
      Hide
      This patch changes Bytes.toStringBinary() and Bytes.toBytesBinary() to escape the character "\". Unprintable characters are escaped using "\\x%02X" format, but "\" was not escaped properly leading to irreversible ser/de using toStringBinary() -> toBytesBinary().
      Show
      This patch changes Bytes.toStringBinary() and Bytes.toBytesBinary() to escape the character "\". Unprintable characters are escaped using "\\x%02X" format, but "\" was not escaped properly leading to irreversible ser/de using toStringBinary() -> toBytesBinary().

      Description

      Since "\" is used to escape non-printable character but not treated as special character in conversion, it could lead to unexpected conversion.

      For example, please consider the following code snippet.

      public void testConversion() {
        byte[] original = {
            '\\', 'x', 'A', 'D'
        };
        String stringFromBytes = Bytes.toStringBinary(original);
        byte[] converted = Bytes.toBytesBinary(stringFromBytes);
        System.out.println("Original: " + Arrays.toString(original));
        System.out.println("Converted: " + Arrays.toString(converted));
        System.out.println("Reversible?: " + (Bytes.compareTo(original, converted) == 0));
      }
      
      Output:
      -------
      Original: [92, 120, 65, 68]
      Converted: [-83]
      Reversible?: false
      

      The "\" character needs to be treated as special and must be encoded as a non-printable character ("\x5C") to avoid any kind of ambiguity during conversion.

        Issue Links

          Activity

          Hide
          Aditya Kishore added a comment -

          Attaching the patch which modifies toStringBinary() to treat "\" as non-printable character and translate it to "\x5C"

          Show
          Aditya Kishore added a comment - Attaching the patch which modifies toStringBinary() to treat "\" as non-printable character and translate it to "\x5C"
          Hide
          Aditya Kishore added a comment -

          The patch include the following changes:

          1. Gets rid of unnecessary byte[] to String conversion. The "ISO-8859-1" charset does not do any transformation anyway. This also does away with the need of try-catch block.

          -    String first = new String(b, off, len, "ISO-8859-1");
          -    for (int i = 0; i < first.length() ; ++i ) {
          -      int ch = first.charAt(i) & 0xFF;
          
          +    for (int i = off; i < off + len ; ++i ) {
          +      int ch = b[i] & 0xFF;
          

          2. Removed "\" from the set of printable non-alphanumeric characters so that it can be escaped using the "\xXX" format.

          -          || " `~!@#$%^&*()-_=+[]{}\\|;:'\",.<>/?".indexOf(ch) >= 0 ) {
          
          +          || " `~!@#$%^&*()-_=+[]{}|;:'\",.<>/?".indexOf(ch) >= 0 ) {
          

          3. Added new test case to verify that the conversion is reversible for random array of bytes. Without this change the test always fails. The test add 1 extra second to the test run.

          hbase-common/src/test/java/org/apache/hadoop/hbase/util/TestBytes.java
          +  public void testToStringBytesBinaryReversible() {
          +    //  let's run test with 1000 randomly generated byte arrays
          +    Random rand = new Random(System.currentTimeMillis());
          +    byte[] randomBytes = new byte[1000];
          +    for (int i = 0; i < 1000; i++) {
          +      rand.nextBytes(randomBytes);
          +      verifyReversibleForBytes(randomBytes);
          +    }
          +
          +    //  some specific cases
          +    verifyReversibleForBytes(new  byte[] {});
          +    verifyReversibleForBytes(new  byte[] {'\\', 'x', 'A', 'D'});
          +    verifyReversibleForBytes(new  byte[] {'\\', 'x', 'A', 'D', '\\'});
          +  }
          +
          +  private void verifyReversibleForBytes(byte[] originalBytes) {
          +    String convertedString = Bytes.toStringBinary(originalBytes);
          +    byte[] convertedBytes = Bytes.toBytesBinary(convertedString);
          +    if (Bytes.compareTo(originalBytes, convertedBytes) != 0) {
          +      fail("Not reversible for\nbyte[]: " + Arrays.toString(originalBytes) +
          +          ",\nStringBinary: " + convertedString);
          +    }
          +  }
          

          4. And finally, fixes the two test cases which were breaking because they assumed that "\" is encoded as "\".

          hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
          hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java
          
          -            + "\\xD46\\xEA5\\xEA3\\xEA7\\xE7\\x00LI\\s\\xA0\\x0F\\x00\\x00"
          +            + "\\xD46\\xEA5\\xEA3\\xEA7\\xE7\\x00LI\\x5Cs\\xA0\\x0F\\x00\\x00"
          

          Setting the "Incompatible change" flag since any other code which makes the same assumption as the two test cases needs fix.

          Show
          Aditya Kishore added a comment - The patch include the following changes: 1. Gets rid of unnecessary byte[] to String conversion. The "ISO-8859-1" charset does not do any transformation anyway. This also does away with the need of try-catch block. - String first = new String (b, off, len, "ISO-8859-1" ); - for ( int i = 0; i < first.length() ; ++i ) { - int ch = first.charAt(i) & 0xFF; + for ( int i = off; i < off + len ; ++i ) { + int ch = b[i] & 0xFF; 2. Removed "\" from the set of printable non-alphanumeric characters so that it can be escaped using the "\xXX" format. - || " `~!@#$%^&*()-_=+[]{}\\|;:'\" ,.<>/?".indexOf(ch) >= 0 ) { + || " `~!@#$%^&*()-_=+[]{}|;:'\" ,.<>/?".indexOf(ch) >= 0 ) { 3. Added new test case to verify that the conversion is reversible for random array of bytes. Without this change the test always fails. The test add 1 extra second to the test run. hbase-common/src/test/java/org/apache/hadoop/hbase/util/TestBytes.java + public void testToStringBytesBinaryReversible() { + // let's run test with 1000 randomly generated byte arrays + Random rand = new Random( System .currentTimeMillis()); + byte [] randomBytes = new byte [1000]; + for ( int i = 0; i < 1000; i++) { + rand.nextBytes(randomBytes); + verifyReversibleForBytes(randomBytes); + } + + // some specific cases + verifyReversibleForBytes( new byte [] {}); + verifyReversibleForBytes( new byte [] {'\\', 'x', 'A', 'D'}); + verifyReversibleForBytes( new byte [] {'\\', 'x', 'A', 'D', '\\'}); + } + + private void verifyReversibleForBytes( byte [] originalBytes) { + String convertedString = Bytes.toStringBinary(originalBytes); + byte [] convertedBytes = Bytes.toBytesBinary(convertedString); + if (Bytes.compareTo(originalBytes, convertedBytes) != 0) { + fail( "Not reversible for \nbyte[]: " + Arrays.toString(originalBytes) + + ",\nStringBinary: " + convertedString); + } + } 4. And finally, fixes the two test cases which were breaking because they assumed that "\" is encoded as "\". hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java - + "\\xD46\\xEA5\\xEA3\\xEA7\\xE7\\x00LI\\s\\xA0\\x0F\\x00\\x00" + + "\\xD46\\xEA5\\xEA3\\xEA7\\xE7\\x00LI\\x5Cs\\xA0\\x0F\\x00\\x00" Setting the "Incompatible change" flag since any other code which makes the same assumption as the two test cases needs fix.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12549444/HBASE-6991_trunk.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          -1 javadoc. The javadoc tool appears to have generated 82 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3060//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3060//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3060//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3060//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3060//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3060//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3060//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12549444/HBASE-6991_trunk.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 9 new or modified tests. +1 hadoop2.0 . The patch compiles against the hadoop 2.0 profile. -1 javadoc . The javadoc tool appears to have generated 82 warning messages. +1 javac . The applied patch does not increase the total number of javac compiler warnings. -1 findbugs . The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3060//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3060//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3060//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3060//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3060//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3060//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3060//console This message is automatically generated.
          Hide
          Aditya Kishore added a comment -

          It should be noted that any previously encoded StringBinary with "\" will still get correctly decoded by the unchanged toBytesBinary() function. The change to toStringBinary() ensures that new encoding of a byte array containing "\" is 100% reversible without any ambiguity.

          Show
          Aditya Kishore added a comment - It should be noted that any previously encoded StringBinary with "\" will still get correctly decoded by the unchanged toBytesBinary() function. The change to toStringBinary() ensures that new encoding of a byte array containing "\" is 100% reversible without any ambiguity.
          Hide
          Aditya Kishore added a comment -

          Ted Yu, stack Could you please review this.

          https://reviews.apache.org/r/7632/

          Show
          Aditya Kishore added a comment - Ted Yu , stack Could you please review this. https://reviews.apache.org/r/7632/
          Hide
          stack added a comment -

          Committed to trunk. Thanks for the patch Aditya. Nice one. That already encoded stuff should be undone properly should protect us against pre 0.96 data looking different in 0.96.

          Show
          stack added a comment - Committed to trunk. Thanks for the patch Aditya. Nice one. That already encoded stuff should be undone properly should protect us against pre 0.96 data looking different in 0.96.
          Hide
          Hudson added a comment -

          Integrated in HBase-TRUNK #3514 (See https://builds.apache.org/job/HBase-TRUNK/3514/)
          HBASE-6991 Escape "\" in Bytes.toStringBinary() and its counterpart Bytes.toBytesBinary() (Revision 1406297)

          Result = SUCCESS
          stack :
          Files :

          • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/util/Bytes.java
          • /hbase/trunk/hbase-common/src/test/java/org/apache/hadoop/hbase/util/TestBytes.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java
          Show
          Hudson added a comment - Integrated in HBase-TRUNK #3514 (See https://builds.apache.org/job/HBase-TRUNK/3514/ ) HBASE-6991 Escape "\" in Bytes.toStringBinary() and its counterpart Bytes.toBytesBinary() (Revision 1406297) Result = SUCCESS stack : Files : /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/util/Bytes.java /hbase/trunk/hbase-common/src/test/java/org/apache/hadoop/hbase/util/TestBytes.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java
          Hide
          Hudson added a comment -

          Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #250 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/250/)
          HBASE-6991 Escape "\" in Bytes.toStringBinary() and its counterpart Bytes.toBytesBinary() (Revision 1406297)

          Result = FAILURE
          stack :
          Files :

          • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/util/Bytes.java
          • /hbase/trunk/hbase-common/src/test/java/org/apache/hadoop/hbase/util/TestBytes.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java
          Show
          Hudson added a comment - Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #250 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/250/ ) HBASE-6991 Escape "\" in Bytes.toStringBinary() and its counterpart Bytes.toBytesBinary() (Revision 1406297) Result = FAILURE stack : Files : /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/util/Bytes.java /hbase/trunk/hbase-common/src/test/java/org/apache/hadoop/hbase/util/TestBytes.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java
          Hide
          Hudson added a comment -

          Integrated in HBase-0.94 #898 (See https://builds.apache.org/job/HBase-0.94/898/)
          HBASE-8085 Backport the fix for Bytes.toStringBinary() into 94 (HBASE-6991) (Tianying Chang) (Revision 1456306)

          Result = FAILURE
          enis :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/Bytes.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/util/TestBytes.java
          Show
          Hudson added a comment - Integrated in HBase-0.94 #898 (See https://builds.apache.org/job/HBase-0.94/898/ ) HBASE-8085 Backport the fix for Bytes.toStringBinary() into 94 ( HBASE-6991 ) (Tianying Chang) (Revision 1456306) Result = FAILURE enis : Files : /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/Bytes.java /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/util/TestBytes.java
          Hide
          Hudson added a comment -

          Integrated in HBase-0.94-security #124 (See https://builds.apache.org/job/HBase-0.94-security/124/)
          HBASE-8085 Backport the fix for Bytes.toStringBinary() into 94 (HBASE-6991) (Tianying Chang) (Revision 1456306)

          Result = FAILURE
          enis :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/Bytes.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/util/TestBytes.java
          Show
          Hudson added a comment - Integrated in HBase-0.94-security #124 (See https://builds.apache.org/job/HBase-0.94-security/124/ ) HBASE-8085 Backport the fix for Bytes.toStringBinary() into 94 ( HBASE-6991 ) (Tianying Chang) (Revision 1456306) Result = FAILURE enis : Files : /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/Bytes.java /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/util/TestBytes.java
          Hide
          Hudson added a comment -

          Integrated in HBase-0.94-security-on-Hadoop-23 #13 (See https://builds.apache.org/job/HBase-0.94-security-on-Hadoop-23/13/)
          HBASE-8085 Backport the fix for Bytes.toStringBinary() into 94 (HBASE-6991) (Tianying Chang) (Revision 1456306)

          Result = FAILURE
          enis :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/Bytes.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/util/TestBytes.java
          Show
          Hudson added a comment - Integrated in HBase-0.94-security-on-Hadoop-23 #13 (See https://builds.apache.org/job/HBase-0.94-security-on-Hadoop-23/13/ ) HBASE-8085 Backport the fix for Bytes.toStringBinary() into 94 ( HBASE-6991 ) (Tianying Chang) (Revision 1456306) Result = FAILURE enis : Files : /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/Bytes.java /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/util/TestBytes.java
          Hide
          stack added a comment -

          Marking closed.

          Show
          stack added a comment - Marking closed.

            People

            • Assignee:
              Aditya Kishore
              Reporter:
              Aditya Kishore
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development