AVRO-393: byte[] constructor for Utf8 desired

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0
    • Component/s: java
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      We've come across a few use cases where we know that a given byte array is properly UTF-8 encoded, but Utf8 has no constructor that takes it. Instead, we have to turn it into a String first, only to have it converted straight back into bytes. This is sucky.
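      To make the detour concrete, here is a minimal sketch using only the JDK (no Avro classes). The class and method names are made up for illustration, and `StandardCharsets` assumes Java 7 or later. It decodes already-valid UTF-8 bytes into a String and encodes them again, which yields the same content in a freshly allocated array.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration class; not part of Avro.
class RoundTripExample {
  // Decodes UTF-8 bytes into a String, then encodes them again:
  // the extra work the description complains about.
  static byte[] viaString(byte[] utf8Bytes) {
    String decoded = new String(utf8Bytes, StandardCharsets.UTF_8);
    return decoded.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] original = {'a', 'v', 'r', 'o'};
    byte[] roundTripped = viaString(original);
    // Same content, but a different array: two allocations for nothing.
    System.out.println(Arrays.equals(original, roundTripped));
    System.out.println(original == roundTripped);
  }
}
```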

        Activity

        jmhodges Jeff Hodges added a comment -

        Here's a patch that adds a byte[] constructor to Utf8 and adds the beginnings of a TestUtf8 class.

        kevinoliver Kevin Oliver added a comment -

        While it's not obvious, the workaround is to use something like this:

        byte[] myBytes = ...;
        Utf8 utf8 = new Utf8();
        utf8.setLength(myBytes.length);
        System.arraycopy(myBytes, 0, utf8.getBytes(), 0, myBytes.length);
        

        That said, I agree that a Utf8(byte[]) constructor would be useful.

        cutting Doug Cutting added a comment -

        I just committed this. I made two minor changes:

        • In the test, I specified "UTF-8", since String#getBytes() without arguments uses the platform's default encoding.
        • I also used 2-space-per-level indentation.

        Thanks, Jeff!
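        The encoding point can be sketched with the JDK alone. The class and method names below are hypothetical. The sketch uses `StandardCharsets.UTF_8`, which assumes Java 7 or later; on older JVMs the equivalent is `getBytes("UTF-8")`, as in the committed test.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration class; not part of Avro or its tests.
class ExplicitCharsetExample {
  // Encodes with an explicit charset, so the result does not
  // depend on the JVM's default encoding.
  static byte[] encodeUtf8(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    String text = "h\u00e9llo"; // U+00E9 takes two bytes in UTF-8
    byte[] explicit = encodeUtf8(text);
    byte[] platform = text.getBytes(); // depends on the default charset
    System.out.println("explicit UTF-8 length: " + explicit.length);
    System.out.println("matches platform default: " + Arrays.equals(explicit, platform));
  }
}
```

        Whether the second line prints true depends on the machine, which is why a test that compares byte arrays should name the charset.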

        thiru_mg Thiruvalluvan M. G. added a comment -

        While we're at it, it would be useful to add another constructor that takes a sub-array:

        Utf8(byte[] bytes, int start, int len);
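        As a rough sketch of how the two constructor shapes could relate, the following uses a hypothetical stand-in class, `SketchUtf8`, which is not Avro's `Utf8` and makes no claim about how the committed patch is implemented. It assumes the sub-array form copies the requested range; wrapping the caller's array without copying would be an equally plausible design.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for illustration only; NOT Avro's Utf8 class.
class SketchUtf8 {
  private final byte[] bytes;

  // Whole-array form: delegates to the sub-array form.
  SketchUtf8(byte[] bytes) {
    this(bytes, 0, bytes.length);
  }

  // Sub-array form: copies len bytes starting at index start.
  // Assumption: the bytes are already valid UTF-8; nothing is validated here.
  SketchUtf8(byte[] source, int start, int len) {
    if (start < 0 || len < 0 || start > source.length - len) {
      throw new IndexOutOfBoundsException("start=" + start + ", len=" + len);
    }
    this.bytes = Arrays.copyOfRange(source, start, start + len);
  }

  int getLength() {
    return bytes.length;
  }

  @Override
  public String toString() {
    return new String(bytes, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] buffer = "key=value".getBytes(StandardCharsets.UTF_8);
    System.out.println(new SketchUtf8(buffer));       // whole buffer
    System.out.println(new SketchUtf8(buffer, 4, 5)); // only the bytes of "value"
  }
}
```

        The sub-array form is mainly useful when the encoded string sits inside a larger buffer, since the caller no longer has to slice it out first.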


          People

          • Assignee:
            jmhodges Jeff Hodges
            Reporter:
            jmhodges Jeff Hodges