Details

    • Improvement
    • Status: Closed
    • Minor
    • Resolution: Won't Fix
    • None
    • None
    • Core
    • None

    Description

      In the Writable.bytes() Output MapFn, an unnecessary (I believe) copy of the incoming ByteBuffer occurs[0].

      Current:

      BytesWritable bw = new BytesWritable();
      bw.set(input.array(), input.arrayOffset(), input.limit()); <- copies the array
      

      Proposed:

      BytesWritable bw = new BytesWritable(input.array()); 
      

      [0]: https://github.com/apache/crunch/blob/apache-crunch-0.15.0/crunch-core/src/main/java/org/apache/crunch/types/writable/Writables.java#L271

      Attachments

        Activity

          mkwhitacre Micah Whitacre added a comment -

          So the proposal is not functionally correct. Specifically, ByteBuffer.array() returns the backing storage byte[]. The relevant bytes could only occupy a portion of the the array. Here's a good write up[1] to explain the difference and why coping is necessary and specifically why using input.array() is incorrect.

          [1] - https://worldmodscode.wordpress.com/2012/12/14/the-java-bytebuffer-a-crash-course/

          mkwhitacre Micah Whitacre added a comment - So the proposal is not functionally correct. Specifically, ByteBuffer.array() returns the backing storage byte[]. The relevant bytes could only occupy a portion of the the array. Here's a good write up [1] to explain the difference and why coping is necessary and specifically why using input.array() is incorrect. [1] - https://worldmodscode.wordpress.com/2012/12/14/the-java-bytebuffer-a-crash-course/
          spatel89 Stephen Patel added a comment -

          Ahhh. I see. It hasn't bitten us because we always know that the byte buffers are backed by arrays and are whole and unsliced, but naturally that may not be true in the general case. Thanks for the link Micah. I'm cool with closing this jira.

          spatel89 Stephen Patel added a comment - Ahhh. I see. It hasn't bitten us because we always know that the byte buffers are backed by arrays and are whole and unsliced, but naturally that may not be true in the general case. Thanks for the link Micah. I'm cool with closing this jira.
          mkwhitacre Micah Whitacre added a comment -

          Yeah in other code (not Crunch) i've been bitten because that array got reused and the data we wanted was only a used a portion so I'd have trouble parsing because of garbage characters at the beginning or end.

          mkwhitacre Micah Whitacre added a comment - Yeah in other code (not Crunch) i've been bitten because that array got reused and the data we wanted was only a used a portion so I'd have trouble parsing because of garbage characters at the beginning or end.

          People

            mkwhitacre Micah Whitacre
            spatel89 Stephen Patel
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: