Apache Arrow / ARROW-2308

Serialized tensor data should be 64-byte aligned.


      Description

      See https://github.com/ray-project/ray/issues/1658 for an example of this issue. Unaligned data can trigger a copy when it is passed to TensorFlow or to other libraries that expect aligned buffers.

      import pyarrow as pa
      import numpy as np
      
      x = np.zeros(10)
      y = pa.deserialize(pa.serialize(x).to_buffer())
      
      x.ctypes.data % 64  # 0 (it starts out aligned)
      y.ctypes.data % 64  # 48 (it is no longer aligned)
      

      It should be possible to fix this by calling something like RETURN_NOT_OK(AlignStreamPosition(dst)); before writing the array data. Note that we already do this before writing the tensor header, but the tensor header is not necessarily a multiple of 64 bytes, so the subsequent data can be unaligned.
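
      The padding arithmetic behind this proposal can be sketched in plain Python. This is only an illustration, not Arrow's implementation: `padding_needed`, `align_stream_position` and `write_tensor` are hypothetical helpers, and the 48-byte header length is a made-up value chosen because it is not a multiple of 64. The sketch also assumes the buffer's base address is itself 64-byte aligned, so an aligned offset implies an aligned address.

      ```python
      import io

      ALIGNMENT = 64  # bytes


      def padding_needed(position, alignment=ALIGNMENT):
          """Return how many pad bytes bring `position` up to a multiple of `alignment`."""
          return (-position) % alignment


      def align_stream_position(stream, alignment=ALIGNMENT):
          """Hypothetical stand-in for the alignment step: pad the stream with zeros."""
          stream.write(b"\x00" * padding_needed(stream.tell(), alignment))


      def write_tensor(stream, header, data):
          """Write header and data, aligning before each; return the data offset."""
          align_stream_position(stream)  # alignment before the tensor header
          stream.write(header)
          align_stream_position(stream)  # proposed extra alignment before the array data
          data_offset = stream.tell()
          stream.write(data)
          return data_offset


      stream = io.BytesIO()
      header = b"h" * 48  # illustrative header length, not a multiple of 64
      data = bytes(80)    # same size as np.zeros(10): 10 float64 values
      offset = write_tensor(stream, header, data)
      print(offset % ALIGNMENT)  # 0: the data now starts on a 64-byte boundary
      ```

      Without the second `align_stream_position` call, the data in this sketch would start at offset 48, which is the kind of misalignment described above.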

              People

              • Assignee:
                robertnishihara Robert Nishihara
                Reporter:
                robertnishihara Robert Nishihara