  1. Apache Arrow
  2. ARROW-1011

[Format] Clarify requirements around buffer padding in validity bitmaps


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.4.0
    • Component/s: Format
    • Labels: None

    Description

      This has come up in https://github.com/apache/arrow/pull/673 and in prior discussions.

      The basic summary is that we should not write non-zero padding bytes in IPC messages. However, one cannot in general rely on the padding being zero when the data is in memory (for example, in zero-copy slices of Arrow arrays/vectors).
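
To illustrate the distinction, here is a minimal Python sketch, not Arrow's actual writer. The helper name `pad_for_wire` and the 8-byte default alignment are assumptions made for this example. It shows a zero-copy slice whose neighbouring bytes in memory are non-zero, and a write path that copies only the logical bytes and appends explicit zeros:

```python
def pad_for_wire(data, alignment=8):
    """Return the logical bytes of `data` followed by zero padding.

    `pad_for_wire` and the default alignment of 8 are hypothetical choices
    for this sketch; they are not taken from the Arrow code base.
    """
    remainder = len(data) % alignment
    padding = (alignment - remainder) % alignment
    # Copy only the logical bytes, then append explicit zero bytes so the
    # serialized output does not depend on whatever follows the slice in memory.
    return bytes(data) + b"\x00" * padding


# A parent buffer filled with non-zero bytes, and a zero-copy slice of it.
parent = bytes([0xAA] * 16)
view = memoryview(parent)[0:3]  # the bytes after this slice are 0xAA, not 0

wire = pad_for_wire(view)
```

Here `wire` holds the three logical bytes followed by five zero bytes, regardless of what the parent buffer contains beyond the slice.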

      I think it would be good to clarify this point in Layout.md, namely that what gets written to the wire should be deterministic, while in-memory algorithms should not in general expect the padding region to hold a particular value. As an example, a popcount on a validity bitmap would want to exclude padding bytes from the computation. Other elementwise SIMD operations are free to use the padding bytes as they wish, with the known caveat that the padding contents are unspecified.
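
The popcount example can be sketched as follows. This is an illustrative Python version rather than Arrow's implementation. The function name `count_valid` is hypothetical, and the sketch assumes a bitmap that starts at bit 0 and uses least-significant-bit numbering within each byte:

```python
def count_valid(bitmap, length):
    """Count set bits among the first `length` bits of `bitmap`.

    `count_valid` is a hypothetical helper for this sketch. Bits are read
    least-significant-bit first within each byte, and everything past
    `length` is treated as padding and ignored.
    """
    full_bytes, trailing_bits = divmod(length, 8)
    # Popcount every byte that lies entirely inside the logical length.
    count = sum(bin(b).count("1") for b in bitmap[:full_bytes])
    if trailing_bits:
        # Mask off the unused high bits of the last, partially used byte.
        mask = (1 << trailing_bits) - 1
        count += bin(bitmap[full_bytes] & mask).count("1")
    return count


# Five logical slots in the first byte, followed by non-zero padding bytes.
bitmap = bytes([0b00001011]) + bytes([0xFF] * 7)
valid = count_valid(bitmap, 5)
```

A popcount over the whole 8-byte buffer would also count the 56 set bits in the padding, which is why the computation is bounded by the logical length rather than by the buffer size.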

      cc Wenchen Fan


          People

            Assignee: Wenchen Fan (cloud_fan)
            Reporter: Wes McKinney (wesm)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved:
