  Apache Arrow / ARROW-1011

[Format] Clarify requirements around buffer padding in validity bitmaps


    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.4.0
    • Component/s: Format
    • Labels:
      None

      Description

      This has come up in https://github.com/apache/arrow/pull/673 and in prior discussions.

      The basic summary is that padding bytes written in IPC messages should be zero. However, one cannot in general rely on the padding being zero when the data is in memory (for example, with zero-copy slices of Arrow arrays/vectors, where the bytes past the end of a slice belong to the parent buffer).
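      The following is a minimal sketch of a writer that keeps the wire format deterministic even when the in-memory buffer carries arbitrary bytes past the logical end. It assumes an unsliced bitmap (offset 0), LSB-first bit numbering, and an alignment passed in as a parameter (8 is only a default here). The helpers `padded_length` and `write_bitmap_for_ipc` are hypothetical illustrations, not Arrow APIs. Clearing the unused high bits of the last data byte goes slightly beyond the padding bytes discussed above, and is included on the assumption that those bits should be deterministic on the wire too.

```python
def padded_length(nbytes: int, alignment: int = 8) -> int:
    """Round nbytes up to the next multiple of alignment."""
    return (nbytes + alignment - 1) // alignment * alignment


def write_bitmap_for_ipc(bitmap: bytes, length: int, alignment: int = 8) -> bytes:
    """Serialize the first `length` bits of an LSB-first validity bitmap.

    Anything in `bitmap` beyond the meaningful bytes (for example memory
    left over in a parent buffer) is NOT copied; the output is the data
    bytes followed by zero padding up to a multiple of `alignment`.
    Assumes the bitmap starts at bit offset 0 (no slice offset).
    """
    nbytes = (length + 7) // 8
    body = bytearray(bitmap[:nbytes])
    # Clear bits >= length in the final byte; bit i lives in byte i // 8
    # at position i % 8 (least significant bit first).
    if length % 8:
        body[-1] &= (1 << (length % 8)) - 1
    # Zero-fill the padding region so the wire bytes are deterministic.
    body.extend(b"\x00" * (padded_length(nbytes, alignment) - nbytes))
    return bytes(body)
```

      For example, `write_bitmap_for_ipc(b"\xff\xab\xcd", 5)` yields `b"\x1f"` followed by seven zero bytes: the trailing `\xab\xcd` are never copied, and bits 5 to 7 of the first byte are cleared.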

      I think it would be good to clarify this point in Layout.md – namely that what gets written to the wire should be deterministic. However, in-memory algorithms should not in general expect the padding region to have a particular value. As an example, a popcount on a validity bitmap would want to exclude padding bytes from the computation. Other elementwise SIMD operations are free to use the padding bytes as they wish, provided nothing downstream depends on the values they leave in the padding region.
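      To illustrate the two cases above, here is a sketch (plain Python standing in for SIMD code) of a popcount that masks off every bit outside the logical range, including padding and bits before a slice offset, next to a word-at-a-time AND that simply runs over the padding because its results there are ignored. Both function names are hypothetical; the sketch assumes LSB-first bitmaps and, for the AND, buffers whose length is a multiple of 8 bytes.

```python
def count_valid(bitmap: bytes, offset: int, length: int) -> int:
    """Count set bits in positions [offset, offset + length) of an
    LSB-first validity bitmap, ignoring every bit outside that range."""
    if length == 0:
        return 0
    start, end = offset, offset + length  # bit range, end exclusive
    first_byte, last_byte = start // 8, (end - 1) // 8
    total = 0
    for i in range(first_byte, last_byte + 1):
        byte = bitmap[i]
        if i == first_byte:
            byte &= (0xFF << (start % 8)) & 0xFF  # drop bits before offset
        if i == last_byte and end % 8:
            byte &= (1 << (end % 8)) - 1  # drop bits past the logical end
        total += bin(byte).count("1")
    return total


def and_bitmaps(a: bytes, b: bytes) -> bytes:
    """AND two bitmaps 8 bytes at a time, padding included.

    The padding bytes of the result hold whatever the AND produced;
    readers must ignore them (e.g. by using count_valid above).
    """
    if len(a) != len(b) or len(a) % 8:
        raise ValueError("expected equal-length buffers padded to 8 bytes")
    out = bytearray()
    for i in range(0, len(a), 8):
        word = int.from_bytes(a[i:i + 8], "little") & int.from_bytes(b[i:i + 8], "little")
        out += word.to_bytes(8, "little")
    return bytes(out)
```

      With `bitmap = bytes([0b00001011, 0xFF])` and a logical length of 4, `count_valid(bitmap, 0, 4)` counts 3 valid slots, whereas a naive popcount over the whole buffer would also count the 8 set bits in the non-zero padding byte. The null count then follows as `length - count_valid(...)`.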

      cc Wenchen Fan


            People

            • Assignee:
              cloud_fan Wenchen Fan
              Reporter:
              wesm Wes McKinney
            • Votes:
              0
              Watchers:
              2

              Dates

              • Created:
                Updated:
                Resolved: