Apache Arrow / ARROW-3263

[R] Use R sentinel values for missingness in addition to bitmask

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Format, R
    • Labels:
      None

      Description

      R uses sentinel values to indicate missingness within atomic vectors (read: arrays, in Arrow parlance, AFAIK).

      According to [~wesmckinn], the value stored in the array's memory is currently undefined wherever the validity bitmap marks a slot as missing (in Arrow, a validity bit of 0).

      This forces R to copy and modify the data whenever it adopts Arrow data containing missing values as a native vector.

      If the relevant sentinel values (INT_MIN for 32-bit integers, and a NaN with payload 1954 for double-precision floats) were written into the missing slots in addition to setting the bitmap, then R would be able to use Arrow data as intended, without copying, while not breaking any other systems.
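
      A minimal sketch of the idea in Python, working on raw bytes only. It assumes little-endian value buffers and an LSB-first validity bitmap in which a set bit means "not missing". The helper names are hypothetical and are not part of any Arrow API. The bit pattern used for the double sentinel (high word 0x7FF00000, low word 1954) is also an assumption about how R builds NA_real_; some platforms may additionally set the quiet-NaN bit.

```python
import math
import struct

# Assumed sentinel encodings (see the note above; not taken from an Arrow API).
R_NA_INTEGER = -2**31                # INT_MIN, R's missing 32-bit integer
R_NA_REAL_BITS = 0x7FF00000000007A2  # NaN whose low 32-bit word is 1954


def is_valid(validity: bytes, i: int) -> bool:
    """Return True if slot i is non-missing (LSB-first bitmap, 1 = valid)."""
    return bool((validity[i // 8] >> (i % 8)) & 1)


def fill_int32_sentinels(values: bytearray, validity: bytes, length: int) -> None:
    """Overwrite missing slots of an int32 buffer with INT_MIN, in place."""
    for i in range(length):
        if not is_valid(validity, i):
            struct.pack_into("<i", values, 4 * i, R_NA_INTEGER)


def fill_float64_sentinels(values: bytearray, validity: bytes, length: int) -> None:
    """Overwrite missing slots of a float64 buffer with the NA bit pattern, in place."""
    for i in range(length):
        if not is_valid(validity, i):
            struct.pack_into("<Q", values, 8 * i, R_NA_REAL_BITS)


# Four int32 slots; slot 1 is missing and holds an arbitrary value (999).
ints = bytearray(struct.pack("<4i", 10, 999, 30, 40))
fill_int32_sentinels(ints, bytes([0b00001101]), 4)

# Three float64 slots; slot 1 is missing.
doubles = bytearray(struct.pack("<3d", 1.5, 0.0, 2.5))
fill_float64_sentinels(doubles, bytes([0b00000101]), 3)

# The filled slot is still a NaN to any consumer that only checks for NaN.
na_is_nan = math.isnan(struct.unpack_from("<d", doubles, 8)[0])
```

      The bitmap is left untouched, so a reader that only consults the bitmap sees the same data as before. Only the previously undefined bytes change.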

            People

            • Assignee:
              Unassigned
            • Reporter:
              Gabriel Becker (gmbecker)
            • Votes:
              0
            • Watchers:
              2
