Apache Arrow / ARROW-16958

[C++][FlightRPC] Flight generates misaligned buffers

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: C++, FlightRPC
    • Labels: None

    Description

      The combination of Protobuf's wire format and our zero-copy serializer/deserializer means that buffers received over Flight can end up misaligned. On some Arrow versions, this causes segfaults in kernels that assume alignment, and in general it violates the expectations of code that consumes Arrow buffers.
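
To illustrate why the wire format tends to produce misaligned data, here is a minimal sketch in plain Python. It assumes a message made of a variable-size header field followed by a body field, both length-delimited. The field numbers (2 and 1000) and all helper names are illustrative assumptions, not taken from the Flight implementation. Each length-delimited field is preceded by a varint tag and a varint length, so the body's offset inside the message depends on the sizes of everything before it.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a Protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def length_delimited(field_number: int, payload: bytes) -> bytes:
    """Encode one length-delimited field: tag, length, then the raw payload."""
    tag = encode_varint((field_number << 3) | 2)  # wire type 2 = length-delimited
    return tag + encode_varint(len(payload)) + payload


def body_offset(header: bytes, body: bytes) -> int:
    """Offset of the body bytes inside a message of one header and one body field."""
    # Field numbers 2 and 1000 are assumptions chosen for illustration.
    header_field = length_delimited(2, header)
    body_prefix = len(length_delimited(1000, body)) - len(body)
    return len(header_field) + body_prefix


if __name__ == "__main__":
    body = bytes(64)
    for header_size in (16, 100, 200):
        offset = body_offset(bytes(header_size), body)
        print(f"header={header_size} body offset={offset} offset % 8={offset % 8}")
```

Under these assumptions, a deserializer that slices the body directly out of the received message inherits this offset, so the body can be misaligned even when the message buffer itself starts at an aligned address.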

      We should:

      • Possibly include buffer alignment in array validation
      • See if we can adjust the serializer to somehow pad things properly
      • See if we can do anything about this in the deserializer

      Example:

      import pyarrow as pa
      import pyarrow.flight as flight
      
      class TestServer(flight.FlightServerBase):
          def do_get(self, context, ticket):
              schema = pa.schema(
                  [
                      ("index", pa.int64()),
                      ("int8", pa.float64()),
                      ("int16", pa.float64()),
                      ("int32", pa.float64()),
                  ]
              )
              return flight.RecordBatchStream(pa.table([
                  [0, 1, 2, 3],
                  [0, 1, None, 3],
                  [0, 1, 2, None],
                  [0, None, 2, 3],
              ], schema=schema))
      
      
      with TestServer() as server:
          client = flight.connect(f"grpc://localhost:{server.port}")
          table = client.do_get(flight.Ticket(b"")).read_all()
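          # For every buffer, print its address modulo 8 (0 means 8-byte aligned).
          for col in table:
              print(col.type)
              for chunk in col.chunks:
                  for buf in chunk.buffers():
                      # Skip absent buffers, e.g. a missing validity bitmap.
                      if not buf: continue
                      print("buffer is 8-byte aligned?", buf.address % 8)
                  # Cast to run a compute kernel over the received data.
                  chunk.cast(pa.float32())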
      

      On Arrow 8 (buffers are misaligned, but the script runs to completion):

      int64
      buffer is 8-byte aligned? 1
      double
      buffer is 8-byte aligned? 1
      buffer is 8-byte aligned? 1
      double
      buffer is 8-byte aligned? 1
      buffer is 8-byte aligned? 1
      double
      buffer is 8-byte aligned? 1
      buffer is 8-byte aligned? 1
      

      On Arrow 7 (buffers are misaligned, and the script segfaults):

      int64
      buffer is 8-byte aligned? 4
      double
      buffer is 8-byte aligned? 4
      buffer is 8-byte aligned? 4
      fish: Job 1, 'python ../test.py' terminated by signal SIGSEGV (Address boundary error)
      

          People

            Assignee: Unassigned
            Reporter: David Li (lidavidm)