  1. Apache Arrow
  2. ARROW-809

C++: Writing sliced record batch to IPC writes the entire array

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.3.0
    • Component/s: C++
    • Labels:
      None

      Description

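      Writing a sliced record batch with the IPC file writer serializes the entire underlying array instead of only the sliced rows. The bug can be reproduced from Python: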

      import pyarrow.parquet
      array = pyarrow.array.from_pylist([1] * 1000000)
      
      rb = pyarrow.RecordBatch.from_arrays([array], ['a'])
      rb2 = rb.slice(0,2)
      
      with open('/tmp/t.arrow', 'wb') as f:
        w = pyarrow.ipc.FileWriter(f, rb.schema)
        w.write_batch(rb2)
        w.close()
      

      which produces a file of about 800 KB, even though the slice holds only two rows:

      $ ll /tmp/t.arrow 
      -rw-rw-r-- 1 itai itai 800618 Apr 12 13:22 /tmp/t.arrow
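
The size difference can be illustrated without Arrow at all. The following is a minimal standard-library sketch, not Arrow's implementation: `Int64Slice`, `write_naive`, and `write_sliced` are hypothetical names introduced only for this illustration. It assumes a slice is a zero-copy view (parent buffer plus offset and length) and contrasts a writer that ignores the view bounds with one that honors them.

```python
from array import array


class Int64Slice:
    """Hypothetical zero-copy view: shares the parent buffer, records offset and length."""

    def __init__(self, buffer, offset, length):
        self.buffer = buffer    # full parent buffer, never copied
        self.offset = offset    # index of the first element in the view
        self.length = length    # number of elements in the view


def write_naive(view):
    # Mirrors the reported behaviour: ignores offset/length and
    # serializes the whole parent buffer.
    return view.buffer.tobytes()


def write_sliced(view):
    # Serializes only the bytes covered by the view.
    item = view.buffer.itemsize
    start = view.offset * item
    stop = start + view.length * item
    return memoryview(view.buffer).cast('B')[start:stop].tobytes()


values = array('q', [1] * 1_000_000)          # signed 64-bit integers
view = Int64Slice(values, offset=0, length=2)

naive_size = len(write_naive(view))
sliced_size = len(write_sliced(view))
print(naive_size, sliced_size)
```

In this sketch the naive writer emits every element of the parent buffer, while the slice-aware writer emits only the two elements in the view, which is the behaviour one would expect from the IPC writer.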
      

              People

              • Assignee:
                wesm Wes McKinney
                Reporter:
                itaiin Itai Incze
