Apache Arrow / ARROW-3667

[JS] Incorrectly reads record batches with an all null column


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: JS-0.3.1
    • Fix Version/s: JS-0.4.1
    • Component/s: JavaScript
    • Labels: None

    Description

      The JS library seems to incorrectly read any columns that come after an all-null column in IPC buffers produced by pyarrow.

      Here's a Python script that generates two Arrow IPC files: one with an all-null column followed by a UTF-8 column, and a second with those two columns reversed:

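      import pandas as pd
      import pyarrow as pa


      def serialize_to_arrow(df, fd):
          # Convert the DataFrame to a single record batch; an all-None object
          # column is inferred as Arrow's null type.
          batch = pa.RecordBatch.from_pandas(df)
          # Write the batch as an Arrow IPC file to the open file object.
          writer = pa.RecordBatchFileWriter(fd, batch.schema)
          writer.write_batch(batch)
          writer.close()


      if __name__ == "__main__":
          # All-null column first, then the UTF-8 column: misread by the JS reader.
          df = pd.DataFrame(
              data={'nulls': [None, None, None], 'not nulls': ['abc', 'def', 'ghi']},
              columns=['nulls', 'not nulls'],
          )
          with open('bad.arrow', 'wb') as fd:
              serialize_to_arrow(df, fd)
          # Same data with the column order reversed: read correctly.
          df = pd.DataFrame(df, columns=['not nulls', 'nulls'])
          with open('good.arrow', 'wb') as fd:
              serialize_to_arrow(df, fd)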
      

      JS reads good.arrow correctly but misinterprets the [null, not null] case in bad.arrow:

      > var arrow = require('apache-arrow')
      undefined
      > var fs = require('fs')
      undefined
      > arrow.Table.from(fs.readFileSync('good.arrow')).getColumn('not nulls').get(0)
      'abc'
      > arrow.Table.from(fs.readFileSync('bad.arrow')).getColumn('not nulls').get(0)
      '\u0000\u0000\u0000\u0000\u0003\u0000\u0000\u0000\u0006\u0000\u0000\u0000\t\u0000\u0000\u0000'
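The garbled value returned for bad.arrow is not random bytes. Decoding it shows that it matches the value-offsets buffer of the 'not nulls' column exactly. The Python sketch below checks this under two assumptions: that the file was written on a little-endian machine (as on x86), and that the REPL string maps one character to one byte. The variable names are illustrative only. The match is consistent with the reader handing the column's offsets buffer to the slot where it expected the character data.

```python
import struct

# The string returned by get(0) for bad.arrow, copied from the REPL output.
garbled = "\u0000\u0000\u0000\u0000\u0003\u0000\u0000\u0000\u0006\u0000\u0000\u0000\t\u0000\u0000\u0000"

# Every character is below 0x80, so latin-1 recovers the raw bytes one-to-one.
raw = garbled.encode("latin-1")

# Interpret the 16 bytes as little-endian int32 values, the width Arrow uses
# for Utf8 value offsets.
offsets = struct.unpack("<%di" % (len(raw) // 4), raw)
print(offsets)  # (0, 3, 6, 9)

# The offsets of ['abc', 'def', 'ghi'] are the running byte lengths.
values = ["abc", "def", "ghi"]
expected = [0]
for v in values:
    expected.append(expected[-1] + len(v.encode("utf-8")))
print(expected)  # [0, 3, 6, 9]
```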
      

      Presumably this is because pyarrow is omitting some (or all) of the buffers associated with the all-null column, but the JS IPC reader is still looking for them, causing the buffer count to get out of sync.
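A toy model can illustrate the buffer-count hypothesis. In an IPC record batch, the buffers for all columns are stored as one flat list in schema order, and the reader decides from each field's type how many buffers to take. The following sketch is hypothetical throughout: `assign_buffers`, the placeholder buffer, and the per-field counts are invented for illustration. They do not reproduce the layout logic of pyarrow or Arrow JS. The point is that any disagreement over how many buffers the null column owns shifts every buffer of the column that follows it.

```python
import struct
from typing import Dict, List, Sequence, Tuple


def assign_buffers(
    fields: Sequence[Tuple[str, int]], buffers: List[bytes]
) -> Dict[str, List[bytes]]:
    """Hand out buffers from a flat list to each field in schema order.

    `fields` pairs each field name with the number of buffers the reader
    expects for it (hypothetical counts, for illustration only).
    """
    assigned: Dict[str, List[bytes]] = {}
    pos = 0
    for name, count in fields:
        assigned[name] = buffers[pos:pos + count]
        pos += count
    return assigned


# Buffers for the 'not nulls' column: an empty validity bitmap (no nulls),
# little-endian int32 offsets, and the concatenated UTF-8 bytes.
validity = b""
offsets = struct.pack("<4i", 0, 3, 6, 9)
data = "abcdefghi".encode("utf-8")

# ASSUMPTION: the writer emitted one placeholder buffer for the null column.
placeholder = b"<null placeholder>"
written = [placeholder, validity, offsets, data]

# Reader agrees with the writer: the string column gets its own buffers.
agreed = assign_buffers([("nulls", 1), ("not nulls", 3)], written)

# Reader takes one buffer too few for the null column: every slot of the
# string column shifts back by one, and the offsets land in the data slot.
too_few = assign_buffers([("nulls", 0), ("not nulls", 3)], written)

# Reader takes one buffer too many: slots shift forward and one goes missing.
too_many = assign_buffers([("nulls", 2), ("not nulls", 3)], written)

print(agreed["not nulls"] == [validity, offsets, data])  # True
print(too_few["not nulls"][2] == offsets)  # True
print(len(too_many["not nulls"]))  # 2
```

Within this toy model, only the "too few" case puts the offsets bytes in the data slot, which resembles the REPL output above. Treat that as a hint about which side miscounts, not a diagnosis.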


            People

              Assignee: Paul Taylor (paul.e.taylor)
              Reporter: Brian Hulette (bhulette)
              Votes: 0
              Watchers: 4
