Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Fix Version/s: JS-0.3.1
- Labels: None
Description
The JS library seems to incorrectly read any columns that come after an all-null column in IPC buffers produced by pyarrow.
Here's a Python script that generates two Arrow buffers, one with an all-null column followed by a UTF-8 column, and a second with those two reversed:
import pyarrow as pa
import pandas as pd


def serialize_to_arrow(df, fd, compress=True):
    batch = pa.RecordBatch.from_pandas(df)
    writer = pa.RecordBatchFileWriter(fd, batch.schema)
    writer.write_batch(batch)
    writer.close()


if __name__ == "__main__":
    df = pd.DataFrame(data={'nulls': [None, None, None],
                            'not nulls': ['abc', 'def', 'ghi']},
                      columns=['nulls', 'not nulls'])
    with open('bad.arrow', 'wb') as fd:
        serialize_to_arrow(df, fd)
    df = pd.DataFrame(df, columns=['not nulls', 'nulls'])
    with open('good.arrow', 'wb') as fd:
        serialize_to_arrow(df, fd)
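For reference, a quick sanity check (not part of the original script) suggests the files themselves are well-formed: pyarrow reads both back correctly, which points at the JS reader rather than the writer. A minimal sketch, assuming a pyarrow version that exposes pa.ipc.open_file and that the two files produced above are in the working directory:

import pyarrow as pa

# Read both files back with pyarrow; both should yield ['abc', 'def', 'ghi']
# for the 'not nulls' column if the writer side is sound.
for path in ('good.arrow', 'bad.arrow'):
    reader = pa.ipc.open_file(path)
    table = reader.read_all()
    print(path, table.to_pandas()['not nulls'].tolist())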
JS incorrectly interprets the [null, not null] case:
> var arrow = require('apache-arrow')
undefined
> var fs = require('fs')
undefined
> arrow.Table.from(fs.readFileSync('good.arrow')).getColumn('not nulls').get(0)
'abc'
> arrow.Table.from(fs.readFileSync('bad.arrow')).getColumn('not nulls').get(0)
'\u0000\u0000\u0000\u0000\u0003\u0000\u0000\u0000\u0006\u0000\u0000\u0000\t\u0000\u0000\u0000'
Presumably this is because pyarrow is omitting some (or all) of the buffers associated with the all-null column, but the JS IPC reader is still looking for them, causing the buffer count to get out of sync.
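To make that failure mode concrete, here is a hypothetical sketch (plain Python, not the actual Arrow JS reader): in the Arrow columnar format a Null-type column carries no buffers at all, so a reader that charges it even one buffer will read every subsequent column from the wrong position in the record batch's flat buffer list. The names BUFFERS_PER_TYPE and assign_buffers below are illustrative only.

# Buffers per column type in the Arrow layout (simplified): Null has none,
# fixed-width types have validity + data, variable-width types have
# validity + offsets + data.
BUFFERS_PER_TYPE = {'null': 0, 'int64': 2, 'utf8': 3}

def assign_buffers(column_types, buffers):
    """Map the record batch's flat buffer list onto columns, in schema order."""
    pos = 0
    columns = []
    for t in column_types:
        n = BUFFERS_PER_TYPE[t]
        columns.append(buffers[pos:pos + n])
        pos += n
    return columns

# Correct accounting: the null column consumes zero buffers, so the utf-8
# column gets its validity/offsets/data buffers. If the reader instead
# expected buffers for the null column, the utf-8 column's offsets and data
# would be shifted, which matches the garbage string in the node session.
print(assign_buffers(['null', 'utf8'], ['u8_validity', 'u8_offsets', 'u8_data']))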
Attachments
Issue Links
- is related to: ARROW-1636 [Format] Integration tests for null type (Resolved)