Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- 0.14.1
Description
We are seeing a bug very similar to ARROW-809, but for a RecordBatch of strings. A slice of a RecordBatch with a string column and offset = 0 serializes to the size of the whole batch instead of only the sliced rows.
import pandas as pd
import pyarrow as pa

df = pd.DataFrame({'b': ['test' for x in range(1000_000)]})
tbl = pa.Table.from_pandas(df)
batch = tbl.to_batches()[0]

batch.slice(0, 2).serialize().size  # 4000232
batch.slice(1, 2).serialize().size  # 240