Apache Arrow / ARROW-6046

[C++] Slice RecordBatch of String array with offset 0 returns whole batch


Details

    Description

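      We are seeing a bug very similar to ARROW-809, but for a RecordBatch of strings. Slicing a RecordBatch that has a string column with offset=0 and serializing the result produces a payload the size of the whole batch instead of only the sliced rows. A slice of the same length at offset=1 serializes as expected.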

       

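      import pandas as pd
      import pyarrow as pa

      # One string column holding 1,000,000 copies of 'test' (4 bytes each)
      df = pd.DataFrame({'b': ['test'] * 1_000_000})
      tbl = pa.Table.from_pandas(df)
      batch = tbl.to_batches()[0]

      # Two-row slice starting at offset 0: the serialized size is roughly
      # the whole 4,000,000-byte string data buffer, not just two values
      batch.slice(0, 2).serialize().size
      # 4000232

      # Two-row slice starting at offset 1: only the sliced values are written
      batch.slice(1, 2).serialize().size
      # 240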
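
To make the report concrete, here is a minimal pure-Python model of how a string column is laid out (an offsets buffer plus one shared character-data buffer) and what a writer has to do when it serializes a zero-copy slice. `StringColumn`, `buffers_to_write` and the `skip_truncation_at_zero_offset` flag are hypothetical names for illustration only; they are not pyarrow APIs. The flagged path is one mechanism consistent with the sizes reported above (about 4,000,000 bytes of `'test'` data plus roughly 230 bytes of overhead), not code taken from the Arrow C++ writer.

```python
from typing import List, Optional, Tuple


class StringColumn:
    """Hypothetical toy model of an Arrow-style string array (not pyarrow).

    Values live in one shared `data` buffer; value i spans
    data[offsets[i]:offsets[i + 1]]. A slice is a zero-copy window
    described by (offset, length) over the same buffers.
    """

    def __init__(self, offsets: List[int], data: bytes,
                 offset: int = 0, length: Optional[int] = None):
        self.offsets = offsets
        self.data = data
        self.offset = offset
        self.length = len(offsets) - 1 - offset if length is None else length

    @classmethod
    def from_strings(cls, values: List[str]) -> "StringColumn":
        offsets, chunks, pos = [0], [], 0
        for value in values:
            encoded = value.encode("utf-8")
            chunks.append(encoded)
            pos += len(encoded)
            offsets.append(pos)
        return cls(offsets, b"".join(chunks))

    def slice(self, start: int, length: int) -> "StringColumn":
        # Zero-copy: share both buffers and only move the window.
        return StringColumn(self.offsets, self.data, self.offset + start, length)


def buffers_to_write(col: StringColumn,
                     skip_truncation_at_zero_offset: bool = False
                     ) -> Tuple[List[int], bytes]:
    """Return the (offsets, data) buffers a writer would emit for `col`.

    With skip_truncation_at_zero_offset=True (hypothetical), a window that
    starts at offset 0 is written with the entire shared data buffer.
    """
    window = col.offsets[col.offset:col.offset + col.length + 1]
    first, last = window[0], window[-1]
    if col.offset == 0 and skip_truncation_at_zero_offset:
        # Modelled bug: offsets pass through, but every data byte is kept.
        return window, col.data
    # Correct path: rebase offsets to start at 0 and keep only the
    # bytes the window actually references.
    return [o - first for o in window], col.data[first:last]


col = StringColumn.from_strings(["test"] * 1000)   # 4000 bytes of data

offs, data = buffers_to_write(col.slice(0, 2))
print(offs, len(data))                              # [0, 4, 8] 8

offs, data = buffers_to_write(col.slice(0, 2), skip_truncation_at_zero_offset=True)
print(offs, len(data))                              # [0, 4, 8] 4000

offs, data = buffers_to_write(col.slice(1, 2))
print(offs, len(data))                              # [0, 4, 8] 8
```

The point of the model is that the bytes to write depend on the window (`offsets[offset]` to `offsets[offset + length]`), not on whether the offset happens to be 0; a zero offset says nothing about whether the length covers the whole buffer.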
      

       


People
    Wes McKinney (wesm)
    Sascha Hofmann (saschahofmann)
    Votes: 0
    Watchers: 4


Time Tracking
    Estimated: Not Specified
    Remaining: 0h
    Logged: 0.5h