Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Version: 0.15.1
Environment: Docker on Linux 5.2.18-200.fc30.x86_64; Python 3.7.4
Description
Steps to reproduce:
import pyarrow as pa
arr = pa.array(["a", "b", "b", "b"])[1:]
arr.dictionary_encode()
Expected results:
-- dictionary:
[
"b"
]
-- indices:
[
0,
0,
0
]
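For reference, the expected output follows from the usual definition of dictionary encoding: each distinct value gets an index in order of first appearance, and the indices array points into that dictionary. Below is a minimal pure-Python sketch of that definition. It is not pyarrow's implementation, and `dictionary_encode_reference` is a hypothetical helper name used only for illustration.

```python
def dictionary_encode_reference(values):
    """Return (dictionary, indices), numbering values by first appearance."""
    dictionary = []   # distinct values, in order of first appearance
    positions = {}    # value -> its index in `dictionary`
    indices = []      # one dictionary index per input value
    for v in values:
        if v not in positions:
            positions[v] = len(dictionary)
            dictionary.append(v)
        indices.append(positions[v])
    return dictionary, indices


# Same logical values as the sliced array in the reproduction steps.
sliced = ["a", "b", "b", "b"][1:]
dictionary, indices = dictionary_encode_reference(sliced)
print(dictionary, indices)
```

Since the slice only contains "b", the dictionary should hold a single entry and every index should be 0.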
Actual results:
-- dictionary:
[
"b",
""
]
-- indices:
[
0,
0,
1
]
I don't know of a workaround. Converting to a Python list and back is too slow. Is there a way to copy the slice into a new offset-0 StringArray that I could then dictionary-encode? Otherwise, I'm considering building the buffers by hand.
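To make the "building buffers by hand" idea concrete, here is a rough sketch of rebasing a sliced string array to offset 0. It models only the Arrow variable-length string layout (little-endian int32 offsets plus a UTF-8 data buffer) with plain Python bytes, assumes there are no nulls (so no validity bitmap), and uses a hypothetical helper name, `rebase_string_slice`. With pyarrow, the inputs would presumably come from `arr.buffers()` and `arr.offset`, and the outputs could be passed to `pa.Array.from_buffers`, but that wiring is an untested assumption here.

```python
import struct


def rebase_string_slice(offsets_buf, data_buf, offset, length):
    """Copy a slice of a string array into fresh offset-0 buffers.

    offsets_buf: little-endian int32 offsets of the parent array
    data_buf:    concatenated UTF-8 bytes of the parent array
    offset:      logical start of the slice within the parent
    length:      number of values in the slice
    """
    # A slice of `length` values needs `length + 1` offsets.
    fmt = "<%di" % (length + 1)
    raw = struct.unpack_from(fmt, offsets_buf, offset * 4)
    start, end = raw[0], raw[-1]
    # Shift the offsets so the first value starts at byte 0.
    new_offsets = struct.pack(fmt, *(o - start for o in raw))
    # Keep only the bytes the slice actually references.
    new_data = bytes(data_buf[start:end])
    return new_offsets, new_data


# Parent array ["a", "b", "b", "b"]: offsets 0..4 over the bytes b"abbb".
offsets_buf = struct.pack("<5i", 0, 1, 2, 3, 4)
data_buf = b"abbb"

# The slice [1:] has offset 1 and length 3.
new_offsets, new_data = rebase_string_slice(offsets_buf, data_buf, offset=1, length=3)
print(struct.unpack("<4i", new_offsets), new_data)
```

The rebased buffers describe the same three values without relying on a nonzero array offset, which is the property the dictionary encoder appears to mishandle.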