Description
An extra 0 appears near the beginning of an array when it is serialized to and deserialized from Feather, provided the array has more than 128 values and at least one NULL value. The extra 0 shifts the remaining values by one position, so the last value is dropped.
Here is the C++ code to write such an array:
#include <iostream>
#include <arrow/api.h>
#include <arrow/io/file.h>
#include <arrow/ipc/feather.h>
#include <arrow/pretty_print.h>

int main() {
  // 1. Build Array
  arrow::DoubleBuilder builder;
  for (int i = 0; i < 129; i++)
    if (i == 0)
      builder.AppendNull();
    else
      builder.Append(i);
  std::shared_ptr<arrow::Array> array;
  builder.Finish(&array);
  arrow::PrettyPrint(*array, 0, &std::cout);
  std::cout << std::endl;

  // 2. Write to Feather file
  std::shared_ptr<arrow::io::FileOutputStream> stream;
  arrow::io::FileOutputStream::Open("out.f", false, &stream);
  std::unique_ptr<arrow::ipc::feather::TableWriter> writer;
  arrow::ipc::feather::TableWriter::Open(stream, &writer);
  writer->SetNumRows(129);
  writer->Append("id", *array);
  writer->Finalize();
  stream->Close();
  return 0;
}
The output of running this code is:
# g++-4.9 -std=c++11 example.cpp -larrow && ./a.out
[null, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128]
The array is deserialized in Python and looks like this:
>>> pandas.read_feather('out.f')
id
0 NaN
1 0.0
2 1.0
3 2.0
4 3.0
5 4.0
6 5.0
7 6.0
8 7.0
9 8.0
10 9.0
11 10.0
12 11.0
13 12.0
14 13.0
15 14.0
16 15.0
17 16.0
18 17.0
19 18.0
20 19.0
21 20.0
22 21.0
23 22.0
24 23.0
25 24.0
26 25.0
27 26.0
28 27.0
29 28.0
.. ...
99 98.0
100 99.0
101 100.0
102 101.0
103 102.0
104 103.0
105 104.0
106 105.0
107 106.0
108 107.0
109 108.0
110 109.0
111 110.0
112 111.0
113 112.0
114 113.0
115 114.0
116 115.0
117 116.0
118 117.0
119 118.0
120 119.0
121 120.0
122 121.0
123 122.0
124 123.0
125 124.0
126 125.0
127 126.0
128 127.0
[129 rows x 1 columns]
Notice the 0.0 value at index 1; it should have been 1.0. Every later value is shifted by one position, so the last value is 127.0 instead of 128.0.
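The mismatch can be checked mechanically rather than by eye. The following is only a sketch in plain Python: `first_mismatch` is a hypothetical helper, not part of Arrow or pandas, and the `observed` list is typed in from the printout above on the assumption that the rows pandas elided (29 to 98) follow the same pattern as the visible ones.

```python
import math


def first_mismatch(expected, observed):
    """Return the index of the first differing element, or None if equal.

    None and NaN are both treated as "null", so a C++ null and a pandas
    NaN compare equal. (Hypothetical helper, for illustration only.)
    """
    def is_null(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    for i, (e, o) in enumerate(zip(expected, observed)):
        if is_null(e) and is_null(o):
            continue
        if is_null(e) or is_null(o) or e != o:
            return i
    if len(expected) != len(observed):
        return min(len(expected), len(observed))
    return None


# Values as built by the C++ program: null followed by 1..128.
expected = [None] + [float(i) for i in range(1, 129)]

# Values as printed by pandas.read_feather: NaN, an extra 0.0, then 1..127.
# Assumption: the elided rows continue the visible pattern.
observed = [float("nan"), 0.0] + [float(i) for i in range(1, 128)]

# The expected array with 0.0 inserted after the null and the last value cut.
shifted = expected[:1] + [0.0] + expected[1:-1]

print(first_mismatch(expected, observed))  # 1
print(first_mismatch(shifted, observed))   # None
```

The first call reports index 1 as the first difference. The second shows that the observed column matches the original array with a single 0.0 inserted after the null and the final element removed, which is the behavior described in this report.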